PYTHON PROGRAM RELATED TO INFORMATION RETRIEVAL AND WEB SEARCH

 

Problem 1 [30 points]. Write a Python program that preprocesses a collection of documents using the recommendations given in the Text Operations lecture. The input to the program is a directory containing text files. As test data, use the files from assignment #3 together with 10 documents collected manually from news.yahoo.com. The Yahoo documents must be converted to plain text before they are used.
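
As a rough illustration of the input handling described above, the sketch below converts a saved HTML page to plain text and reads every text file in a directory. It uses only the Python standard library. The names html_to_text and load_documents, the *.txt file pattern, and the UTF-8 encoding are assumptions made for this example; the assignment does not prescribe them.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def load_documents(directory):
    """Return {file name: contents} for every *.txt file in `directory`.

    Assumption: documents are stored as UTF-8 files ending in .txt.
    """
    docs = {}
    for path in sorted(Path(directory).glob("*.txt")):
        docs[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return docs
```

One possible workflow is to run html_to_text on each saved Yahoo page, write the result to a .txt file in the input directory, and then call load_documents on that directory.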



Remove the following during preprocessing:

- digits
- punctuation
- stop words (use the generic list available at ...ir-websearch/papers/english.stopwords.txt)
- URLs and other HTML-like strings
- uppercase letters (convert all text to lowercase)
- morphological variations (reduce words to a common stem)

The assignment #3 files mentioned above are also attached. Running the code in Spyder (the IDE bundled with Anaconda) shows the output.
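
Independently of the attached program, the removal steps listed above could be sketched roughly as follows. This is a minimal illustration under several assumptions: SAMPLE_STOPWORDS is a tiny hypothetical stand-in for english.stopwords.txt, load_stopwords assumes that file holds one word per line, and naive_stem is crude suffix stripping rather than a real stemming algorithm such as Porter's, which a full solution would more likely use.

```python
import re
import string

# Hypothetical sample; a real run would load english.stopwords.txt instead.
SAMPLE_STOPWORDS = frozenset(
    {"a", "an", "and", "are", "at", "for", "in", "is", "of", "on", "the", "to"}
)

URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")      # HTML-like strings such as <b> or </div>
ENTITY_RE = re.compile(r"&#?\w+;")   # HTML entities such as &amp; or &#39;
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def load_stopwords(path):
    """Read a stop list, assuming one word per line (an assumption)."""
    with open(path, encoding="utf-8") as fh:
        return frozenset(line.strip().lower() for line in fh if line.strip())


def naive_stem(word):
    """Crude suffix stripping; a placeholder, NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, stopwords=SAMPLE_STOPWORDS):
    """Return the list of stemmed tokens left after all removal steps."""
    text = URL_RE.sub(" ", text)         # URLs
    text = TAG_RE.sub(" ", text)         # HTML-like tags
    text = ENTITY_RE.sub(" ", text)      # HTML entities
    text = text.lower()                  # uppercase letters
    text = DIGIT_RE.sub(" ", text)       # digits
    text = text.translate(PUNCT_TABLE)   # punctuation
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return [naive_stem(tok) for tok in tokens]


print(preprocess("The cats are running in 2 parks."))
# → ['cat', 'runn', 'park']
```

URLs and tags are removed before punctuation so that the patterns can still recognize them; stripping punctuation first would break "http://" and "<b>" into ordinary words. The output 'runn' shows the limits of naive suffix stripping compared with a proper stemmer.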
