PYTHON PROGRAM RELATED TO INFORMATION RETRIEVAL AND WEB SEARCH
Problem 1 [30 points]. Write a Python program that preprocesses a
collection of documents using the recommendations given in the
Text Operations lecture. The input to the program will be a directory
containing a set of text files. Use the files from assignment #3 as
test data, as well as 10 documents collected manually from news.yahoo.com.
The Yahoo documents must be converted to plain text before they are used.
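A minimal sketch of the input stage, using only the standard library, could look like the code below. It reads every file in the input directory and strips markup from files saved as .htm/.html, which is an assumed way of storing the Yahoo pages. The names `html_to_text` and `load_collection` are hypothetical helpers, not part of the assignment.

```python
import os
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse the whitespace runs left by markup.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def load_collection(directory):
    """Hypothetical helper: return {filename: text} for every file in directory.

    Assumes saved Yahoo pages end in .htm or .html; other files are read as text.
    """
    docs = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="ignore") as f:
            raw = f.read()
        if name.lower().endswith((".htm", ".html")):
            raw = html_to_text(raw)
        docs[name] = raw
    return docs
```

Joining text chunks with a space keeps words from separate block elements apart. The trade-off is that a word split by inline markup, such as `Hel<b>lo</b>`, comes out as two tokens.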
Remove the following during preprocessing:
- digits
- punctuation
- stop words (use the generic list available at ...ir-websearch/papers/english.stopwords.txt)
- URLs and other HTML-like strings
- uppercase letters (convert all text to lowercase)
- morphological variations (e.g., by stemming)
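One possible ordering of these steps is sketched below. URLs and tags are removed first so they are not broken into junk tokens. Text is lowercased before stop-word filtering because stop lists are usually lowercase. Stop words are dropped before stemming because the list holds unstemmed forms. The stemmer here is a deliberately crude suffix stripper used as a stand-in, not the Porter algorithm. In practice a real stemmer such as NLTK's `PorterStemmer().stem(word)` would normally replace `crude_stem`. The stop-word file is assumed to contain one word per line. The helper names and the small demo stop list are hypothetical.

```python
import re

# URLs (http://, https://, www.) and tag-like strings such as <br> or </div>.
URL_OR_TAG = re.compile(r"(?:https?://|www\.)\S+|<[^>]*>", re.IGNORECASE)
# After lowercasing, anything that is not a-z or whitespace (digits, punctuation,
# and also non-ASCII letters) is replaced by a space.
NON_LETTERS = re.compile(r"[^a-z\s]")

# Illustrative suffix list for a crude stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "ness", "ment", "ies", "ed", "es", "s")


def load_stopwords(path):
    """Read a stop list with one word per line (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if not word.endswith(suffix) or len(word) - len(suffix) < 3:
            continue
        if suffix == "s" and word.endswith("ss"):
            return word  # leave words like "class" or "process" alone
        if suffix == "ies":
            return word[:-3] + "y"  # "stories" -> "story"
        return word[: -len(suffix)]
    return word


def preprocess(text, stopwords):
    text = URL_OR_TAG.sub(" ", text)        # URLs and HTML-like strings
    text = text.lower()                     # uppercase -> lowercase
    text = NON_LETTERS.sub(" ", text)       # digits and punctuation
    tokens = [t for t in text.split() if t not in stopwords]  # stop words
    return [crude_stem(t) for t in tokens]  # morphological variations


if __name__ == "__main__":
    demo_stopwords = {"the", "and", "of", "a", "on", "at"}  # hypothetical tiny list
    sample = "The 3 runners visited http://news.yahoo.com and <b>WATCHED</b> the stories!"
    print(preprocess(sample, demo_stopwords))
```

With the demo stop list, the sample sentence becomes `['runner', 'visit', 'watch', 'story']`. For the assignment, `demo_stopwords` would be replaced by the set returned from `load_stopwords` on the generic stop list.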
The assignment #3 file mentioned above is also attached. Run the code in Spyder (Anaconda) to see the output.

