Est. Budget: $30.00
The CiteSeer UMD collection is a standard text document collection consisting of abstracts of Computer Science research articles sampled from the CiteSeer digital library. The dataset is available for download from Moodle.
NOTE: I have attached the CiteSeer UMD collection to this job. If you can do this, I will have plenty more jobs for you.
1. Write a program that preprocesses the collection. This preprocessing stage should specifically include a function that tokenizes the ...
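To make the expected shape of the preprocessing step concrete, here is a minimal sketch in Python. The task description above is cut off, so the details are assumptions: the function names `tokenize` and `preprocess`, the lowercasing, and the rule that a token is a run of ASCII letters or digits are all hypothetical choices, not requirements taken from the assignment. How the collection files are read is also left open, so `preprocess` simply takes a dictionary of document IDs to raw text.

```python
import re

# Assumption: a token is a maximal run of ASCII letters or digits.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its tokens in order of appearance."""
    return TOKEN_PATTERN.findall(text.lower())


def preprocess(documents):
    """Map each document ID to the token list of its text.

    `documents` is a hypothetical dict of {doc_id: raw_text}; loading the
    actual collection files is outside the scope of this sketch.
    """
    return {doc_id: tokenize(text) for doc_id, text in documents.items()}
```

For example, `tokenize("Latent Semantic Indexing, 1990")` splits on the space and comma and returns the four lowercase tokens `latent`, `semantic`, `indexing`, and `1990`. Further steps such as stop-word removal or stemming could be layered on top of `tokenize` if the full task asks for them.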