Less than 1 month
Less than 10 hrs/week
I have a list of websites to crawl. I'd like to build a web crawler that extracts news articles from those sites and searches their text for a given regular expression.
The crawler should run automatically every day to gather new articles. It must obey each site's robots.txt rules and use sitemaps where available.
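To make the requirements concrete, here is a minimal sketch of the core logic, assuming Python and only its standard library. It is not a required design. The names `USER_AGENT`, `allowed_urls`, and `find_matches` are hypothetical, and fetching pages, extracting article text, and daily scheduling (for example via cron) are left out.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name; the real crawler would pick its own.
USER_AGENT = "news-crawler"

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_urls(robots_txt: str, sitemap_xml: str,
                 user_agent: str = USER_AGENT) -> list[str]:
    """Return the sitemap URLs that robots.txt permits this agent to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]
    return [url for url in urls if rules.can_fetch(user_agent, url)]


def find_matches(article_text: str, pattern: re.Pattern[str]) -> list[str]:
    """Return every substring of the article text that matches the pattern."""
    return [m.group(0) for m in pattern.finditer(article_text)]
```

In a full crawler, the robots.txt and sitemap contents would be downloaded from each site in the list, and `find_matches` would be applied to the text extracted from each allowed article page.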
For this job I need someone who has already worked with web crawlers.