Est. Budget: $1,000.00
I'm looking for an Apache Nutch specialist for the following project.
I want to be able to run multiple Nutch crawls on different domains simultaneously and quickly, without crashing the remote sites.
To start a crawling job we would pass a configuration file to Nutch that would include.
For each URL crawled, we would want the header retrieved and saved to a .txt file in a directory created using the ...