Experienced crawler4j or other Java crawler developer needed

Cancelled

Job Description

We need a skilled Java developer with experience in crawler4j or another open-source Java crawler framework to crawl a website and export its products and categories to a CSV file.

Crawling needs to be efficient. Because different websites have different page structures, the program must be modular, configurable, and extensible so that new websites can be added easily.
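
As a rough illustration of the modularity we have in mind, here is a minimal sketch in plain Java. It is not a required design and uses no crawler4j API. All names (`SiteExtractor`, `ExampleShopExtractor`, `shop.example.com`) and the HTML markers are hypothetical, and a real extractor would use a proper HTML parser rather than string search.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CrawlerSketch {

    /** One product row destined for the CSV export. */
    static final class Product {
        final String id, name, category;
        Product(String id, String name, String category) {
            this.id = id; this.name = name; this.category = category;
        }
    }

    /** Per-site plug-in: decides which pages hold products and how to read them. */
    interface SiteExtractor {
        boolean isProductPage(String url);
        Product extract(String url, String html);
    }

    /** Hypothetical extractor for an imaginary shop; real sites need their own rules. */
    static final class ExampleShopExtractor implements SiteExtractor {
        @Override public boolean isProductPage(String url) {
            return url.contains("/product/");
        }
        @Override public Product extract(String url, String html) {
            String id = url.substring(url.lastIndexOf('/') + 1);
            return new Product(id, between(html, "<h1>", "</h1>"),
                               between(html, "<nav>", "</nav>"));
        }
    }

    /** Registry keyed by host name; supporting a new website means adding one entry. */
    static final Map<String, SiteExtractor> EXTRACTORS = new LinkedHashMap<>();
    static {
        EXTRACTORS.put("shop.example.com", new ExampleShopExtractor());
    }

    /** Returns the trimmed text between two markers, or "" if either is missing. */
    static String between(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) return "";
        start += open.length();
        int end = text.indexOf(close, start);
        return end < 0 ? "" : text.substring(start, end).trim();
    }

    /** Quotes a field for CSV, doubling any embedded double quotes. */
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    static String toCsvLine(Product p) {
        return String.join(",", csvField(p.id), csvField(p.name), csvField(p.category));
    }

    public static void main(String[] args) {
        String url = "https://shop.example.com/product/42";
        String html = "<nav>Kitchen</nav><h1>Steel Kettle</h1>";
        SiteExtractor extractor = EXTRACTORS.get("shop.example.com");
        if (extractor.isProductPage(url)) {
            System.out.println(toCsvLine(extractor.extract(url, html)));
        }
    }
}
```

With this shape, the crawl loop and the CSV writer stay the same for every site, and only the extractor class and its registry entry change.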


It would be good if, after each export, the program could compare the new export file with the previous version and generate another file that includes only the changes. We are not sure whether Apache Nutch can do that. This feature is a bonus, not essential.
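
To make the bonus feature concrete, here is a minimal sketch of such a comparison in plain Java. It assumes (our assumption, not a fixed requirement) that the first CSV column is a unique product ID that contains no commas, and it labels each output row as ADDED, CHANGED, or REMOVED. The class name `ExportDiff` and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExportDiff {

    /** Indexes rows by the text before the first comma (assumed unique product ID). */
    static Map<String, String> indexById(List<String> rows) {
        Map<String, String> byId = new LinkedHashMap<>();
        for (String row : rows) {
            int comma = row.indexOf(',');
            String id = comma < 0 ? row : row.substring(0, comma);
            byId.put(id, row);
        }
        return byId;
    }

    /** Returns only the rows that were added, changed, or removed between two exports. */
    static List<String> diff(List<String> previous, List<String> current) {
        Map<String, String> before = indexById(previous);
        Map<String, String> after = indexById(current);
        List<String> changes = new ArrayList<>();

        // Rows present now: either new, or present before with different content.
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String oldRow = before.get(entry.getKey());
            if (oldRow == null) {
                changes.add("ADDED," + entry.getValue());
            } else if (!oldRow.equals(entry.getValue())) {
                changes.add("CHANGED," + entry.getValue());
            }
        }
        // Rows that existed before but are missing from the new export.
        for (Map.Entry<String, String> entry : before.entrySet()) {
            if (!after.containsKey(entry.getKey())) {
                changes.add("REMOVED," + entry.getValue());
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> previous = List.of("1,Kettle,Kitchen", "2,Lamp,Lighting", "3,Rug,Decor");
        List<String> current = List.of("1,Kettle,Kitchen", "2,Desk Lamp,Lighting", "4,Chair,Furniture");
        diff(previous, current).forEach(System.out::println);
    }
}
```

In practice the two lists would come from `Files.readAllLines` on the previous and current export files, and the result would be written out with `Files.write`. An identical header row in both files is simply not reported.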

---
Skills: apache