Less than 1 week
Less than 10 hrs/week
1. Need a script that pulls content from a website, one individual page per execution.
2. Specific parts of the page are required: title, description, meta tags, page content, images, iCal times, etc.
3. The script will generate an XML/JSON file representing the page and store it on Amazon S3, or alternatively populate a NoSQL data store with the scraped content.
4. The script needs to be able to manage deltas, so check if an extract has already ...
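
To make the requirements above concrete, here is a minimal sketch of one possible approach in Python, using only the standard library. It is not a finished deliverable. The names `PageExtractor`, `extract_page`, `content_hash` and `has_changed` are hypothetical, as are the sample HTML and URL. iCal times are not handled. Hashing the extracted record is only one assumed way to detect deltas, since the exact delta rule is not spelled out above. Fetching the page and storing the result (S3 or a NoSQL store) are left out.

```python
import hashlib
import json
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects title, meta tags, image URLs and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.images = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name") or attrs.get("property")
            if name and attrs.get("content") is not None:
                self.meta[name.lower()] = attrs["content"]
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html, url):
    """Return a JSON-serializable record for a single page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "meta": parser.meta,
        "images": parser.images,
        "content": " ".join(parser.text_parts),
    }


def content_hash(record):
    """SHA-256 over a canonical JSON form of the record."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(record, previous_hash):
    """Assumed delta rule: changed if no earlier hash exists or the hashes differ."""
    return previous_hash is None or content_hash(record) != previous_hash


# Hypothetical sample page; a real run would fetch the HTML for one URL instead.
SAMPLE_HTML = """<html><head><title>Spring Fair</title>
<meta name="description" content="Annual fair">
<meta name="keywords" content="fair, spring">
<style>body{}</style></head>
<body><h1>Spring Fair</h1><img src="/img/fair.png"><p>Join us.</p>
<script>var x = 1;</script></body></html>"""

record = extract_page(SAMPLE_HTML, "https://example.com/event")
print(json.dumps(record, indent=2))
```

In this sketch the hash from the previous run would be stored next to the extract, so a later execution can skip the upload when `has_changed` returns `False`.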