Data Mining

Closed - This job posting has been filled and work has been completed.

Job Description

We need a programmer to write a script that crawls http://answers.yahoo.com/ and extracts the following information:

1. A list of the questions asked on the website (originating from the USA).
2. The date each question was posted.
3. If the question has been answered, the number of answers to the question.
4. If the question has been 'starred' or 'favorited', the number of favorites for the question.

The script should preferably be written in Linux shell or Python. The result should be a text file with four tab-separated columns corresponding to the four pieces of information listed above. For many questions, columns three and four may be empty. The script should be able to grab all questions asked (originating from the USA) in the last 4 days.
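To make the expected output concrete, here is a rough Python sketch of the last step only: filtering to the last 4 days and writing the four tab-separated columns. It assumes the crawling and parsing have already produced one dictionary per question. The field names (`question`, `posted`, `answers`, `favorites`), the function names `recent` and `write_tsv`, and the `YYYY-MM-DD` date format are placeholders of ours, not requirements, and the sketch does not cover fetching pages or checking that a question originates from the USA.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def recent(records, now, days=4):
    """Keep only records posted within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [r for r in records if cutoff <= r["posted"] <= now]


def write_tsv(records, out):
    """Write one line per question: text, date, answer count, favorite count.

    Missing counts (None or absent) become empty columns. The csv module
    quotes any question text that itself contains a tab or a newline.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for r in records:
        answers = r.get("answers")
        favorites = r.get("favorites")
        writer.writerow([
            r["question"],
            r["posted"].strftime("%Y-%m-%d"),  # date format is an assumption
            "" if answers is None else answers,
            "" if favorites is None else favorites,
        ])


if __name__ == "__main__":
    # Made-up sample records, only to show the shape of the output.
    now = datetime(2020, 1, 10, tzinfo=timezone.utc)
    sample = [
        {"question": "Example question A?",
         "posted": datetime(2020, 1, 9, tzinfo=timezone.utc),
         "answers": 3, "favorites": None},
        {"question": "Example question B?",
         "posted": datetime(2020, 1, 1, tzinfo=timezone.utc),
         "answers": 1, "favorites": 2},
    ]
    buf = io.StringIO()
    write_tsv(recent(sample, now), buf)
    print(buf.getvalue(), end="")
```

With the sample above, only the first record falls inside the 4-day window, and its fourth column is left empty because it has no favorites.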

We don't need you to run the script; we just need the script.