Scraping Agricultural Census of India


We are looking to scrape some data from the following website about pesticide use in Indian agriculture:

The website provides PDF reports from the Agricultural Census of India. Three waves of the census are available online: 96/97, 01/02 and 06/07.

For each wave of the census, you can query the database for a PDF file that reports the amount of pesticide used by farmers in each landholding size class (small, medium, large) for a given crop in a given district of a given state (Table 5H, Usage of Pesticide for Different Crops).

For example, I have attached the PDF file that the database generates for the 96/97 wave of the census for Surendranagar district in Gujarat state.

Each page of the PDF file is the table for one crop in that district and state. We would like to capture all the information in these tables and put it into a CSV file. I've attached the format in which the data is needed (9697-gujarat-bharuch.csv). This contains data for the first two pages of the attached PDF file.
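To illustrate the kind of flattening we have in mind, here is a minimal Python sketch that combines per-crop page tables into one CSV. It is only an illustration: the function name `pages_to_csv`, the identifying columns (`wave`, `state`, `district`, `crop`) and the dummy values are placeholders, and the attached CSV remains the reference for the actual layout. It also assumes the tables have already been extracted from the PDF pages.

```python
import csv
import io


def pages_to_csv(wave, state, district, pages):
    """Flatten per-crop page tables into one CSV string.

    `pages` is a list of (crop, header, rows) tuples, one per PDF page.
    Column names are hypothetical; the attached CSV defines the real layout.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for crop, header, rows in pages:
        if not header_written:
            # Write the header once, prefixed with the identifying columns.
            writer.writerow(["wave", "state", "district", "crop", *header])
            header_written = True
        for row in rows:
            # Tag every table row with the wave, state, district and crop.
            writer.writerow([wave, state, district, crop, *row])
    return out.getvalue()


# Dummy values, for illustration only (not census data).
example_pages = [
    ("crop_a", ["size_class", "value"], [["small", "1"]]),
    ("crop_b", ["size_class", "value"], [["large", "2"]]),
]
example_csv = pages_to_csv("9697", "gujarat", "bharuch", example_pages)
```

Carrying the crop name on every row means that pages from one district can be stacked in a single file without losing track of which table each row came from.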

We would like to do the same for all districts in all states for all three waves of the census. I have some Python code that was used to scrape a different part of this census, which produces PDF files in the same manner, and to process them into CSV files. This code can be easily adapted to scrape this data.
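As a rough picture of the full job, the sketch below lists one task per wave, state and district, and builds an output file name in the same pattern as the attached 9697-gujarat-bharuch.csv. This is not the existing code: `csv_name`, `enumerate_jobs` and the `districts_by_state` mapping are hypothetical names, and replacing spaces with underscores in multi-word names is an assumption.

```python
WAVES = ["96/97", "01/02", "06/07"]


def csv_name(wave, state, district):
    """Build a file name such as '9697-gujarat-bharuch.csv'."""
    parts = [wave.replace("/", ""), state, district]
    # Assumption: lowercase names, with spaces replaced by underscores.
    return "-".join(p.strip().lower().replace(" ", "_") for p in parts) + ".csv"


def enumerate_jobs(districts_by_state, waves=WAVES):
    """Yield (wave, state, district, csv_file) for every combination."""
    for wave in waves:
        for state, districts in districts_by_state.items():
            for district in districts:
                yield wave, state, district, csv_name(wave, state, district)


# Two districts in one state, across three waves, gives six jobs.
jobs = list(enumerate_jobs({"Gujarat": ["Bharuch", "Surendranagar"]}))
```

Listing the jobs up front makes it easy to skip files that already exist and to resume after a failed query.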

The applicant should have considerable experience with scraping data -- preferably using Python.

Fees and time are negotiable. Upon successful completion of this project, there will be an opportunity to complete similar scraping exercises.

Please let me know if you are interested and have any questions.


