Web data scraper or data miner for a complex product data extraction project
Description : We are looking for a web scraping or web data extraction developer. We will provide a set of websites, and we need a specific set of data extracted from those websites into a table format. Data of interest can include product information, UPC or other product codes, product manuals, warranty details, customer care numbers, etc. We might also need some refinement of the unstructured data you extract so that it can eventually be put into a structured tabular format. To start with, we will provide you with a data table template, and we can then discuss how practical it is to extract the data.
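For illustration only, the kind of extraction into a table described above could look like the minimal Python sketch below. It uses only the standard library's html.parser and csv modules. The column names, the assumption that each field sits in an element whose class attribute equals the column name, and the sample page are all hypothetical; the real template and page layouts will differ per website.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical template columns; the real template would be provided separately.
COLUMNS = ["product_name", "upc", "warranty", "support_phone"]


class ProductParser(HTMLParser):
    """Collects text from elements whose class attribute equals a column name."""

    def __init__(self):
        super().__init__()
        self.row = {}
        self._field = None  # column currently being captured, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in COLUMNS:
            self._field = css_class

    def handle_data(self, data):
        # Store the first non-blank text seen after a matching start tag.
        if self._field and data.strip():
            self.row[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # An empty element must not capture text that follows it.
        self._field = None


def extract_row(html):
    """Return one table row (dict) with every template column present."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return {col: parser.row.get(col, "") for col in COLUMNS}


def rows_to_csv(rows):
    """Serialize rows to CSV text with the template columns as header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up page fragment used only to exercise the sketch.
SAMPLE_PAGE = """
<div class="product">
  <h1 class="product_name">Example Blender 3000</h1>
  <span class="upc">036000291452</span>
  <p class="warranty">2 years limited</p>
</div>
"""

if __name__ == "__main__":
    print(rows_to_csv([extract_row(SAMPLE_PAGE)]))
```

Fields missing from a page (here the support phone number) are left blank rather than dropped, so every row matches the template.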
Deliverables : (must be able to run on Linux)
1) The web scraping script/software you develop to create the initial data set
2) Monitoring modules to continuously update the data sets
3) A scalable solution to expand the web scraping to other websites/brands
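As a rough idea of what a monitoring module could do, the Python sketch below compares two successive scraping runs and reports which products were added, removed, or changed. It is only one possible approach, assuming each product record is a dictionary keyed by some product code; the function names and sample records are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record; keys are sorted so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Compare two {product_key: fingerprint} maps from successive runs."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots from two runs, keyed by product code.
old_run = {"036000291452": fingerprint({"warranty": "1 year"})}
new_run = {
    "036000291452": fingerprint({"warranty": "2 years"}),
    "012345678905": fingerprint({"warranty": "90 days"}),
}

if __name__ == "__main__":
    print(diff_snapshots(old_run, new_run))
```

Storing only fingerprints between runs keeps the comparison cheap; the full records need to be rewritten only for the keys reported as added or changed.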
Specific Skillsets : Not all of the skillsets/toolsets below are mandatory, but we prefer that you deliver the scraping solution using them.
1) Web scraping scripts in Python, Perl, or Ruby (you can choose the scripting language)
2) Toolsets : BeautifulSoup, Webscraping, Scrape.py, or XPath; Nokogiri or Mechanize in the Ruby world
3) HTML parsers : lxml, HTMLParser, or html5lib
You can use the above in conjunction with existing toolsets like YQL or commercial toolsets like Connotate, Mozenda, Newprosoft, or Inspyder.
Launch Date : Immediate
Delivery Date : Dec 15th
Quality of work : The aggregated data should have low error rates, and the software solution should be scalable to other product brands.
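One example of an automated check that could help keep error rates low is validating the check digit of extracted UPC codes. The Python sketch below assumes 12-digit UPC-A codes stored as strings; other code formats (EAN-13, manufacturer part numbers) would need their own rules, and the function name is hypothetical.

```python
def is_valid_upc_a(code):
    """Return True if code is 12 ASCII digits with a correct UPC-A check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Digits in odd positions (1st, 3rd, ..., 11th) are weighted by 3,
    # digits in even positions (2nd, ..., 10th) by 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    # The 12th digit brings the weighted total up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[11]


if __name__ == "__main__":
    for candidate in ["036000291452", "036000291453", "12345"]:
        print(candidate, is_valid_upc_a(candidate))
```

Rows that fail such a check can be flagged for manual review instead of being written to the final table.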