Web Crawler Jobs

35 jobs were found based on your criteria

  • Hourly – Less than 1 month – 30+ hrs/week – Posted
    Web Crawler Implementation - Utilize the PySpider Crawler based on the Scrapy Web API - Maintain an index of crawl status - Integrate with Zookeeper to monitor the state of the crawl nodes - Integrate with a Kafka queue to dump crawled results. PySpider is an open-source project implemented in Python with a very active developer community; so far we have identified it as one of the best solutions out there. Developer tasks include: 1) Create a Vagrant box that can be started on demand and ...
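The "index of crawl status" piece of this posting can be sketched in a few lines. This is a hypothetical in-memory version for illustration only: the class name, status strings, and methods are assumptions, and in the distributed setup the posting describes, this state would live in Zookeeper (or similar) so every crawl node sees it.

```python
from collections import Counter

class CrawlStatusIndex:
    """Hypothetical sketch of a per-URL crawl-status index (assumed names)."""

    PENDING, RUNNING, DONE, FAILED = "pending", "running", "done", "failed"

    def __init__(self):
        self._status = {}  # url -> latest status string

    def mark(self, url, status):
        # Record the most recent status reported for this URL.
        self._status[url] = status

    def status_of(self, url):
        # URLs we have never seen are treated as pending.
        return self._status.get(url, self.PENDING)

    def summary(self):
        # Counts per status, e.g. {"done": 3, "failed": 1}.
        return dict(Counter(self._status.values()))
```

A monitoring dashboard or the Kafka consumer could poll `summary()` to track crawl progress.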
  • Hourly – Less than 1 week – 10-30 hrs/week – Posted
    Distributor Search Project on www.medicalexpo.com Objective: To identify new distributors for S·CAPE. Methodology A. Manufacturer research: Start on www.medicalexpo.com and identify manufacturers of equipment, devices, etc. that are related to S·CAPE's product offering. The relevant product categories on medicalexpo.com to research have been identified and defined in a product category xls sheet. Eliminate duplicates as well as veterinary and dental equipment manufacturers. Deliverable A: List of manufacturers by main category with website addresses B ...
  • Hourly – 3 to 6 months – Less than 10 hrs/week – Posted
    We are looking for someone to source products for our Amazon business by comparing "buy" prices from online retailers to "selling" prices on Amazon's website. The leads you find will be plugged into an Excel spreadsheet and passed along to us. Please review the attachment for an example of our Excel spreadsheet. If you feel you qualify, please include an Excel spreadsheet showing some of the work you have done. We are looking for people with specific experience sourcing ...
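The qualification step this posting describes (compare a "buy" price to the Amazon "sell" price, keep the good leads) can be sketched as a small filter. The field names and the 30% margin threshold here are assumptions for illustration, not figures from the posting.

```python
def profitable_leads(rows, min_margin=0.30):
    """Keep rows whose sell/buy margin clears an assumed threshold."""
    leads = []
    for row in rows:
        buy, sell = row["buy_price"], row["sell_price"]
        margin = (sell - buy) / buy  # fractional margin over cost
        if margin >= min_margin:
            leads.append({**row, "margin": round(margin, 2)})
    return leads
```

The surviving rows would then be written into the client's Excel sheet, e.g. with the `csv` module or `openpyxl`.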
  • Hourly – 1 to 3 months – Less than 10 hrs/week – Posted
    1. Design a web scraping tool to go to a website and obtain all of its store locations. 2. Once we have the store locations, this information will be used to go to a county's website to obtain the name of the ownership entity. 3. Once we have the ownership entity, we need to go to a secretary of state's website to get the address of the ownership entity.
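The three steps above form a lookup chain, which can be sketched as plain functions. The three site-specific fetchers are injected as parameters so each scraper (store locator, county records, secretary of state) can be built and tested independently; every name here is an assumption, not a real API.

```python
def ownership_pipeline(store_urls, fetch_locations, fetch_owner, fetch_owner_address):
    """Chain the three lookups: locations -> ownership entity -> entity address."""
    results = []
    for url in store_urls:
        for location in fetch_locations(url):    # step 1: store locations
            owner = fetch_owner(location)        # step 2: county ownership entity
            address = fetch_owner_address(owner) # step 3: secretary-of-state address
            results.append({"location": location, "owner": owner, "address": address})
    return results
```

In practice each fetcher would wrap an HTTP request and parser (e.g. `requests` + `BeautifulSoup`); stubs are enough to exercise the chaining logic.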
  • Hourly – Less than 1 week – Less than 10 hrs/week – Posted
    Hi, I am currently working on a background-check service website; however, it uses an external service. To speed up the website and cut the costs of retrieving the data, I would love to have a crawler of some kind which can fill my database with people in the USA + background info. Please give me advice on this and tell me how you would retrieve the data. Thanks, Johan
  • Hourly – Less than 1 week – Less than 10 hrs/week – Posted
    The purpose is to get personal contact information (leads) from recruiters, especially from companies currently hiring. A. Instructions ----------------- I. The company should fulfill the following: 1. The company does not use an online application-management tool. A hint could be that there is no "create a profile" link for the candidate. 2. The company should be German-speaking, i.e. have a location in Germany, Austria or Switzerland. II. Personal contact information shall include at least: 1. Source of contact information, e.g. stepstone.com ...
  • Hourly – Less than 1 week – Less than 10 hrs/week – Posted
    Looking for someone with strong skills in website scraping, and preferably also some experience using Elasticsearch, to create a cloud-based web scraping solution that regularly scrapes specific sites and puts the normalised data into a cloud-hosted Elasticsearch instance. Initially thinking of using Scrapy (Python) hosted on ScrapingHub, but would consider other cloud-hosted scraping alternatives. The initial requirement is for 10 sites, but there are plans for a much larger number.
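The normalisation step this posting asks for, between the scraper and Elasticsearch, can be sketched as a function that turns raw scraped items into actions for the bulk API. The index name and field mapping are assumptions; in a real deployment, `helpers.bulk()` from the official `elasticsearch-py` client would consume these action dicts.

```python
def to_bulk_actions(items, index="scraped-pages"):
    """Normalise scraped items into Elasticsearch bulk-API actions (assumed schema)."""
    actions = []
    for item in items:
        actions.append({
            "_index": index,
            "_id": item["url"],  # keying on URL de-duplicates re-crawls
            "_source": {
                "url": item["url"],
                "title": item.get("title", "").strip(),
                "body": item.get("body", "").strip(),
            },
        })
    return actions
```

Using the URL as the document `_id` means re-scraping a site updates documents in place rather than accumulating duplicates.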