Website Data Extraction Multiprocessing, Multithreading - Python

Cancelled

Job Description

The ideal candidate will be able to access a given website and extract thousands of links. Next, the candidate will write software that iterates through those links extremely efficiently, at a rate near 20 links per second. Finally, because of server limitations, the script should be able to use proxies, user agents, cookies, and/or IP address rotation so that the links can be parsed as efficiently as possible. The script must be written in Python.
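
As a rough illustration of the proxy and user-agent rotation mentioned above, here is a minimal sketch using only the standard library. The names Rotator and build_request, and the proxy and user-agent values, are hypothetical placeholders and not part of this posting; cookie handling and IP address changing are not covered.

```python
import itertools
import threading
import urllib.request

# Placeholder values (assumptions for illustration, not from the posting).
USER_AGENTS = ["ExampleBot/1.0", "ExampleBot/2.0"]
PROXIES = ["http://127.0.0.1:8080", "http://127.0.0.1:8081"]


class Rotator:
    """Hand out items from a list in round-robin order, safely across threads."""

    def __init__(self, items):
        self._cycle = itertools.cycle(items)
        self._lock = threading.Lock()

    def next(self):
        # The lock keeps two worker threads from advancing the cycle at once.
        with self._lock:
            return next(self._cycle)


def build_request(url, user_agents, proxies):
    """Return (opener, request, proxy) using the next proxy and User-Agent."""
    proxy = proxies.next()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": user_agents.next()}
    )
    return opener, request, proxy
```

A worker would then call opener.open(request, timeout=10) to fetch the page; that call is left out here because it needs a live proxy.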

Before applying, please answer the following:
1. General design of how the script will be written.
2. Estimated time needed to complete the script, and your hourly rate or an estimated fixed-price cost for the assignment.
3. General description of your plans for writing the script: which modules you will use, which environment you plan to write it in, and what difficulties or challenges you anticipate and how you will overcome them.
4. Past work experience or projects written in Python that you feel make you a desirable candidate for this position.

Here is a simple example of the serial process that needs to be converted to multiprocessing and made robust.

link_list = [link_1, ..., link_n-1, link_n]  # where n >= 10,000
for link in link_list:
    SomeFunction(link)
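
For comparison, one possible concurrent version of the loop above might look like the sketch below. It uses a thread pool from concurrent.futures, which tends to suit network-bound work; a process pool would be the alternative if SomeFunction were CPU-bound. The names some_function and process_links, the worker count, and the example URLs are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def some_function(link):
    # Hypothetical stand-in for SomeFunction; a real version would
    # fetch and parse the page behind `link`.
    return len(link)


def process_links(link_list, worker=some_function, max_workers=32):
    """Run `worker` over every link concurrently; return (results, errors)."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its link so failures can be reported.
        futures = {pool.submit(worker, link): link for link in link_list}
        for future in as_completed(futures):
            link = futures[future]
            try:
                results[link] = future.result()
            except Exception as exc:
                # One bad link should not stop the remaining links.
                errors[link] = exc
    return results, errors


if __name__ == "__main__":
    links = [f"http://example.com/page/{i}" for i in range(100)]
    results, errors = process_links(links)
    print(len(results), "succeeded;", len(errors), "failed")
```

Failed links are collected rather than raised, so they can be retried later. Results are keyed by link, so duplicate links in link_list would overwrite each other.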

*Additional instructions will be given upon contacting us to clarify the exact specifics of the job.

---
Skills: multithreading, design