Data Scraping / data extraction of news articles from RSS
The goal of this project is to create a script (PHP preferred, but any language will be considered) that runs continuously, polls an RSS feed, visits each link in the feed, and extracts the full article content, including:
Subheading (if it exists in the article)
Date and Time published
Image source (name and outlet, for example "James Smith/AP")
Author bio (if it exists)
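To make the expected flow concrete, here is a rough sketch of the feed-polling part in Python, using only the standard library. It assumes the feed is plain RSS 2.0 with `<item>` elements; the function names `parse_feed` and `new_items` are made up for illustration. Extracting the subheading, image source, and author bio from each article page is not shown, since that depends on the markup of the site behind the feed.

```python
import xml.etree.ElementTree as ET


def parse_feed(xml_text):
    """Return one dict per <item> found in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return items


def new_items(items, seen_links):
    """Keep only items whose link has not been processed yet.

    seen_links is a set that is updated in place, so a script that
    polls the feed in a loop visits each article only once.
    """
    fresh = []
    for it in items:
        link = it["link"]
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append(it)
    return fresh
```

A continuously running script would fetch the feed on a timer, pass the response text to `parse_feed`, and visit only the links returned by `new_items`.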
The purpose: I am mocking up a news site and need fresh news stories for it.
Here is the feed to analyze:
The data needs to be inserted in a MySQL database.
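As a starting point for discussion, the storage step might look something like the sketch below. The table name `articles` and every column name are assumptions made for illustration, not a required schema. The insert uses the `%s` placeholder style accepted by the common Python MySQL drivers, and `save_article` takes any DB-API cursor, so the driver choice is left open.

```python
# Hypothetical schema: table and column names are illustrative only.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(512) NOT NULL UNIQUE,
    headline VARCHAR(512),
    subheading VARCHAR(512),
    published_at DATETIME,
    image_source VARCHAR(255),
    author_bio TEXT,
    body MEDIUMTEXT
)
"""

COLUMNS = (
    "url", "headline", "subheading", "published_at",
    "image_source", "author_bio", "body",
)

# INSERT IGNORE skips rows whose url already exists (UNIQUE key above),
# so re-reading the same feed entry does not create duplicates.
INSERT_SQL = "INSERT IGNORE INTO articles ({}) VALUES ({})".format(
    ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS))
)


def save_article(cursor, article):
    """Insert one article dict; optional fields that are missing become NULL."""
    params = tuple(article.get(col) for col in COLUMNS)
    cursor.execute(INSERT_SQL, params)
    return params
```

Passing values as parameters rather than building the SQL string by hand keeps scraped text (quotes, apostrophes) from breaking the query.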