Est. Budget: $500.00
This project entails two major components:
1. The development of a server-side web page indexing service (using the DIFFBOT.COM API)
- The service will query the database for new entries (URLs) and then crawl them in batches using Diffbot's API.
- The fetched data should be indexed in the existing database, including:
  - Publication name
  - Article title
  - Article author
  - Article content text
  - Article image URLs (multiple)
  - Timestamp (date published)
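
For illustration only, here is a minimal Python sketch of how component 1 might be structured. It is not a specification. The endpoint URL and the response keys (`objects`, `siteName`, `title`, `author`, `text`, `images`, `date`) are assumptions about Diffbot's Article API and should be checked against Diffbot's current documentation. The function names, the record field names, and the `fetch`/`save` callbacks are hypothetical; reading new URLs from the existing database and writing records back are left to the caller.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Article API endpoint; verify against Diffbot's documentation.
DIFFBOT_ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def batched(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_request_url(token, page_url):
    """Build the request URL for one page (query parameters are assumed)."""
    return DIFFBOT_ARTICLE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def to_record(page_url, article):
    """Map one article object to the fields listed in the brief.

    The keys read from `article` are assumptions about the response shape.
    """
    images = article.get("images") or []
    return {
        "url": page_url,
        "publication_name": article.get("siteName"),
        "title": article.get("title"),
        "author": article.get("author"),
        "content_text": article.get("text"),
        "image_urls": [img["url"] for img in images if img.get("url")],
        "published_at": article.get("date"),
    }


def fetch_article(token, page_url, timeout=30):
    """Fetch one page through the API; return a record, or None if nothing was extracted."""
    with urlopen(build_request_url(token, page_url), timeout=timeout) as resp:
        payload = json.load(resp)
    objects = payload.get("objects") or []
    return to_record(page_url, objects[0]) if objects else None


def index_new_urls(new_urls, fetch, save, batch_size=10):
    """Process `new_urls` in batches; `fetch` and `save` are supplied by the caller."""
    saved = 0
    for batch in batched(new_urls, batch_size):
        for page_url in batch:
            record = fetch(page_url)
            if record is not None:
                save(record)
                saved += 1
    return saved
```

In use, `fetch` could be `lambda u: fetch_article(token, u)` and `save` a function that writes the record to the existing database. Passing both in as callbacks keeps the batching logic independent of the database layer and easy to test without network access.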
2. Indexing images (downloading onto our own ...