Develop a URL Snapshot Service and Image Repository using node.js

Closed - This job posting has been filled and work has been completed.

Job Description

The project is to build a URL Snapshot Service and Image Repository.

There are three major components:

1. URL Snapshot Service, to create snapshots of URLs (node.js)
2. Image Repo, to cache images created by the snapshot service (node.js)
3. Java client, to access the Snapshot Service (Java)

Part 1: The URL Snapshot Service

The service takes as input a URL, screen resolution, and scale factor. It will render the URL as viewed at the supplied screen resolution, and scale the image based on the scale factor.
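As a rough illustration of the scaling step, the output size could be computed as below. The function name `scaledSize` and the rounding rule are assumptions made for this sketch and are not part of the requirements.

```javascript
// Hypothetical helper: compute the output image size from the screen
// resolution and the scale factor. Rounding to whole pixels is an assumption.
function scaledSize(width, height, scale) {
  if (!(width > 0) || !(height > 0) || !(scale > 0)) {
    throw new RangeError('width, height and scale must be positive numbers');
  }
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

For example, a 1280x1024 render with a scale factor of 0.5 gives a 640x512 image.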

It will then name the image using a SHA-1 hash of the URL, and save the image to the specified repo directory. The return value will be a URL to the image as served by the Image Repo.
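A minimal sketch of the naming step, using the `crypto` module built into node.js rather than a separate SHA-1 library. The helper names `imageNameFor` and `imageUrlFor` and the `.jpg` extension are assumptions for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: derive the image file name from the page URL.
// The hex SHA-1 digest is always 40 characters long.
function imageNameFor(pageUrl, ext = 'jpg') {
  const hash = crypto.createHash('sha1').update(pageUrl, 'utf8').digest('hex');
  return `${hash}.${ext}`;
}

// Hypothetical helper: build the link returned to the caller, using the
// externally visible repo address as the base.
function imageUrlFor(pageUrl, extRepoAddress) {
  const base = extRepoAddress.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/${imageNameFor(pageUrl)}`;
}
```

Because only the URL is hashed, two resolutions of the same page would share a file name. Whether the resolution should also go into the hashed string is a design question this sketch leaves open.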

The service should maintain a timestamp of every snapshot request, per URL/resolution. When the service receives a request, it should first check the most recent timestamp. If the age of the latest request is less than the TTL, it should return the existing URL to the most recent image, rather than taking a new snapshot.
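The TTL check could look roughly like this. The in-memory `Map`, the key format, and the function names are assumptions. A real service might persist the timestamps instead.

```javascript
// Hypothetical in-memory record of the latest snapshot per URL/resolution.
const lastSnapshot = new Map();

function cacheKey(pageUrl, width, height) {
  return `${width}x${height}|${pageUrl}`;
}

// Returns the stored image URL if the latest snapshot is younger than ttlMs,
// otherwise null, meaning a new snapshot should be taken.
function freshImageUrl(pageUrl, width, height, ttlMs, now = Date.now()) {
  const entry = lastSnapshot.get(cacheKey(pageUrl, width, height));
  if (entry && now - entry.timestamp < ttlMs) return entry.imageUrl;
  return null;
}

// Record a completed snapshot so later requests can reuse it.
function recordSnapshot(pageUrl, width, height, imageUrl, now = Date.now()) {
  lastSnapshot.set(cacheKey(pageUrl, width, height), { timestamp: now, imageUrl });
}
```

Passing `now` as a parameter is only a convenience for testing; callers would normally omit it.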

If there is an error rendering the URL, or the URL is invalid, the service should return a link to a “render error” image.
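One possible shape for the error fallback is sketched below. The file name `render-error.jpg`, the restriction to http/https, and the `render` callback are assumptions made for this example.

```javascript
// Hypothetical file name of the "render error" image in the repo directory.
const RENDER_ERROR_IMAGE = 'render-error.jpg';

// Treat only absolute http/https URLs as renderable (an assumption).
function isRenderableUrl(value) {
  try {
    const parsed = new URL(value);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false; // the URL constructor throws on invalid input
  }
}

// `render` stands in for whatever async function takes the snapshot and
// resolves to the image URL.
async function snapOrErrorImage(pageUrl, extRepoAddress, render) {
  const errorUrl = `${extRepoAddress}/${RENDER_ERROR_IMAGE}`;
  if (!isRenderableUrl(pageUrl)) return errorUrl;
  try {
    return await render(pageUrl);
  } catch (err) {
    return errorUrl; // rendering failed: return the error image link
  }
}
```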

Simultaneous calls to render the same URL should be synchronized, such that the first call renders the URL and subsequent calls wait for that render rather than starting their own.
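One common way to get this behavior in node.js is to share a single in-flight promise per key. This is a sketch of that technique, not a required design. `inFlight`, `snapOnce`, and the `render` callback are hypothetical names.

```javascript
// Renders currently in progress, keyed by URL/resolution (hypothetical).
const inFlight = new Map();

// `render` is any function returning a promise for the image URL.
// Concurrent callers with the same key share one render.
function snapOnce(key, render) {
  const pending = inFlight.get(key);
  if (pending) return pending; // a render is already running: wait on it

  const promise = Promise.resolve()
    .then(render)
    .finally(() => inFlight.delete(key)); // allow a new render afterwards
  inFlight.set(key, promise);
  return promise;
}
```

The second and later callers receive the same promise as the first, so they all resolve (or reject) with the result of the single render.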

The service should log requests and failures to keen.io.

Part 2: The Image Repo

This is a basic HTTP server that serves the static images from the repo directory. It includes a daemon process to clean images that are older than the specified TTL.
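The cleanup pass might be sketched as follows. It uses file modification time as the image age and synchronous `fs` calls for brevity. Both are assumptions, as are the function names and the hourly default interval.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Delete files in repoDir whose modification time is older than ttlMs.
// Returns the names of the files that were removed.
function cleanRepo(repoDir, ttlMs, now = Date.now()) {
  const removed = [];
  for (const name of fs.readdirSync(repoDir)) {
    const file = path.join(repoDir, name);
    const stat = fs.statSync(file);
    if (stat.isFile() && now - stat.mtimeMs > ttlMs) {
      fs.unlinkSync(file);
      removed.push(name);
    }
  }
  return removed;
}

// Hypothetical scheduler: sweep periodically. unref() keeps the timer from
// holding the process open on its own.
function startCleaner(repoDir, ttlMs, intervalMs = 60 * 60 * 1000) {
  const timer = setInterval(() => cleanRepo(repoDir, ttlMs), intervalMs);
  timer.unref();
  return timer;
}
```

A sweep like this would also delete a "render error" image stored in the same directory once it is old enough, so that file would need to be excluded or kept elsewhere.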

It should log cache hits and misses, cleanups, waits (when multiple calls to the same URL are queued), and the size of the image cache to keen.io.

Both the snapshot service and image repo should run within the same node.js context.
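Running both components in one process can be as simple as creating two HTTP servers. The sketch below uses only the built-in `http` module. The request handlers are placeholders and `startServers` is a hypothetical name.

```javascript
const http = require('node:http');

// Start two HTTP listeners in the same node.js process.
// The handlers are placeholders for the real service and repo logic.
function startServers(snapshotPort, repoPort) {
  const snapshotServer = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('snapshot service placeholder\n');
  });
  const repoServer = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('image repo placeholder\n');
  });
  snapshotServer.listen(snapshotPort);
  repoServer.listen(repoPort);
  return { snapshotServer, repoServer };
}
```

Calling `startServers(7910, 7911)` would match the ports used in the startup example.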

Part 3: The Java Client

A lightweight Java client must be written to call the URL Snapshot Service.

Example client:

// create a new client
SnapshotClient client = new SnapshotClient("localhost:7910");
// take a snapshot of google.com as viewed at 1280x1024
// and reduce to an image half that size
String imgUrl = client.snap("http://google.com", 1280, 1024, 0.5);

The client should return something like this:
http://imgrepo.com:8080/234988566c9a0a9cf952cec82b143bf9c207ac16.jpg

Example service startup:

node service.js snapshotPort=7910 repoPort=7911 repoDir=/img extRepoAddress=http://imgrepo.com:8080 ttl=259200000 keenProjectId=someId keenWriteKey=writeKey keenReadKey=readKey

This should start the snapshot service listening on port 7910 and the image repo listening on port 7911. When the snapshot service constructs the return URL, it uses extRepoAddress as the base. This is because the repo will be accessed from the WAN, and the WAN address will be different from the local address.
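The key=value arguments from the startup example could be read along these lines. `parseArgs` and `loadConfig` are hypothetical names, and treating `ttl` as a number of milliseconds is an assumption (259200000 ms would be three days).

```javascript
// Parse key=value arguments such as those in the startup example.
function parseArgs(argv) {
  const options = {};
  for (const arg of argv) {
    const eq = arg.indexOf('=');
    if (eq <= 0) continue; // skip anything that is not key=value
    // Split on the first '=' only, so values may contain '=' themselves.
    options[arg.slice(0, eq)] = arg.slice(eq + 1);
  }
  return options;
}

// Convert the numeric settings; option names come from the startup example.
function loadConfig(argv = process.argv.slice(2)) {
  const raw = parseArgs(argv);
  return {
    ...raw,
    snapshotPort: Number(raw.snapshotPort),
    repoPort: Number(raw.repoPort),
    ttl: Number(raw.ttl),
  };
}
```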

Helpful Links:

// render image from URL
https://github.com/brenden/node-webshot
// fast SHA-1 algorithm
https://code.google.com/p/tiny-sha1
// listening on multiple ports
http://stackoverflow.com/questions/15098823/using-node-js-to-listen-on-2-different-ports
// creating a static file server with in-memory cache
https://github.com/cloudhead/node-static