OCR (Tesseract) expert needed to convert emailed attachments of many file types to text


Job Description

Emails will arrive with attachments, and each attachment must be converted to text. Attachment formats include PDF, DOC, DOCX, PNG, JPG, TIF, and GIF.
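A minimal Python sketch of the conversion step, routing each attachment by file extension. It assumes the command-line tools `tesseract`, `pdftotext` and `pdftoppm` (poppler-utils), and `antiword` are installed; apart from Tesseract, these tools are assumptions and are not named in the job description. Scanned PDFs carry no embedded text, so the sketch rasterizes them and runs OCR on each page. GIF input to Tesseract may depend on its Leptonica build including GIF support.

```python
import glob
import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET
import zipfile

# Image extensions sent straight to Tesseract (assumed list, matching the job post).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".gif"}
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run(argv):
    """Run a command and return its stdout as text; raises CalledProcessError on failure."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def docx_to_text(path):
    """Extract paragraph text from a .docx using only the standard library."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def ocr_image(path):
    """OCR one image with the Tesseract CLI; the output base 'stdout' prints the text."""
    return run(["tesseract", path, "stdout"])


def pdf_to_text(path):
    """Use the PDF's embedded text; fall back to OCR for image-only (scanned) PDFs."""
    text = run(["pdftotext", path, "-"])
    if text.strip():
        return text
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        run(["pdftoppm", "-r", "300", "-png", path, prefix])
        pages = sorted(glob.glob(prefix + "-*.png"))
        return "\n".join(ocr_image(p) for p in pages)


def attachment_to_text(path):
    """Dispatch on the file extension to the matching extractor."""
    ext = os.path.splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return ocr_image(path)
    if ext == ".pdf":
        return pdf_to_text(path)
    if ext == ".docx":
        return docx_to_text(path)
    if ext == ".doc":
        return run(["antiword", path])
    raise ValueError(f"unsupported attachment type: {ext}")
```

For example, `attachment_to_text("scan.tif")` returns the OCR text, which the caller can then write to a `.txt` file. Reading DOCX directly from its XML avoids an extra dependency, at the cost of ignoring headers, footers, and text boxes.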

You will upload each generated text file to Amazon S3 and then post the name of that file to a specified URL.
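A minimal sketch of the upload-and-notify step, assuming `boto3` is installed with AWS credentials configured in the environment. The job post does not say how the file name should be posted, so the form field name `filename` is a hypothetical choice, as are the bucket and URL values the caller passes in.

```python
import os
import urllib.parse
import urllib.request


def build_notification(callback_url, key):
    """Build a form-encoded POST request carrying the uploaded file's name."""
    # "filename" is a hypothetical field name; the receiving URL defines the real format.
    body = urllib.parse.urlencode({"filename": key}).encode("utf-8")
    # Passing data makes urllib send the request as a POST.
    return urllib.request.Request(callback_url, data=body)


def upload_and_notify(local_path, bucket, callback_url, key=None):
    """Upload local_path to s3://bucket/key, then POST the key to callback_url."""
    import boto3  # imported here so build_notification has no AWS dependency

    key = key or os.path.basename(local_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
    with urllib.request.urlopen(build_notification(callback_url, key), timeout=30) as resp:
        return resp.status
```

A call would look like `upload_and_notify("scan.txt", "example-bucket", "https://example.com/notify")`, where the bucket and URL are placeholders. Posting only after `upload_file` returns means the receiver is never told about a file that failed to upload.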

You will retrieve the emails from a Google Apps email account or from an account pointing at the server; most likely it will be a Google Apps account. You will write a script that continually checks for new emails and deletes each one after processing it. The script can run from cron every minute or as a constantly running process.
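A minimal sketch of one polling pass over IMAP, using only the standard library. It assumes IMAP access is enabled on the account; for a Google Apps account the host would be `imap.gmail.com`, and Google may require an app password or OAuth rather than the normal password. The `handle` callback is hypothetical and stands for whatever converts and uploads a single file. On Gmail, what an expunged message turns into (archived, trashed, or permanently deleted) depends on the account's IMAP settings.

```python
import email
import email.policy
import imaplib
import os


def extract_attachments(msg):
    """Return (filename, bytes) for every named attachment in a parsed message."""
    found = []
    for part in msg.iter_attachments():
        name = part.get_filename()
        if name:
            # basename() drops any path components in the sender-supplied name.
            found.append((os.path.basename(name), part.get_payload(decode=True)))
    return found


def process_inbox(host, user, password, out_dir, handle):
    """One polling pass over INBOX: save attachments, call handle(path), delete the email."""
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        # UNDELETED skips messages already flagged by an earlier pass that stopped before EXPUNGE.
        _, data = imap.search(None, "UNDELETED")
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1], policy=email.policy.default)
            for name, payload in extract_attachments(msg):
                path = os.path.join(out_dir, name)
                with open(path, "wb") as fh:
                    fh.write(payload)
                handle(path)
            # Flag for deletion only after every attachment was handled without an exception.
            imap.store(num, "+FLAGS", "\\Deleted")
        imap.expunge()
```

With cron, each run (for example on a `* * * * *` schedule) would call `process_inbox` once; a constantly running script could instead call it in a loop with `time.sleep(60)` between passes. Because messages are flagged only after their attachments are handled, a failure leaves the email in the inbox to be retried on the next pass.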

Skills: PDF, Amazon