OCR and text extraction from PDF


Job Description

I would like a utility that takes a PDF (see attachment; first two pages only) and generates a text file with the results for each food section. Foods should be listed in the same order as they appear in the PDF, and each food should be followed by its color: green, yellow, or blue.
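To give a rough idea of the color step and the output layout, here is a minimal sketch in Python. It assumes an RGB value has already been sampled for each food from the rendered page (how that sampling is done is not shown), and the reference hues, the function names `classify_color` and `format_section`, and the "food: color" line format are all illustrative assumptions, not a fixed spec. Grey or white samples carry no usable hue and would need separate handling.

```python
import colorsys

# Assumed reference hues in degrees; the shades used in the actual PDF may differ.
REFERENCE_HUES = {"yellow": 60.0, "green": 120.0, "blue": 240.0}


def classify_color(r, g, b):
    """Label an 8-bit RGB sample as the nearest of green, yellow, or blue by hue."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = h * 360.0

    def circular_distance(a, c):
        # Hue wraps around at 360 degrees, so measure the shorter way round.
        d = abs(a - c)
        return min(d, 360.0 - d)

    return min(REFERENCE_HUES, key=lambda name: circular_distance(hue, REFERENCE_HUES[name]))


def format_section(title, foods):
    """Build the text for one section; foods is a list of (name, (r, g, b)) in PDF order."""
    lines = [title]
    lines += [f"{name}: {classify_color(*rgb)}" for name, rgb in foods]
    return "\n".join(lines)
```

For example, `format_section("Dairy", [("Milk", (0, 200, 0))])` would produce a "Dairy" heading line followed by "Milk: green". The section title and food here are made up for illustration.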

Note that the foods are always the same, so OCR accuracy could be dramatically enhanced by using a vocabulary that is restricted to the foods listed.
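One way the restricted vocabulary could be applied is to snap each raw OCR string to the closest known food name. The sketch below uses Python's standard `difflib` module for the fuzzy match. The `FOODS` list, the function name `snap_to_vocabulary`, and the 0.6 cutoff are placeholders chosen for illustration; the real list would be the fixed set of foods from the PDF.

```python
import difflib

# Hypothetical vocabulary; the real one would be the fixed list of foods in the PDF.
FOODS = ["Almond", "Banana", "Broccoli", "Cheddar Cheese", "Salmon"]


def snap_to_vocabulary(ocr_text, vocabulary=FOODS, cutoff=0.6):
    """Return the vocabulary entry closest to ocr_text, or None if nothing is close enough."""
    # Compare case-insensitively, but return the canonical spelling.
    lowered = {food.lower(): food for food in vocabulary}
    matches = difflib.get_close_matches(
        ocr_text.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

With this approach a typical OCR slip such as "Brocco1i" (digit 1 in place of the letter l) would still map to "Broccoli", while text that resembles no known food returns `None` and can be flagged for manual review.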

My ultimate goal is to create a web service that allows PDFs to be uploaded and processed in this way, but the first step would be a script that runs on a Mac or a PC as a proof of concept.

I have no set budget for this yet, as I would first like to get some preliminary info on how hard it might be.

Skills: pdf
