We are looking for a developer experienced with Tesseract OCR to write a script that converts PDF files to text files using Tesseract. The PDFs are of good quality, and we need an accuracy rate of at least 95%. We have about 50,000 PDFs, so the script must support batch recognition. CPU processing power should not be an issue.
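To make the accuracy target concrete, here is a rough sketch of one way it could be checked. It assumes "accuracy" means character-level accuracy against a hand-corrected reference transcript of a sample page, which is only one possible reading and is open to discussion; the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, ocr_text: str) -> float:
    """Character accuracy of ocr_text relative to a trusted reference.

    Assumed definition: 1 - edit_distance / len(reference), floored at 0.
    """
    if not reference:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - edit_distance(reference, ocr_text) / len(reference))
```

Under this reading, a sample document would pass if `char_accuracy(reference, ocr_text) >= 0.95`.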
Please find attached sample PDFs.
Please contact us if you need any further explanation. In addition to your hourly rate, please provide an estimate of the hours needed to complete the task.
The script shall be written in Python or Java.
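As a rough illustration of the kind of batch workflow we have in mind, here is a minimal Python sketch. It is not a specification. It assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed and on the PATH, that 300 DPI and English (`eng`) are suitable defaults, and that one text file per PDF is the desired output; the function names are illustrative.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def ocr_pdf(pdf_path: Path, out_dir: Path, dpi: int = 300, lang: str = "eng") -> Path:
    """OCR one PDF and write the result to out_dir/<pdf name>.txt."""
    out_txt = out_dir / (pdf_path.stem + ".txt")
    texts = []
    with tempfile.TemporaryDirectory() as tmp:
        # Rasterize each page to a PNG named page-<n>.png (n is zero-padded).
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(Path(tmp) / "page")],
            check=True,
        )
        for page in sorted(Path(tmp).glob("page-*.png")):
            # With "stdout" as the output base, tesseract prints the text.
            result = subprocess.run(
                ["tesseract", str(page), "stdout", "-l", lang],
                check=True, capture_output=True, text=True, encoding="utf-8",
            )
            texts.append(result.stdout)
    # Written only after all pages succeed, so a .txt file marks a finished PDF.
    out_txt.write_text("\n".join(texts), encoding="utf-8")
    return out_txt


def pending_pdfs(in_dir: Path, out_dir: Path) -> list:
    """PDFs in in_dir that have no text file in out_dir yet (allows resuming)."""
    return [
        pdf for pdf in sorted(in_dir.glob("*.pdf"))
        if not (out_dir / (pdf.stem + ".txt")).exists()
    ]


def run_batch(in_dir: Path, out_dir: Path, workers=None, ocr=ocr_pdf) -> list:
    """OCR all pending PDFs in parallel and return the PDFs that failed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    # Threads are enough here: the heavy work runs in external processes.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        futures = {pool.submit(ocr, pdf, out_dir): pdf
                   for pdf in pending_pdfs(in_dir, out_dir)}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception:
                failed.append(futures[future])  # keep going; report at the end
    return failed
```

A call such as `run_batch(Path("pdfs"), Path("txt"))` would process a folder and return the list of PDFs that could not be converted. Skipping files that already have a text file lets an interrupted run over the full set be resumed.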
A sample of PDFs can be found at: