Est. Budget: $10,000.00
I'm looking for someone to modify and combine open-source software, such as Tesseract and OpenCV, into a Linux CLI application that performs layout detection and OCR on scans of old books. The application should:
Detect and remove noise
Deskew and auto-rotate pages
Detect the coordinates of text regions and return the OCR result for each (Tesseract can be used for OCR)
Detect coordinates of pictures
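
One possible shape for the per-page output, given as a sketch only. The posting does not fix an output format, so the JSON layout, the class names (`Region`, `PageResult`), the field names, and the sample values below are all assumptions made for illustration. The detection and OCR steps themselves are not shown.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Region:
    """One detected region on a page (hypothetical schema)."""
    kind: str                   # "text" or "picture"
    x: int                      # left edge, in pixels
    y: int                      # top edge, in pixels
    width: int
    height: int
    text: Optional[str] = None  # OCR result; None for pictures


@dataclass
class PageResult:
    """Everything reported for a single scanned page (hypothetical schema)."""
    source: str                 # input file name
    rotation_degrees: float     # correction applied by deskew/auto-rotation
    regions: List[Region]


def page_to_json(page: PageResult) -> str:
    """Serialize one page to a single JSON line, suitable for stdout."""
    return json.dumps(asdict(page), ensure_ascii=False)


# Made-up sample values, only to show the structure.
example = PageResult(
    source="page_0001.png",
    rotation_degrees=-1.5,
    regions=[
        Region("text", 120, 80, 900, 60, "CHAPTER I"),
        Region("picture", 150, 400, 800, 600),
    ],
)

print(page_to_json(example))
```

Writing one JSON line per page to stdout would let the tool be piped into other programs without intermediate files, which fits the minimal-IO requirement, but any equivalent format would do.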
The application must be fast and efficient with minimal IO. It can use as much RAM as ...