OCR Tesseract Expert Needed for Long Term Job
Closed - This job posting has been filled and work has been completed.
Below is Phase 1 of the project. The project will run for more than a year and has 5 phases.
Individual contractors only; companies will be REJECTED outright.
Candidates preferably in the GMT+2 time zone.
The system uses Tesseract to OCR scanned newspapers and magazines (printed
publications), so that when a user searches for a keyword (English or Arabic), the
system can locate it if it appears in the newspaper or magazine.
Newspapers and magazines are scanned, OCR-processed, and fed to the system. The goals
for this phase are:
1- The engine works, but it cannot always locate keywords that appear in
small articles. For instance, if the keyword appears in a section
headline, page title, or article title set in a larger font, the system
finds it. If it appears within the article body text, the system
sometimes misses it.
2- Right now, when the system finds the keyword in a publication, it
only gives you access to the page where it appears; it cannot
highlight the keyword on the page. We need it to find the
keyword and highlight every occurrence on the page.
3- We need to download PDF versions of the newspapers and
magazines automatically whenever they are available online (from
predefined URLs for those PDF versions) to avoid the manual effort
of scanning every page.
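For goal 2, one possible starting point is Tesseract's TSV output, which reports a bounding box (left, top, width, height) for each recognized word. The sketch below assumes the TSV text for a page has already been produced. The function name find_keyword_boxes is hypothetical, and the case-insensitive substring match is a simplification that would likely need refinement for Arabic text and multi-word phrases.

```python
import csv
import io


def find_keyword_boxes(tsv_text, keyword):
    """Return (left, top, width, height) for each OCR word containing keyword.

    tsv_text is the tab-separated output Tesseract writes in TSV mode,
    including its header row.
    """
    needle = keyword.casefold()
    boxes = []
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        # Rows for blocks, paragraphs, and lines carry no text; skip them.
        word = (row.get("text") or "").strip()
        if word and needle in word.casefold():
            boxes.append(
                tuple(int(row[k]) for k in ("left", "top", "width", "height"))
            )
    return boxes
```

The returned rectangles are in page-image pixel coordinates, so a viewer could draw a translucent overlay at each one to highlight the keyword.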
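For goal 3, a minimal sketch of fetching PDFs from a predefined list of URLs might look like the following. The helper names target_path and download_pdfs, the output directory, and the fallback file name are all assumptions made for illustration; scheduling, retries, and authentication are left out. The check for the "%PDF" prefix relies on the fact that PDF files begin with that signature.

```python
import os
import urllib.request
from urllib.parse import urlparse


def target_path(url, out_dir):
    """Derive a local .pdf file path from the last segment of the URL."""
    name = os.path.basename(urlparse(url).path) or "download.pdf"
    if not name.lower().endswith(".pdf"):
        name += ".pdf"
    return os.path.join(out_dir, name)


def download_pdfs(urls, out_dir, fetch=urllib.request.urlopen):
    """Download each URL into out_dir and return the paths newly saved.

    fetch is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = target_path(url, out_dir)
        if os.path.exists(path):
            continue  # already downloaded on an earlier run
        with fetch(url) as response:
            data = response.read()
        if not data.startswith(b"%PDF"):
            continue  # the server returned something other than a PDF
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved
```

Skipping files that already exist keeps repeated runs cheap, which matters if the job is run daily against the same list of publications.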
After retrieving the results in the search listing page, the system should be able
to classify the results by Country, Magazine Name, and Section, and also by the
frequency of mentions (the number of times the keyword is mentioned in
the same newspaper or magazine).
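The frequency of mentions could be computed from the OCR text roughly as sketched below. The record layout (dicts with "publication" and "text" keys) and the function name mention_counts are assumptions for illustration, not the existing system's data model.

```python
import re
from collections import Counter


def mention_counts(pages, keyword):
    """Count keyword occurrences per publication across OCR'd pages.

    pages is an iterable of dicts with 'publication' and 'text' keys.
    Matching is case-insensitive and counts every non-overlapping hit.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    counts = Counter()
    for page in pages:
        hits = len(pattern.findall(page["text"]))
        if hits:
            counts[page["publication"]] += hits
    return counts
```

Publications with no hits are left out of the result, so the counter can be used directly to build the search listing.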
By default, the search results page will display the list of all results, and
dynamic filters on the left side will filter the results by Country, Magazine/
Newspaper Name, and Frequency.
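The filter sidebar could be backed by logic along these lines. This is only a sketch: the result fields ("country", "publication", "section", "frequency") and both function names are hypothetical, and treating the frequency filter as a minimum threshold is one possible interpretation.

```python
from collections import Counter


def facet_counts(results, field):
    """Number of results per distinct value of field, for the sidebar."""
    return Counter(r[field] for r in results)


def apply_filters(results, country=None, publication=None, min_frequency=0):
    """Return the results matching every filter the user has selected.

    A filter left as None (or 0 for min_frequency) is not applied.
    """
    filtered = []
    for r in results:
        if country is not None and r["country"] != country:
            continue
        if publication is not None and r["publication"] != publication:
            continue
        if r["frequency"] < min_frequency:
            continue
        filtered.append(r)
    return filtered
```

With no filters selected, apply_filters returns the full list, which matches the default view of all results.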