Need is to have existing documents Naturally Language Processed (NLP’ed, if you will), analyzed and converted into a semantic format. The format needs to be created in such a manner that it can be placed into a sort of query-able database from which queries can be launched in order to pull in publicly available information (Twitter feeds, Blog APIs, etc.)
Documents tend to be PDFs, some are text based, others image based, but can be OCR’ed into pure text if needed.
Experience with semantic analysis and related machine learning is paramount. Experience pulling in publicly available information is desired/a plus.
Applicants please describe what requirements you would need to execute, how you would approach the problem, what programming language (and why), what the output and what additional tools/resources/libraries you would use.
Any estimates on how long the semantic analysis code would take are welcome (i.e. how many hours to program.) Please describe prior, similar jobs, and what the hurdles might be.