PDF Data Extraction, Anki SRS, Excel, Java, Visual Basic, HTML

Closed - This job posting has been filled and work has been completed.

Job Description

This project involves two parts. Preference will be given to someone who can complete both parts, for continuity's sake.
I am also interested in other small tasks specified in the "Other Skills" section.

Specify whether you would like to work on:
-Part 1
-Part 1+2
-Part 1+2 and Other
-Other

Part 1
I have spent some time trying to extract captions from PDF files using regular expressions and the extraction tools in Acrobat X, but I do not have a firm enough grasp of PDF structure to extract the text consistently. I have several medical textbooks from which I would like to extract the images and their corresponding captions, then create an Excel or CSV file containing the captions and image file names for my personal study. We would start with a single 900-page textbook and work forward from there.
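
To make the caption-matching step concrete, here is a minimal Python sketch that works on text already extracted from each page. It is only an illustration under several assumptions that would need checking against the actual textbooks: captions are assumed to start a line with "Figure 12-3." or "Fig. 4.1:", each caption is assumed to fit on one line, and the image file naming scheme (`p0012_fig12-3.png`) is hypothetical. Extracting the images themselves from the PDF is not covered here.

```python
import csv
import io
import re

# Assumed caption shape: "Figure 12-3. Caption text" or "Fig. 4.1: Caption text".
CAPTION_RE = re.compile(
    r"^(?:Figure|Fig\.)\s+(\d+(?:[-.]\d+)*)[.:]?\s+(.+)$",
    re.MULTILINE,
)


def find_captions(page_text):
    """Return (figure_number, caption) pairs found in one page of text."""
    return [(m.group(1), m.group(2).strip()) for m in CAPTION_RE.finditer(page_text)]


def captions_to_csv(pages, out):
    """Write one CSV row per caption found in (page_number, text) pairs."""
    writer = csv.writer(out)
    writer.writerow(["page", "figure", "image_file", "caption"])
    for page_no, text in pages:
        for fig, caption in find_captions(text):
            # Hypothetical naming scheme for the extracted image files.
            image_file = f"p{page_no:04d}_fig{fig}.png"
            writer.writerow([page_no, fig, image_file, caption])


if __name__ == "__main__":
    sample_pages = [
        (12, "Some text\nFigure 12-3. Gram stain of cocci.\nMore text"),
        (13, "Fig. 4.1: Chest X-ray"),
    ]
    buffer = io.StringIO()
    captions_to_csv(sample_pages, buffer)
    print(buffer.getvalue())
```

Captions that wrap across several lines would be cut off at the first line break by this pattern, so a real extraction would likely need layout information from the PDF rather than plain text alone.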

Part 2
Search and export data/image pairs by phrase (Excel VB macro, Java, etc.)
I would like to be able to type a phrase, such as 'Penicillin', and generate a CSV file with the following structure:
{Penicillin, img1file, img1caption, img2file, img2caption, etc.}
These CSV files will be generated from the initial database and imported into a flashcard program (Anki).
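
As a rough illustration of the search-and-export step, the Python sketch below builds one CSV row in the structure described above. It assumes the Part 1 output is available as a list of (image file, caption) pairs and that a match means a case-insensitive substring match on the caption. The function names `build_row` and `export_phrase` are hypothetical.

```python
import csv
import io


def build_row(phrase, records):
    """Return [phrase, file1, caption1, file2, caption2, ...] for matching captions.

    `records` is an iterable of (image_file, caption) pairs.
    """
    needle = phrase.lower()
    row = [phrase]
    for image_file, caption in records:
        # Case-insensitive substring match (an assumption about what "search" means).
        if needle in caption.lower():
            row.extend([image_file, caption])
    return row


def export_phrase(phrase, records, out):
    """Write the row for `phrase` to the file-like object `out` as CSV."""
    csv.writer(out).writerow(build_row(phrase, records))


if __name__ == "__main__":
    sample_records = [
        ("img1.png", "Penicillin binding protein"),
        ("img2.png", "Aortic valve"),
        ("img3.png", "Rash after penicillin"),
    ]
    buffer = io.StringIO()
    export_phrase("Penicillin", sample_records, buffer)
    print(buffer.getvalue())
```

Using the `csv` module rather than joining strings by hand means captions containing commas or quotation marks are quoted correctly in the output.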

--------------------------------------
Other small projects I am interested in working on would involve skills such as:
Anki 2.0 SRS template creation
HTML
Website/HTML data extraction
JavaScript
SDK development
SQLite

---
Skills: pdf, import