The basic goal is to create a script that helps me run some Russian text through a variety of document classifiers, all of which require a matrix of one form or another. A 24-48 hour turnaround time would be greatly appreciated. It's an easy job.

The steps that need to be automated are:

Input: 8,000 documents in two file formats.

Step 0: Dealing with File Formats: There are two file formats in the documents. One starts in the wrong encoding and needs to be Cyrillic...
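
To make the request concrete, here is a rough Python sketch of what Step 0 and the matrix-building part might look like. It is only an illustration under assumptions, not a spec: the posting does not say which encodings are involved, so the candidate list (cp1251, koi8-r, cp866) is a guess, and the function names `decode_cyrillic` and `build_term_matrix` are made up for this example. The decoder accepts valid UTF-8 as-is and otherwise picks the legacy encoding that yields the most lowercase Russian letters, which is a heuristic rather than a guarantee. The matrix is a plain term-count matrix; the classifiers may need a different form.

```python
import re
from collections import Counter

# Assumption: the posting does not name the encodings, so these are guesses.
LEGACY_ENCODINGS = ("cp1251", "koi8-r", "cp866")

# Lowercase Russian letters; ordinary text is mostly lowercase, so a wrong
# legacy decoding tends to produce few of these.
LOWER_RU = re.compile(r"[а-яё]")
WORD_RU = re.compile(r"[а-яё]+")


def decode_cyrillic(raw: bytes) -> str:
    """Return the raw bytes decoded as readable Cyrillic text (heuristic)."""
    try:
        # Legacy single-byte Cyrillic text is very rarely valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    best_text, best_score = "", -1
    for enc in LEGACY_ENCODINGS:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # e.g. byte 0x98 is undefined in cp1251
        score = len(LOWER_RU.findall(text))
        if score > best_score:  # ties keep the earlier candidate
            best_text, best_score = text, score
    return best_text


def build_term_matrix(docs):
    """Build a document-term count matrix from a list of strings.

    Returns (vocabulary, rows), where rows[i][j] is the number of times
    vocabulary[j] occurs in docs[i].
    """
    counts = [Counter(WORD_RU.findall(doc.lower())) for doc in docs]
    vocabulary = sorted(set().union(*counts)) if counts else []
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows
```

In use, each file would be read in binary mode, passed through `decode_cyrillic`, and the resulting strings handed to `build_term_matrix`. Both the file handling and the exact matrix format would depend on details not given above.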