Machine Learning / Statistical Analysis needed.

Closed - This job posting has been filled and work has been completed.
Fixed Price
Deliver by - October 21, 2012


I have 12 dependent variables (columns B-M) and 86 independent variables (columns N-CU), covering about 12 years.

I want to see the degree to which any combination of independent variables predicts any combination of dependent variables. The relationship could be a direct (same-day) correlation, or there could be a 1-2 day lead or lag. Which combinations of the independent variables have some degree of predictive power over the dependent variables?
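A minimal sketch of the lead/lag check, in plain Python. The two series are made up for illustration, and the names `pearson` and `lagged_correlation` are hypothetical, not part of any tool mentioned in this posting. It only compares one independent variable with one dependent variable at a time; combinations of variables would need a real model.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_correlation(indep, dep, lag):
    """Correlate indep[t] with dep[t + lag].

    lag > 0: the independent variable leads (its value today is paired
    with the dependent variable `lag` days later).
    lag < 0: the independent variable lags.
    """
    if lag > 0:
        return pearson(indep[:-lag], dep[lag:])
    if lag < 0:
        return pearson(indep[-lag:], dep[:lag])
    return pearson(indep, dep)

# Made-up data: dep simply repeats indep one day later.
indep = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
dep = [0.0] + indep[:-1]

# Check same-day correlation plus a 1-2 day lead or lag.
by_lag = {lag: lagged_correlation(indep, dep, lag) for lag in range(-2, 3)}
best_lag = max(by_lag, key=lambda k: abs(by_lag[k]))
print(best_lag, by_lag[best_lag])
```

Running this over every (independent, dependent) column pair would give a table of the strongest lag per pair, which is a reasonable screen before any learning algorithm is applied.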

I have already done simple correlation in Excel. The next step is some form of learning algorithm (either developed yourself or using a tool such as Weka). Before that, the first step is to eliminate the essentially redundant (i.e., highly correlated) independent variables.
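One possible way to remove redundant independent variables is a greedy filter: keep a column only if it is not highly correlated with any column already kept. The sketch below is plain Python on made-up data. The 0.95 cutoff, the function name `drop_redundant`, and the sample values are assumptions for illustration, and the result depends on the order in which columns are visited.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def drop_redundant(columns, threshold=0.95):
    """Return the names of columns to keep.

    A column is kept only if its absolute correlation with every
    previously kept column is below `threshold` (an assumed cutoff).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up sample: column "O" is almost exactly 2 * column "N".
columns = {
    "N": [1.0, 2.0, 3.0, 4.0, 5.0],
    "O": [2.1, 4.0, 6.2, 7.9, 10.0],
    "P": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_redundant(columns)
print(kept)
```

Lowering the threshold removes more variables. Which column of a correlated pair survives is arbitrary here, so it may be worth ordering the columns so that the ones easiest to interpret come first.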

Overall, I need to understand the process well enough to re-create it myself, as this is just an exploration of what might be meaningful.

Attached are twenty files. Files 1-10 and 11-20 are identical, except that the second set fills gaps in the data with the value from the previous day, so it has no spaces/gaps. Within each set (with or without gaps), the ten files are different cuts of the data that matter to me (e.g., in Files 1 and 11 I include only meeting minutes, whereas in Files 2 and 12 I add speeches).
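The gap filling used for the second set of files (carry the previous day's value forward) can be sketched as follows. This is plain Python on a made-up column, with `None` standing in for an empty cell. The function name `forward_fill` is hypothetical, and how the actual files were filled is not shown here.

```python
def forward_fill(values):
    """Replace each None with the most recent earlier non-missing value.

    A gap at the very start has no previous day to copy from, so it
    stays None.
    """
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Made-up column with gaps (None = empty cell).
raw = [None, 1.5, None, None, 2.0, None]
filled = forward_fill(raw)
print(filled)
```

Filling this way repeats old values on days with no new observation, which can inflate same-day correlations, so it may be worth comparing results from the filled and unfilled files.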

I have attached all the files. They are also available here: