Machine Learning / Statistical Analysis needed.
Closed - This job posting has been filled and work has been completed.
I have 12 dependent variables (columns B-M) and 86 independent variables (columns N-CU) covering about 12 years.
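To select these two blocks of columns in code rather than by hand, a small helper that converts Excel column letters to positions can be useful. This is a minimal sketch; the function names `column_index`, `column_count`, and `column_slice` are hypothetical helpers, not part of any library.

```python
def column_index(letters: str) -> int:
    """Convert an Excel column label (e.g. 'A', 'M', 'CU') to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def column_count(first: str, last: str) -> int:
    """Number of columns in an inclusive Excel range such as B-M."""
    return column_index(last) - column_index(first) + 1


def column_slice(first: str, last: str) -> slice:
    """Zero-based, end-exclusive slice for an inclusive Excel range (for list or iloc use)."""
    return slice(column_index(first) - 1, column_index(last))


print(column_count("B", "M"))   # 12 dependent variables
print(column_count("N", "CU"))  # 86 independent variables
```

The slices come out as `slice(1, 13)` for B-M and `slice(13, 99)` for N-CU, which match the 12 and 86 counts above.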
I want to see the degree to which any combination of independent variables predicts any combination of dependent variables. There could be a direct (same-day) correlation, or there could be a 1-2 day lead or lag. Which combination of the independent variables has some degree of predictive power over the dependent variables?
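One simple way to probe for a 1-2 day lead or lag is to correlate each independent variable with each dependent variable after shifting one of them by a few days. The sketch below does this for a single pair with plain Pearson correlation. It assumes daily series of equal length with no gaps (as in files 11-20) and no constant series; `lagged_correlations` and `max_lag` are hypothetical names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(indep, dep, max_lag=2):
    """Correlate indep[t] with dep[t + lag] for lag in -max_lag..max_lag.

    A positive lag means the independent variable leads the dependent one;
    a negative lag means it trails it.
    """
    n = len(indep)
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = indep[: n - lag], dep[lag:]
        else:
            x, y = indep[-lag:], dep[: n + lag]
        results[lag] = pearson(x, y)
    return results


# Toy data: the dependent series is the independent one delayed by one day.
indep = [1, 3, 2, 5, 4, 6, 8, 7]
dep = [0] + indep[:-1]
corrs = lagged_correlations(indep, dep)
best = max(corrs, key=corrs.get)
print(best)  # 1
```

Running this over all 12 × 86 pairs gives a lag profile per pair. It only captures one-to-one linear relationships, so combinations of variables still need a multivariate model afterwards.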
I have done simple correlations in Excel. The next step is some form of learning algorithm (either one you develop yourself or a tool such as Weka), and the first part of that is eliminating the essentially redundant (i.e., highly correlated) independent variables.
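A common first pass at removing redundant independent variables is a greedy correlation filter: walk through the variables in order and keep each one only if it is not highly correlated with any variable already kept. This is a minimal sketch of that technique, not a prescribed method. The 0.9 threshold and the names `drop_redundant` and `columns` are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_redundant(columns, threshold=0.9):
    """Keep each column unless |correlation| with an already-kept column >= threshold.

    columns: dict mapping variable name -> list of values (equal lengths, no gaps).
    Returns the kept names in input order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # nearly a rescaled copy of "a"
    "c": [5, 1, 4, 2, 3],
}
print(drop_redundant(columns))  # ['a', 'c']
```

Because the filter is greedy, the result depends on column order. Putting the variables you consider most meaningful first makes them the ones that survive.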
Overall, I need to understand the process well enough to re-create it myself, as this is just an exploration of what might be meaningful.
Attached are twenty files. Files 1-10 and 11-20 are identical, except that the second set fills gaps in the data with the previous day's data, so it has no gaps. Within each set, the ten files are different cuts of the data that matter to me (e.g., Files 1 & 11 include only meeting minutes, while Files 2 & 12 add speeches).
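Filling each gap with the previous day's value is usually called forward filling. The sketch below shows the idea on one column, assuming missing days are represented as `None`; `forward_fill` is a hypothetical name. In pandas, `DataFrame.ffill()` does the same thing for a whole sheet.

```python
def forward_fill(values):
    """Replace each missing value (None) with the most recent observed value.

    Leading gaps, before any observation exists, are left as None.
    """
    filled = []
    last = None
    for v in values:
        if v is None:
            filled.append(last)
        else:
            filled.append(v)
            last = v
    return filled


print(forward_fill([None, 1.5, None, None, 2.0, None]))
# [None, 1.5, 1.5, 1.5, 2.0, 2.0]
```

Filled-in days repeat the previous value, which can make a 1-day lag look somewhat stronger than it is. Comparing results on the gapped and filled sets is one way to check for that.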
I have attached all the files. They are also available here: https://www.dropbox.com/sh/8r330vs