Machine Learning / Statistical Analysis needed.
Closed - This job posting has been filled and work has been completed.
I have 16 dependent variables (columns B-Q) with 86 independent variables (columns R-) over about 12 years.
I want to see the degree to which any combination of independent variables predicts any combination of dependent variables. There could be a direct correlation. There could be a 1-2 day lead or lag. What combination of the independent variables has some degree of predictive power over the dependent variables?
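As a rough illustration of the lead/lag check described above, here is a minimal sketch in plain Python. It assumes each variable is a list of daily values aligned row by row with no gaps; the function names `pearson` and `lagged_correlations` and the default `max_lag=2` are illustrative choices, not part of the supplied files.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(iv, dv, max_lag=2):
    """Correlate iv[t] with dv[t + lag] for every lag in -max_lag..max_lag.

    A positive lag means the independent variable leads the dependent
    variable by that many rows (assumed here to be days); a negative lag
    means it trails.
    """
    n = len(iv)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = iv[:n - lag], dv[lag:]
        else:
            x, y = iv[-lag:], dv[:n + lag]
        result[lag] = pearson(x, y)
    return result


# Toy data (made up): dv repeats iv one row later, so iv leads by one day.
iv = [1, 3, 2, 5, 4, 6, 8, 7]
dv = [0] + iv[:-1]
by_lag = lagged_correlations(iv, dv)
best_lag = max(by_lag, key=lambda k: abs(by_lag[k] or 0))
print(best_lag, by_lag[best_lag])
```

With 16 dependent variables, 86 independent variables, and several lags, thousands of pairs get tested, so some will look strong purely by chance; any pair flagged this way would need to be checked on data that was held back.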
I have done simple correlation in Excel. The next step is some form of learning algorithm (either developed yourself or using a tool such as Weka). Before that, the first step is eliminating the essentially redundant independent variables that I have identified. (In the Dropbox folder, you will see the file 'res-dependent.txt', which lists all of the correlated files by group. The files in the 'results' folder show the columns with constant IVs, which may also be eliminated.)
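The elimination step could be sketched as follows. This recomputes redundancy from the data rather than reading 'res-dependent.txt', whose format is not described here. The `prune_columns` name, the dict-of-lists layout, and the 0.95 threshold are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def prune_columns(columns, threshold=0.95):
    """Drop constant columns and columns nearly collinear with an earlier one.

    columns: dict mapping column name -> list of numeric values.
    threshold: absolute correlation at or above which a column counts as
               redundant (0.95 is an arbitrary illustrative cut-off).
    Returns (kept_names, dropped) where dropped maps name -> reason.
    """
    kept = []
    dropped = {}
    for name, values in columns.items():
        if len(set(values)) <= 1:
            dropped[name] = "constant"
            continue
        for other in kept:
            r = pearson(values, columns[other])
            if r is not None and abs(r) >= threshold:
                dropped[name] = f"redundant with {other}"
                break
        else:
            # No earlier kept column was too similar, so keep this one.
            kept.append(name)
    return kept, dropped


# Toy data (made up): "b" is a multiple of "a", and "c" never changes.
cols = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [3, 3, 3, 3, 3],
    "d": [5, 1, 4, 2, 3],
}
kept, dropped = prune_columns(cols)
print(kept)
print(dropped)
```

Because the first column in each correlated group is the one kept, column order decides which representative survives; a different rule (for example, keeping the column most correlated with the dependent variables) may suit the data better.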
Data source: I have taken qualitative data (news, of a sort) and used a tool that takes different measurements of it (the IVs).
I want to compare this to financial data. The news may lead the markets or the news may lag the markets.
Critically, I am NOT looking for a trading strategy today (and maybe never). This is more of an exploratory exercise at this point. I would love predictive value or causality, but really I am just looking for correlation. Are they even related? Said another way, is my tool even valuable? Do the measurements that I have produce anything of value if you were to use them to interpret what is going on? The financial markets are just a terrific, unambiguous comparison variable set.
Overall, I need to be able to understand the process well enough to re-create it myself, as this is just an exploration of what might be meaningful.
Attached are ten files, zipped.
I have attached all the files. They are also available here: https://www.dropbox.com/sh/8r330vs