In this project, we use existing fast nearest-neighbor (NN) search methods such as FLANN, for which software is publicly available.
The task is to empirically evaluate and summarize how to set the parameters of FLANN to get the best tradeoff between accuracy and time for datasets with different characteristics.
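One way to measure the accuracy/time tradeoff is sketched below, assuming accuracy means the fraction of true nearest neighbors that the approximate search returns. The helper names (`exact_nn`, `recall`, `evaluate`, `make_subsample_search`) are hypothetical, and the subsampling search is only a stand-in so the sketch runs without FLANN installed; in the project, `search_fn` would wrap a FLANN query.

```python
import math
import random
import time


def exact_nn(data, queries, k=1):
    """Brute-force k nearest neighbors under Euclidean distance (ground truth)."""
    result = []
    for q in queries:
        order = sorted(range(len(data)), key=lambda i: math.dist(q, data[i]))
        result.append(order[:k])
    return result


def recall(approx, exact):
    """Fraction of true neighbors that the approximate search returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total


def evaluate(search_fn, queries, exact):
    """Return (recall, seconds) for one search configuration."""
    start = time.perf_counter()
    approx = search_fn(queries)
    elapsed = time.perf_counter() - start
    return recall(approx, exact), elapsed


def make_subsample_search(data, fraction, k=1, seed=0):
    """Hypothetical stand-in for an approximate index: scan a random subset."""
    rng = random.Random(seed)
    keep = [i for i in range(len(data)) if rng.random() < fraction] or [0]

    def search(queries):
        out = []
        for q in queries:
            order = sorted(keep, key=lambda i: math.dist(q, data[i]))
            out.append(order[:k])
        return out

    return search


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    queries = [[rng.random() for _ in range(8)] for _ in range(20)]
    truth = exact_nn(data, queries, k=1)
    for fraction in (0.1, 0.5, 1.0):
        r, t = evaluate(make_subsample_search(data, fraction), queries, truth)
        print(f"fraction={fraction:.1f} recall={r:.2f} time={t:.4f}s")
```

Computing the ground truth once per dataset and reusing it across all configurations keeps the comparison fair and avoids repeating the expensive brute-force pass.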
A dataset can be sparse or dense, high- or low-dimensional, with a limited or a large value range, etc. In each of these situations, which parameters (type of tree, tree parameters, etc.) should be used to reach a given level of accuracy, such as 99% or 90%?
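As a starting point, the dataset characteristics and the parameter choices above could be organized as in the following sketch. The function names `summarize` and `configurations` are hypothetical. The parameter names (`trees`, `branching`, `iterations`, `checks`) follow FLANN's documented index and search parameters, but the candidate values are assumptions chosen for illustration only.

```python
from itertools import product


def summarize(rows):
    """Return basic characteristics of a dataset given as equal-length rows."""
    values = [v for row in rows for v in row]
    zeros = sum(1 for v in values if v == 0)
    return {
        "n_points": len(rows),
        "dimension": len(rows[0]),
        "sparsity": zeros / len(values),  # fraction of zero entries
        "min": min(values),
        "max": max(values),
    }


# Illustrative candidate values (assumptions, not recommendations).
GRID = {
    "kdtree": {"trees": [1, 4, 16], "checks": [32, 128, 512]},
    "kmeans": {
        "branching": [16, 32],
        "iterations": [5, 10],
        "checks": [32, 128, 512],
    },
}


def configurations(grid):
    """Yield one flat parameter dict per combination in the grid."""
    for algorithm, options in grid.items():
        names = list(options)
        for combo in product(*(options[name] for name in names)):
            yield {"algorithm": algorithm, **dict(zip(names, combo))}


if __name__ == "__main__":
    print(summarize([[0, 2.0, 0], [1.0, 0, -3.0]]))
    for config in configurations(GRID):
        print(config)
```

Recording the summary next to the accuracy and time of each configuration makes it easier to see afterwards which characteristics favor which tree type.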
The data can be downloaded from http://mldata.org/.
You can use the following datasets: MNIST (original); SensIT Vehicle (combined); poker; Jester 1; ASAP_toy.
You can use the search function on this website to find them.
You can download either the Matlab version or the HDF5 version of the data. The HDF5 format can be used directly in FLANN; the Matlab version is easy to convert to other formats.