The goal of this project is to develop Markov models for analyzing log file data, specifically clickstream data from visits to web sites.
The resulting models need to:
1) Develop canonical visitor profiles -- group site visitors into a small set of canonical profiles (e.g. the top 10 customer profiles) based on their on-site activity, referrer source, and other data
2) Predict, given only a subset of a visitor's data, which profile that visitor will belong to
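As a rough illustration of requirement 2, the sketch below fits one first-order Markov chain per profile and assigns a partial click path to the profile whose chain gives it the highest log-likelihood. Everything here is an assumption made for illustration: the page states, the profile names "buyer" and "reader", the toy training sessions, and the add-alpha smoothing are hypothetical, and profile priors are ignored. It also assumes sessions have already been labeled with a profile, which is the output of requirement 1.

```python
import math
from collections import defaultdict


def fit_transitions(sessions, states, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = {s: defaultdict(float) for s in states}
    for session in sessions:
        # Count each consecutive (from_page, to_page) pair in the session.
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1.0
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + alpha * len(states)
        probs[a] = {b: (counts[a][b] + alpha) / total for b in states}
    return probs


def log_likelihood(session, probs):
    """Log-probability of the observed transitions under one profile's chain."""
    return sum(math.log(probs[a][b]) for a, b in zip(session, session[1:]))


def predict_profile(partial_session, models):
    """Pick the profile whose chain best explains the clicks seen so far."""
    return max(models, key=lambda name: log_likelihood(partial_session, models[name]))


# Hypothetical page states and labeled training sessions (illustration only).
STATES = ["home", "product", "cart", "checkout", "blog"]
training = {
    "buyer": [["home", "product", "cart", "checkout"], ["product", "cart", "checkout"]],
    "reader": [["home", "blog", "blog"], ["blog", "home", "blog"]],
}
models = {name: fit_transitions(sess, STATES) for name, sess in training.items()}

# Only the first three clicks of a visit are known at prediction time.
prediction = predict_profile(["home", "product", "cart"], models)
print(prediction)
```

The smoothing keeps unseen transitions from producing a zero probability, so a short or unusual click path can still be scored against every profile. A page that is not in the state list would raise a KeyError in this sketch, so a real pipeline would need an explicit "other" state or similar.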
Source data will include web site log files and may include other data, such as data from Google Analytics and other systems.
The resulting model needs to run on a Linux system, take new log files (and other data) as input, and automatically categorize visitors (identified by IP address, cookie, or other identifier) into a customer profile group.
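The input step could look something like the sketch below, which reads raw log lines and builds one ordered click path per visitor. The brief does not specify a log format, so this assumes Apache/NGINX-style Common or Combined Log Format, uses the client IP as the visitor identifier, and assumes lines arrive in chronological order. The function name and sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format: Common Log Format, optionally followed by the Combined
# Log Format referrer and user-agent fields.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def click_paths_by_visitor(lines):
    """Group requested paths by client IP, preserving the order of the input."""
    paths = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # Skip lines that do not look like access-log entries.
        paths[match.group("ip")].append(match.group("path"))
    return dict(paths)


# Made-up sample lines using documentation-reserved IP addresses.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326 "https://example.com/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /product HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2023:13:57:10 +0000] "GET /blog HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    'not a log line',
]
visitor_paths = click_paths_by_visitor(sample)
print(visitor_paths)
```

Keying on IP alone merges visitors behind a shared address and splits one visitor across networks, so a cookie or login identifier would be a better key where the logs contain one.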