I am an Ultimate Frisbee player who wants to run some statistical tests on previous years' data. In order to do so, I need it scraped from the web (scores.usaultimate.org/) and aggregated in a spreadsheet.
In particular, I am interested in a record of every college game from the spring 2010, 2011, and 2012 seasons.***
Each game should correspond to a row in the spreadsheet. The game should have a unique integer ID number, the winning team ID, the losing team ID, the winning team score, the losing team score, the date that the game occurred, and the type of tournament in which it occurred. For example, if Minnesota had team ID 123, Wisconsin had team ID 456, and they played in an unsanctioned game today (October 25th) with Minnesota winning 15-9, the entry for that game should read something like:
Game ID   Winning Team   Losing Team   Winning Score   Losing Score   Date     Type
0001      123            456           15              9              121025   0
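To make the row layout concrete, here is a minimal sketch in Python of how one game could be written out as a CSV row. The function names (game_row, rows_to_csv) and the choice of CSV as the spreadsheet format are only assumptions for illustration; any format that opens in a spreadsheet program would do. Because IDs are integers, the game ID is written as 1 rather than 0001.

```python
import csv
import io

# Column order taken from the example row above.
COLUMNS = ["Game ID", "Winning Team", "Losing Team",
           "Winning Score", "Losing Score", "Date", "Type"]


def game_row(game_id, winner_id, loser_id, winner_score, loser_score,
             yymmdd, event_type):
    """Return one spreadsheet row (hypothetical helper, not a fixed API)."""
    return [game_id, winner_id, loser_id, winner_score, loser_score,
            yymmdd, event_type]


def rows_to_csv(rows):
    """Serialize a header line plus the given rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


# The Minnesota (123) vs. Wisconsin (456) example: 15-9, unsanctioned.
example = game_row(1, 123, 456, 15, 9, 121025, 0)
csv_text = rows_to_csv([example])
```

Scores are passed through unchanged, so a value such as W or F can be stored in the same column as a numeric score.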
A few notes on the format. All ID numbers should be integers. It should go without saying that no two teams should share an ID, nor should any two games. A separate table should list team names (for example, "Williams-B") next to their corresponding ID numbers. Teams that played in only one or two of the past three years should still be included. The date should be a six-digit integer in the form yymmdd, so that January 5, 2012 would read 120105. The "type" category should be 0 (unsanctioned), 1 (sanctioned, but not series), or 2 (sanctioned and USAU series). This information can be recovered from the links "all events", "usa ultimate sanctioned" and "usa ultimate series" at the top of the webpages listed below.*** Each score should be an integer from 0 to 17, but may also take the value W (win), F (forfeit) or L (loss). Any game whose scores are not among these values (for example, if the score is left blank) may be discarded.
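The format rules above could be checked with a few small helpers along these lines. This is only a sketch: every name here (encode_date, is_valid_score, classify_event, TeamRegistry) is hypothetical, and it assumes that scores arrive as text, that letter scores may appear in either case, and that an event shown under "usa ultimate series" also counts as sanctioned.

```python
from datetime import date

# Allowed score values: the integers 0..17 plus W, F and L.
VALID_SCORES = {str(n) for n in range(18)} | {"W", "F", "L"}


def encode_date(d):
    """Encode a date as the six-digit integer yymmdd."""
    return int(d.strftime("%y%m%d"))


def is_valid_score(raw):
    """True if the scraped score text is one of the allowed values."""
    return raw.strip().upper() in VALID_SCORES


def classify_event(name, sanctioned, series):
    """Return 2 for series events, 1 for other sanctioned events, else 0.

    `sanctioned` and `series` are sets of event names, assumed to be
    collected from the corresponding filter links on the site.
    """
    if name in series:
        return 2
    if name in sanctioned:
        return 1
    return 0


class TeamRegistry:
    """Hand out a unique integer ID per team name, starting from 1."""

    def __init__(self):
        self._ids = {}

    def id_for(self, name):
        if name not in self._ids:
            self._ids[name] = len(self._ids) + 1
        return self._ids[name]

    def table(self):
        """Return (team name, ID) pairs ordered by ID, for the team table."""
        return sorted(self._ids.items(), key=lambda pair: pair[1])
```

A game would be kept only if is_valid_score holds for both scores, and the same TeamRegistry would be shared across all three seasons so that a team keeps one ID throughout.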
I would also like to receive the script you write to scrape this data, along with instructions for its use, so that I may use it to scrape data from the 2013 season.
If you are interested in doing this job, please send a copy of the spreadsheet that would result if only games from the Open Williams Turf Tournament, held January 21, 2012, were included.
I want each game from the beginning of the year (January 1) through the College D-1 Championships at the end of May. To be specific:
For Open 2012 tournaments (http://scores.usaultimate.org/scor
For Women's 2012 tournaments (http://scores.usaultimate.org/scor
For Open 2011 tournaments (http://scores.usaultimate.org/scor
For Women's 2011 tournaments (http://scores.usaultimate.org/scor
For Open 2010 tournaments (http://scores.usaultimate.org/scor
For Women's 2010 tournaments (http://scores.usaultimate.org/scor