Predicting Covering NCAA Football Spreads


EECS 349 Machine Learning Final Project
Professor Doug Downey
Northwestern University

Matt Hinger
matthewhinger2007@u.northwestern.edu



Motivation:
The goal of this project is to develop a method for accurately predicting whether a college football game will cover the Las Vegas spread based on a variety of statistical metrics. Given the amount of money involved with college football in the US, it could be very beneficial to know in advance whether a game will cover the spread. The main motivation for this project, however, is simply that I find college football fascinating. I wanted to determine whether certain aspects of the game, be it a team's ranking, average points scored, or average offensive/defensive statistics, have more influence on the outcome against the spread than is evident, or whether covering the spread is in fact essentially random. Either way, this project would provide the insight I was after.

Solution:
The large amount of data and statistics that I collected would be useless without the ability to extract meaning from it using machine learning algorithms. For this project, I decided to focus my attention on the following learning algorithms: J48 Decision Tree, Random Forest, Bayes Net, and Rotation Forest.
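The model option strings in the results table below match Weka's command-line flags, so the following is a minimal Java sketch of how such a set of classifiers could be configured. It is an illustration under the assumption that Weka was used (RotationForest in particular ships as a separate Weka package), not the exact code behind these results.

    import weka.classifiers.Classifier;
    import weka.classifiers.bayes.BayesNet;
    import weka.classifiers.meta.RotationForest;
    import weka.classifiers.trees.J48;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Utils;

    public class ModelSetup {
        // Builds one instance of each classifier family used in this project,
        // configured with option strings like those in the results table.
        public static Classifier[] buildModels() throws Exception {
            // J48 decision tree: -C is the pruning confidence, -M the minimum
            // number of instances per leaf.
            J48 j48 = new J48();
            j48.setOptions(Utils.splitOptions("-C 0.25 -M 2"));

            // Random forest: -I number of trees, -K features considered per
            // split, -S random seed.
            RandomForest rf = new RandomForest();
            rf.setOptions(Utils.splitOptions("-I 25 -K 10 -S 1"));

            // No options are listed for these two in the results table, so this
            // sketch leaves them at their defaults.
            BayesNet bayesNet = new BayesNet();
            RotationForest rotationForest = new RotationForest();

            return new Classifier[] { j48, rf, bayesNet, rotationForest };
        }
    }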

I focused my attention on the following features:

For the statistics, I chose to use running averages over the entire season to date, so as not to bias the results if a team happened to play easier opponents. For example, for game 1 of the season there is no data for either team other than the site of the game, so the prediction is essentially a guess; for game 4 of the season, the statistics are the averages over all games played up to that date. A sketch of this running-average scheme is shown below.
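To illustrate the running-average idea, here is a small, self-contained Java sketch. The class and method names are hypothetical, and it tracks only a single per-game statistic, whereas the real feature set is broader.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class RunningAverages {
        // For each team, the values of one per-game statistic (e.g. points
        // scored), stored in the order the games were played.
        private final Map<String, List<Double>> history = new HashMap<>();

        // Average over all games the team has played so far this season.
        // Returns NaN for the season opener, when no prior data exists.
        public double seasonAverage(String team) {
            List<Double> games = history.get(team);
            if (games == null || games.isEmpty()) {
                return Double.NaN; // game 1: nothing to go on
            }
            double sum = 0.0;
            for (double value : games) {
                sum += value;
            }
            return sum / games.size();
        }

        // Record a completed game so it counts toward later weeks' averages.
        public void recordGame(String team, double value) {
            history.computeIfAbsent(team, t -> new ArrayList<>()).add(value);
        }
    }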

Testing/Training:
For training, I used a data set comprised of all NCAA football games between two FBS opponents from the 2009 season, which came out to 679 games. For testing, I used the corresponding games from the 2010 season, which came out to 682 games. I felt this was an appropriate number of examples for both training and testing. I defined the target class to be whether or not the spread was covered: a success is a game in which the spread was covered, and a failure is a game in which it was not.
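A minimal sketch of this train/test setup using Weka's Java API, assuming ARFF exports of the 2009 and 2010 game data; the file names and the position of the class attribute are placeholders.

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainTest {
        public static void main(String[] args) throws Exception {
            // 2009 FBS-vs-FBS games for training, 2010 games for testing.
            // The file names are placeholders for the actual data exports.
            Instances train = DataSource.read("ncaa_2009.arff");
            Instances test = DataSource.read("ncaa_2010.arff");

            // The class attribute is whether the spread was covered; here it
            // is assumed to be the last column.
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            J48 model = new J48();
            model.setOptions(Utils.splitOptions("-C 0.15 -M 3"));
            model.buildClassifier(train);

            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(model, test);
            System.out.printf("Percent correct on 2010 games: %.4f%%%n", eval.pctCorrect());
        }
    }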

Key Results:
This project produced encouraging results. I evaluated a set of 8 models across 4 separate attempts (the phases in the table below), each using a different set of features and subset of the data. The first attempt used every week and all features except rankings. Attempt two used the same features but excluded week 1 games. Attempt three used the same data set as attempt two but added the computer rankings. Attempt four used a subset of the data from attempt three, consisting only of weeks 6-14. Attempts two and three performed the best on the test data set; both had models classifying over 50% of games correctly. The best model overall was a J48 decision tree from attempt three, which classified 51.4774% of games correctly. Anything over 50% is, in my opinion, excellent, because the spread is set to be a value that yields a 50/50 outcome of the game. All of the best-performing models, regardless of attempt, were J48 decision trees, while the Bayes Net and the Rotation Forest achieved performance on par with ZeroR at best. After inspecting the decision trees of the top-performing models, the top deciding features were week, followed by visitor fumbles (both offense and defense).

Percent Correct for Various Models on Test Data
Phase 1 - All Base Features, All Weeks:
  ZeroR                            46.4809%
  J48 -C 0.25 -M 2                 49.7067%
  J48 -C 0.15 -M 3                 48.0938%
  J48 -C 0.2 -M 2                  49.2669%
  Random Forest -I 10 -K 0 -S 1    45.3079%
  Random Forest -I 25 -K 10 -S 1   49.4135%
  Bayes Net                        46.4809%
  Rotation Forest                  46.4809%

Phase 2 - All Base Features, No Week 1 Games:
  ZeroR                            46.1897%
  J48 -C 0.25 -M 2                 49.7667%
  J48 -C 0.15 -M 3                 51.0109%
  J48 -C 0.2 -M 2                  49.7667%
  Random Forest -I 10 -K 0 -S 1    46.9673%
  Random Forest -I 25 -K 10 -S 1   49.4557%
  Bayes Net                        46.1897%
  Rotation Forest                  45.5677%

Phase 3 - Phase 2 plus Computer Rankings:
  ZeroR                            46.1897%
  J48 -C 0.25 -M 2                 51.1664%
  J48 -C 0.15 -M 3                 51.4774%
  J48 -C 0.2 -M 2                  51.1664%
  Random Forest -I 10 -K 0 -S 1    48.056%
  Random Forest -I 25 -K 10 -S 1   47.2784%

Phase 4 - Phase 3 but only weeks 6-14:
  ZeroR                            46.7562%
  J48 -C 0.25 -M 2                 48.0984%
  J48 -C 0.15 -M 3                 48.5459%
  J48 -C 0.2 -M 2                  48.5459%
  Random Forest -I 10 -K 0 -S 1    44.0716%
  Random Forest -I 25 -K 10 -S 1   47.4273%

The table above shows how each model performed across the different attempts. The overall best result was the J48 -C 0.15 -M 3 model in Phase 3; within each attempt, the best result came from a J48 model, while the Bayes Net and Rotation Forest results were equal to or worse than the ZeroR baseline, and a few Random Forest runs also fell below ZeroR.

Future Work:
Since I achieved promising results with multiple models across the various attempts, the logical next step is to continue experimenting with the feature set, for example by including aspects of the game such as injuries, star players, weather, recruiting, the coach's record against the opposing team, and the number of junior/senior starters. I am also going to run my best-performing models on the 2010 bowl games and see how they perform.

Report
Matt Hinger Final Report

Appendix
Detailed Results



COPYRIGHT (C) 2010 MATT HINGER. ALL RIGHTS RESERVED.