In the interest of full disclosure, and for my fellow uber-geeks, here is the actual model I'll be using to estimate outcome probabilities for each NFL game.
It is a logit regression model based on the outcomes of all regular season games from 2002 through 2006. I looked at each game twice, once from the point of view of the home team and once from the point of view of the visiting team. In each case I labeled the two teams Team A and Team B. To identify the home team, I used a dummy variable, AHome, which is 1 if Team A was at home and 0 if it was away. The dependent variable is AWon, which is 1 if Team A won and 0 if Team B won. There were 2560 cases considered (two per game).
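The two-cases-per-game setup described above can be sketched roughly as follows. The record layout and field names (`home`, `away`, `home_pts`, `away_pts`) and the team names are made up for illustration; only AHome and AWon come from the description.

```python
# Hypothetical game records; the field names are assumptions for illustration.
games = [
    {"home": "Team X", "away": "Team Y", "home_pts": 24, "away_pts": 17},
    {"home": "Team Y", "away": "Team Z", "home_pts": 10, "away_pts": 13},
]

def make_case(team_a, team_b, a_pts, b_pts, a_home):
    """Build one regression case from Team A's point of view."""
    # Note: a tie gives AWon = 0 for both rows in this sketch; how ties
    # were actually handled is not stated.
    return {
        "team_a": team_a,
        "team_b": team_b,
        "AHome": a_home,
        "AWon": int(a_pts > b_pts),
    }

def expand_games(games):
    """Return two cases per game: home team as Team A, then visitor as Team A."""
    cases = []
    for g in games:
        cases.append(make_case(g["home"], g["away"], g["home_pts"], g["away_pts"], 1))
        cases.append(make_case(g["away"], g["home"], g["away_pts"], g["home_pts"], 0))
    return cases

cases = expand_games(games)
```

Each game contributes one case with AHome = 1 and one with AHome = 0, which is why the number of cases is twice the number of games.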
|VARIABLE ||COEFFICIENT ||STD ERROR ||T-STAT ||SLOPE at mean|
|const ||-0.26 ||1.36 ||-0.19 || |
|AHOME ||0.74 ||0.09 ||8.29 ||0.19|
|AOPASS ||0.45 ||0.07 ||6.56 ||0.11|
|AORUN ||0.27 ||0.10 ||2.65 ||0.07|
|ADPASS ||-0.54 ||0.09 ||-5.90 ||-0.13|
|ADRUN ||-0.21 ||0.11 ||-1.87 ||-0.05|
|AOINTRATE ||-15.90 ||6.26 ||-2.54 ||-3.98|
|ADINTRATE ||17.68 ||5.16 ||3.43 ||4.42|
|AOFUMRATE ||-20.50 ||7.79 ||-2.63 ||-5.12|
|APENRATE ||-1.49 ||0.72 ||-2.07 ||-0.37|
|BOPASS ||-0.45 ||0.07 ||-6.54 ||-0.11|
|BORUN ||-0.27 ||0.10 ||-2.64 ||-0.07|
|BDPASS ||0.53 ||0.09 ||5.83 ||0.13|
|BDRUN ||0.20 ||0.11 ||1.79 ||0.05|
|BOINTRATE ||15.71 ||6.26 ||2.51 ||3.93|
|BDINTRATE ||-18.95 ||5.16 ||-3.67 ||-4.74|
|BOFUMRATE ||21.01 ||7.79 ||2.70 ||5.25|
|BPENRATE ||1.47 ||0.72 ||2.04 ||0.37|
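For anyone who wants to turn the table into an actual probability, here is a minimal sketch. The coefficients are copied from the table; the function names and the sample team stats are hypothetical and not taken from any real team. The `marginal_effect` helper assumes the "SLOPE at mean" column is the usual logit marginal effect, coefficient × p × (1 − p), evaluated at p = 0.5, which appears consistent with the table (each slope is about one quarter of its coefficient).

```python
import math

# Coefficients copied from the table above.
COEF = {
    "const": -0.26, "AHOME": 0.74,
    "AOPASS": 0.45, "AORUN": 0.27, "ADPASS": -0.54, "ADRUN": -0.21,
    "AOINTRATE": -15.90, "ADINTRATE": 17.68, "AOFUMRATE": -20.50, "APENRATE": -1.49,
    "BOPASS": -0.45, "BORUN": -0.27, "BDPASS": 0.53, "BDRUN": 0.20,
    "BOINTRATE": 15.71, "BDINTRATE": -18.95, "BOFUMRATE": 21.01, "BPENRATE": 1.47,
}

def win_probability(team_a, team_b, a_home):
    """Probability that Team A wins, from the logistic function of the linear score."""
    z = COEF["const"] + COEF["AHOME"] * a_home
    for name, value in team_a.items():
        z += COEF["A" + name] * value   # Team A's terms
    for name, value in team_b.items():
        z += COEF["B" + name] * value   # Team B's terms
    return 1.0 / (1.0 + math.exp(-z))

def marginal_effect(coef, p=0.5):
    """Slope of the logistic curve at probability p (assumed meaning of SLOPE at mean)."""
    return coef * p * (1.0 - p)

# Hypothetical stats, made up for illustration only.
typical = {
    "OPASS": 6.0, "ORUN": 4.0, "DPASS": 6.0, "DRUN": 4.0,
    "OINTRATE": 0.03, "DINTRATE": 0.03, "OFUMRATE": 0.02, "PENRATE": 0.4,
}

p_home = win_probability(typical, typical, a_home=1)
p_away = win_probability(typical, typical, a_home=0)
```

Because the A and B coefficients are not exact mirror images of each other, `p_home` and `p_away` for the same matchup need not sum to exactly 1.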
Retrodictively, the model predicts 69.5% of the games correctly. But keep in mind there are many evenly matched games and unpredictable upsets, so it may be impossible for even a perfect model to get past 75% or so.
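As a rough sketch of how a retrodictive accuracy figure like this can be computed: count a case as correct when the predicted probability falls on the same side of 0.5 as the actual result. The function name and the four sample cases are made up for illustration, and the 0.5 cutoff is my assumption about what "predicts correctly" means here.

```python
def retrodictive_accuracy(probs, outcomes):
    """Fraction of cases where the favored side (p > 0.5) matches the result."""
    correct = sum(1 for p, won in zip(probs, outcomes) if (p > 0.5) == bool(won))
    return correct / len(probs)

# Made-up predicted probabilities and actual AWon values.
probs = [0.70, 0.40, 0.60, 0.20]
outcomes = [1, 0, 0, 0]

accuracy = retrodictive_accuracy(probs, outcomes)  # 3 of 4 correct
```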
I realize the numbers in the table above are meaningless to most people, but I want to ensure everything I do is out in the open. Each variable carries an A or B prefix for the team it describes and is defined as follows:
OPASS = (offensive pass yds - sack yds) / pass plays
ORUN = offensive run yds / run plays
DPASS = (defensive pass yds - sack yds) / pass plays
DRUN = defensive run yds / run plays
OINTRATE = offensive interceptions / pass attempts
DINTRATE = defensive interceptions / pass attempts
OFUMRATE = fumbles / offensive plays
PENRATE = team penalty yds / total plays
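The definitions above translate directly into code. This is only a sketch: the raw-stat field names and the season totals are hypothetical, and the definitions do not spell out details such as whether sacks count as pass plays, so the function simply takes each denominator as given.

```python
def team_rates(raw):
    """Compute the eight efficiency variables from raw season totals."""
    return {
        "OPASS": (raw["off_pass_yds"] - raw["off_sack_yds"]) / raw["off_pass_plays"],
        "ORUN": raw["off_run_yds"] / raw["off_run_plays"],
        "DPASS": (raw["def_pass_yds"] - raw["def_sack_yds"]) / raw["def_pass_plays"],
        "DRUN": raw["def_run_yds"] / raw["def_run_plays"],
        "OINTRATE": raw["off_ints"] / raw["off_pass_att"],
        "DINTRATE": raw["def_ints"] / raw["def_pass_att"],
        "OFUMRATE": raw["fumbles"] / raw["off_plays"],
        "PENRATE": raw["penalty_yds"] / raw["total_plays"],
    }

# Made-up season totals, for illustration only.
raw = {
    "off_pass_yds": 3500, "off_sack_yds": 200, "off_pass_plays": 550,
    "off_run_yds": 1800, "off_run_plays": 450,
    "def_pass_yds": 3300, "def_sack_yds": 220, "def_pass_plays": 560,
    "def_run_yds": 1760, "def_run_plays": 440,
    "off_ints": 15, "off_pass_att": 500,
    "def_ints": 18, "def_pass_att": 520,
    "fumbles": 20, "off_plays": 1000,
    "penalty_yds": 800, "total_plays": 2000,
}

rates = team_rates(raw)
```

Prefixing each key with A or B gives the variable names used in the coefficient table.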
The t-stat indicates the significance of each variable. For this sample size, a t-stat of approximately 1.8 or greater (or -1.8 or less) indicates a significance level of p=0.05 or better in a one-tailed test; a two-tailed test requires a magnitude of about 1.96.
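For the curious, the t-stat is just the coefficient divided by its standard error, and it can be converted to an approximate p-value. The sketch below uses a normal approximation to the t distribution, which is my assumption and should be reasonable with a couple of thousand cases; the function names are made up. Recomputing from the rounded table values will not exactly reproduce the T-STAT column, which was presumably calculated from unrounded numbers.

```python
import math

def t_stat(coef, std_err):
    """t-statistic: the coefficient measured in units of its standard error."""
    return coef / std_err

def p_value(t, two_sided=True):
    """Approximate p-value from a t-stat, using the standard normal distribution."""
    one_tail = 0.5 * math.erfc(abs(t) / math.sqrt(2.0))
    return 2.0 * one_tail if two_sided else one_tail

# Example: ADRUN, using the rounded coefficient and standard error from the table.
adrun_t = t_stat(-0.21, 0.11)
adrun_p_one_sided = p_value(adrun_t, two_sided=False)
adrun_p_two_sided = p_value(adrun_t, two_sided=True)
```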
Below is a graph of the model's predicted game probabilities, with the cases divided randomly into two sets: training cases for the regression and test cases for validation. The graph plots the actual outcome rates against the model's predicted probabilities.
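A comparison like this can be sketched in two steps: split the cases at random, then group predictions into probability bins and compare the average prediction in each bin with the actual win rate. The function names, the 50/50 split, the seed, and the ten bins are all assumptions for illustration; the proportions actually used are not stated here.

```python
import random

def split_cases(cases, test_fraction=0.5, seed=0):
    """Randomly divide cases into (training, test) lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def calibration_bins(probs, outcomes, n_bins=10):
    """Return (bin start, mean predicted prob, actual win rate, count) per non-empty bin."""
    totals = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of probs, wins, count]
    for p, won in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)       # p = 1.0 falls in the last bin
        totals[i][0] += p
        totals[i][1] += won
        totals[i][2] += 1
    return [
        (i / n_bins, p_sum / n, wins / n, n)
        for i, (p_sum, wins, n) in enumerate(totals)
        if n > 0
    ]

# Made-up predictions and outcomes, for illustration only.
train, test = split_cases(list(range(10)))
bins = calibration_bins([0.15, 0.15, 0.85, 0.85], [0, 0, 1, 1])
```

For a well-calibrated model, the mean predicted probability and the actual win rate in each bin should be close, on the test cases as well as the training cases.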