If a human brain does anything, it’s trying to make sense of the world around it. It sees the shadows on the cave wall, and it makes a guess about what’s going on outside the cave. The human brain yearns for coherent order in the world and certainty about the future. One way we create order in the world is to literally put things in order. We make lists: to-do lists, Christmas wish lists, F/A-18 landing checklists, etc. And to gain certainty about the future, we make predictions: weather, stocks, sports, etc.
I once wrote to our own Carson Cistulli in an email that “People love lists and they love predictions. What I give them is lists of predictions.” Frankly, it wouldn’t matter how accurate the predictions are. People just crave them. The future creates anxiety, and predictions, accurate or not, help relieve it. At least, that’s what my astrologer tells me.
Anyway, over the past couple of years there has been a growing number of misleading comments about the accuracy of the game probability model. I’ll set the record straight at the end of this post, but bear with me; there are a few thoughts I’d like to share along the way.
Criticism should be expected anytime you take an unconventional approach, but there’s more to it in this case. I should first say that many critics of the model are correct, and they almost always take a constructive tone. I’ve learned a surprising amount from many of the comments here. But other critics are either misinformed or have an agenda. Having done this for several years, I find agendas very easy to detect. (A dead giveaway is the use of extreme modifiers: saying ‘simply,’ ‘clearly,’ and ‘obviously’ when things aren’t simple, clear, or obvious. These critics will also try to bulldoze you with statistical jargon that most readers won’t understand. I could write a whole article just on this phenomenon.)
Many of these guys are prediction peddlers themselves. I’m competition for them, and if my injury-unaware, super-simple, completely open efficiency model is just as accurate as their products, they lose face and lose business. I’m particularly bad for this industry because what I do is free and published weekly in the virtual pages of the New York Times’ Fifth Down. I suspect many of the agenda-bearing critics are envious. There are certainly analysts out there who could produce models better than mine, and I’m sure some of them chafe when an amateur like me gets a lot of attention. In my years of experience, I’ve noticed a certain proportion of commenters are ‘negators’: people who enjoy trying to knock others down a peg or two. They’re the kids who knock over sand castles at the beach.
When I started doing football stats, I wasn’t trying to pick winners. I was trying to settle a debate at the office about the relative importance of offense and defense. I copied some efficiency numbers and team win-loss records from espn.com into Excel, and ran a regression. I then wondered if particular types of match-ups mattered more than others, such as a good running offense vs. a poor running defense, so I created another regression based on game-by-game probabilities.
Naturally, I was curious how well the resulting regression model could predict upcoming games, so I fed in the current team stats (it was week 5 of 2006, I think) and sure enough, purely by luck, it went 14-0 picking winners that weekend. (It was an easy week.) I was foolishly hooked and thought maybe I had stumbled onto something really cool. I emailed upcoming weekly probabilities to coworkers, who after a while asked me to stop spamming their inboxes with my kooky football numbers. I figured I’d just put them up on the Web somewhere, where people could check them out if they were interested without getting spammed. Soon, bbnflstats.blogspot.com was born. But in truth, the predictions were just a fortunate by-product of the model.
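For readers curious what that kind of game-by-game regression looks like, here is a minimal sketch. This is not the actual model: the features, weights, and data are all invented for illustration, and I’ve used a plain gradient-descent logistic regression rather than whatever tool you might prefer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each row is one game, and the columns are the home
# team's efficiency edges (e.g. net passing yds/att, net running yds/att,
# turnover rate differential). The "true" weights are made up.
n = 500
X = rng.normal(size=(n, 3))
true_w = np.array([1.2, 0.4, 0.8])            # illustrative: passing matters most
p_true = 1 / (1 + np.exp(-(X @ true_w)))
y = (rng.random(n) < p_true).astype(float)    # 1 = home team wins

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression (no intercept, for brevity)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))        # current win-probability estimates
        w -= lr * (X.T @ (p - y)) / len(y)    # gradient of the log-loss
    return w

w = fit_logistic(X, y)
print(np.round(w, 2))
```

The fitted coefficients recover the relative importance of each efficiency stat, which was the original point of the exercise; the win probabilities for upcoming games fall out as a by-product.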
I was much more interested in the original intent of the model than its resulting predictions. I wanted to learn about the relative importance of passing compared to running. How repeatable are defensive interception rates? How badly do penalties really hurt a team’s record? How many wins does a 1-yard-per-pass-attempt improvement typically represent? How random are game outcomes? So the website became focused on those questions, yet the weekly probabilities remained a fixture. They became widely read, and before long the Fifth Down approached me out of the blue to ask if they could use them on their site. I was honored to be asked, and jumped at the opportunity. I never approached them or anyone else about selling predictions.
I once tracked the model’s accuracy week-by-week. There was even a running tracker at the top of the site’s main page comparing the results to Vegas favorites. But I removed it because it became the sole focus of the site. I also found myself worrying about it too much on a weekly basis.
Fortunately, there are a few readers who have diligently tracked the performance of the efficiency model on their own and often send me updates. I trust these more than my own tracking because there are many gray areas, and I obviously have a dog in the hunt. For example, if the model calls the game as 50/50, how does that count? Is it just thrown out, or does it count as a miss because it couldn’t identify the correct favorite? What if the game goes to overtime—is it then a hit? What about week 17, when all logic goes out the window? Some top teams with playoff seeds rest their starters while other teams have their bags packed for the Caribbean. Fantasy leagues are smart enough not to count week 17. Should it count against a quantitative probability model?
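To make the ambiguity concrete, here is a toy scoring function using one possible set of conventions. These rules (excluding true 50/50 games and skipping week 17 entirely) are my illustration, not the actual accounting my readers use.

```python
def score_pick(prob_home, home_won, week):
    """Return 'hit', 'miss', or None (excluded) for one game.

    Convention assumed here: a true 50/50 forecast names no favorite and is
    thrown out, and week 17 (rest-the-starters week) is skipped entirely.
    """
    if week == 17 or prob_home == 0.5:
        return None
    picked_home = prob_home > 0.5
    return "hit" if picked_home == home_won else "miss"

# Invented games: (model's home win probability, did home win?, week)
games = [(0.62, True, 5), (0.50, False, 5), (0.45, True, 17), (0.30, False, 9)]
results = [score_pick(p, w, wk) for p, w, wk in games]
print(results)  # → ['hit', None, None, 'hit']
```

Change either convention (count 50/50 games as misses, include week 17) and the same season of picks produces a different accuracy number, which is exactly why independent tracking matters.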
The other question is: what is the standard against which the model should be compared? The Vegas lines are a very efficient market. It’s extremely hard to reliably out-pick the oddsmakers. After all, they have the most at stake and can hire analysts as smart as anyone else to make sure they don’t lose a lot of money.
The Vegas favorites will sometimes vary depending on your source and on whether you look at opening lines or closing lines. Opening lines differ from closing lines for a couple of reasons. Closing lines reflect the market’s opinions as gamblers vote with their checkbooks. And more importantly, they incorporate information learned throughout the week, everything from injury status to weather forecasts, and they are almost always slightly more accurate.
I choose the opening lines as my benchmark for my model for a couple reasons. First, I can press a button and get the upcoming game probabilities as soon as the games are over the previous Sunday. The information produced by the model is available (to me at least) before the opening lines are published. And second, like the opening lines, the model has no access to injury information released during the week. In fact, it gets no injury information at all beyond the past team stats it uses.
Honestly, I think the standard of the opening line is still unfair. The model didn’t know that Ben Roethlisberger returned from a suspension to take over for Charlie Batch last season in week 5. Right now it still thinks Kenny Britt is playing for the Titans. It didn’t know that the Colts had locked up their playoff seed and that the Peyton Manning show suddenly became the Jim Sorgi show. But the oddsmakers do.
And so do I, which spurs email asking me why I don’t fiddle with the numbers to reflect these known factors and improve the model’s accuracy. There’s no single good answer. Truthfully, I’m interested in spending my limited football research time in more useful ways than eking out the very last percentage point of accuracy. My apologies to those who have relied on my model, but I am now much more interested in providing the kind of analysis that makes a difference on the field rather than in pick ’em leagues. Additionally, the model is based on the principles of simplicity, objectivity, and transparency, all of which would be compromised by meddling with the inputs. And lastly, the model serves as a reference point from which we are all free to make intuitive adjustments based on injuries or other factors.
There are a lot of ways to judge the performance of a probability model. The best approaches are variants of mean squared error, where the error for each game is 1 minus the probability forecast for the eventual winner. There’s also the issue of calibration, meaning that when the model says a game is 70/30, it’s actually wrong 30% of the time. Since corrections were made following the beginning of the 2007 season, the calibration has been solid and is not really in question. That said, most people don’t care about that stuff. They care about winners and losers, so that’s what I’ll present.
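As a concrete sketch, the scoring rule described above takes only a few lines to compute. The forecasts and outcomes here are invented, not drawn from the model.

```python
# Invented data: the model's pre-game probability for the favorite, and
# whether the favorite actually won.
forecasts = [0.70, 0.55, 0.80, 0.50, 0.65]
favorite_won = [True, False, True, True, False]

# The error for each game is 1 minus the probability assigned to the winner.
errors = [(1 - p) if won else p for p, won in zip(forecasts, favorite_won)]
mse = sum(e * e for e in errors) / len(errors)
print(round(mse, 3))  # lower is better; a pure coin-flip forecast scores 0.25

# Crude calibration check: among games forecast around 70/30, the favorite
# should win roughly 70% of the time (with many more games than this, in practice).
bucket = [won for p, won in zip(forecasts, favorite_won) if 0.6 <= p <= 0.8]
print(sum(bucket) / len(bucket))
```

Note that this metric rewards honest probabilities rather than bold ones: a model that says 60/40 and is right 60% of the time beats one that says 90/10 and is right 60% of the time.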
The following table lists the percentage of winners predicted, based on independent information sent to me by a trusted reader. Personally, I would usually be much more forgiving, but I have deliberately chosen reasonable but unfavorable scoring assumptions for things like 50/50 games. Week 17, playoff games, and Super Bowls are all included. For comparison, I’ve included the opening-line accuracy over the same weeks and seasons.
Everyone is entitled to their own standard, but all things considered, I am very pleased. This simple efficiency model, which knows nothing about injuries to left tackles or suspensions to franchise quarterbacks, has performed as well as the collective wisdom of virtually every football expert. In its four years of existence, it has been among the top performing prediction sources, something even its harshest critics would have to admit. Keep in mind, the very best system will rarely be number one in any given year. There is too much luck involved in the short NFL season. The most sound system is probably the one consistently near the top, and never near the bottom.
It has also outperformed other notable quantitative models since its inception, including both of Jeff Sagarin’s, which have long been considered the gold standard of quantitative rankings and predictions. It beats Pythagorean models, point-differential models, Elo models, and a slew of others. Be aware that many of the systems billing themselves as “computer models” are just averages of Vegas lines or other prognostications.
There's a lot of luck involved, even more than you'd think. There are only a handful of games per year on which the model and the lines disagree, and those are naturally the games closest to 50/50. So the effective sample size for a comparison in any given season isn't 240 games or so. It's closer to 40.
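A quick back-of-the-envelope calculation shows how much noise that small sample carries. Treating each disagreement game as roughly a coin flip, the standard error of an observed win rate over n such games is sqrt(p(1-p)/n):

```python
import math

# Each "disagreement" game is close to 50/50 by construction, so p ~ 0.5.
# The standard error of a win percentage over n such games shrinks only
# with the square root of the sample size.
for n in (240, 40):
    se = math.sqrt(0.5 * 0.5 / n)
    print(f"n={n}: +/- {se:.1%} per season")
```

At n = 240 the season-to-season noise is about ±3.2%, but at n = 40 it balloons to about ±7.9%, so a sound model can trail (or lead) the lines by several percentage points in a single season purely by chance.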
So the next time someone leaves a comment like, "It's extremely obvious that Brian's model is simply not very accurate," don't take the bait. Just do what I'll do and link to this post.