Kevin Meers is the Co-President of the Harvard Sports Analysis Collective. He is a senior majoring in economics with a statistics minor, and has spent the past two years or so as an analytics intern in the NFL. He is currently writing his thesis on game theory in the NFL, and probably puts too much thought into how the perfect fantasy football league would be structured.
The coach’s challenge is an important yet poorly understood part of the NFL. We know challenges are an asset, but beyond that, we do not have a good understanding of what makes a good challenge or whether coaches are actually skilled at challenging plays. This post takes a step towards answering those questions by examining the value of the possible game states that stem from challenged plays.
To value challenges, we must understand how challenges change the game’s current state. When a play is challenged, the current game state must transition into one of two new game states: one where the challenged play is reversed, the other where it is upheld. These potential game states are the key to valuing challenges.
Let’s look at a concrete example from last season. With two minutes and two seconds left in the fourth quarter of their week ten matchup, Atlanta had first and goal on New Orleans’ ten-yard line. Matt Ryan completed a pass to Harry Douglas, who was ruled down at the Saints’ one-yard line… only Douglas appeared to fumble as he went to the ground, with the Saints recovering the ball for a potential touchback. When New Orleans challenged the ruling on the field, the game could have transitioned into one of two possible game states: Atlanta’s ball with second and goal on the one, or New Orleans’ ball with first and ten on their own 20-yard line. If the Saints lost the challenge, they would have a Win Probability (WP) of 0.28, but if they won, their WP would jump to 0.88. This potential WP added, which I refer to as “leverage,” is key to valuing challenges. Mathematically, I define leverage as:
Leverage = WP(challenge succeeds) - WP(challenge fails)
Given this definition, the leverage for the Saints’ challenge last year was 0.6, making it the most leveraged challenge of the 2012 season.
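As a quick check of the arithmetic, the definition can be sketched in a few lines of Python. The function and argument names here are illustrative choices, not part of the original analysis; the two WP values are the ones quoted above for the Saints’ challenge.

```python
def leverage(wp_if_challenge_fails, wp_if_challenge_succeeds):
    """Potential WP added: WP if the challenge succeeds minus WP if it fails."""
    return wp_if_challenge_succeeds - wp_if_challenge_fails


# Saints' challenge from the example: WP of 0.28 if upheld, 0.88 if reversed.
saints_leverage = leverage(wp_if_challenge_fails=0.28, wp_if_challenge_succeeds=0.88)
print(round(saints_leverage, 2))  # prints 0.6
```

A challenge where both outcomes leave the team with the same WP would have a leverage of zero under this definition.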
By examining the leverage of each historical challenge, we can begin to quantify the value of challenges. In seasons before 2012, however, turnovers and scoring plays were not automatically reviewed, so coaches used to challenge touchdowns and turnovers, which are very high leverage plays. The distribution of leverage in those seasons would therefore look much different from the distribution last season and so far this season. For this reason, I’ve restricted this analysis to the 2012 season.
The most interesting takeaway from this graphic is how many challenges have almost zero leverage. Over 15% of challenges last season had no leverage; in other words, the team challenging the play should have been indifferent between the challenge succeeding and failing. On these plays, the coach spent a challenge and risked losing a timeout for no potential benefit, which seems highly illogical. With such an odd result, I went back to look at these play descriptions, and found three notable patterns.
First, about half of these challenges occurred in game states where the score differential was more than 14 points, and it is hard for non-scoring plays to significantly change win probability in those cases. Of the remaining zero-leverage plays, about half were challenges that would bring the offense from “close to their opponent’s end zone” to “a bit closer to their opponent’s end zone,” which helps the team’s WP very little, if at all.
Neither of these reasons makes the decisions to challenge those plays better, but the last reason I found might let some of these decisions off the hook. Many of these zero-leverage plays involved keeping a highly efficient offense on the field (or getting it off the field, depending on who challenged the play) by challenging the result of a third down play. For example, in the infamous matchup between the Packers and the Seahawks last season, Green Bay challenged a third down measurement to keep Aaron Rodgers and Co. in the game. In cases like this one, the WP model may not fully account for the specific in-game strengths and weaknesses of the teams involved (since it is based on an average team). Therefore, in some circumstances, challenges might appear to have no leverage only because of the limits of our WP estimates.
Does Leverage Affect a Challenge’s Success?
We can also look at the distributions of successful and failed challenges by the challenge’s leverage. If the distributions are meaningfully different, it could tell us something about which challenges are more likely to succeed than others.
This plot shows that very low leverage challenges succeeded more often than higher leverage challenges. In fact, once leverage reaches about 0.15 WP, challenges become much more likely to fail, and it seems at least possible that officials are less likely to overturn plays that are hugely influential on the outcome of the game. A logistic regression lends some support to this theory: the coefficient on leverage is negative, with p = 0.07. However, this result is entirely driven by a handful of failed high leverage challenges from one year of data, so I’m hesitant to declare that this relationship definitively exists.
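For readers who want to try this kind of test on their own challenge data, here is a minimal sketch of a one-variable logistic regression fit by Newton's method in plain Python. It is not the code or the data behind the result above: the names `fit_logistic` and `predict` are hypothetical, the leverage values and outcomes are made up purely for illustration, and the sketch estimates only the coefficients, not a p-value.

```python
import math


def predict(b0, b1, x):
    """Fitted probability of a successful challenge at leverage x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, iterations=25):
    """Fit P(success) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the negative Hessian
        for x, y in zip(xs, ys):
            p = predict(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# MADE-UP illustration data: leverage of each challenge and whether it succeeded (1) or failed (0).
toy_leverage = [0.00, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10,
                0.12, 0.15, 0.20, 0.25, 0.30, 0.40, 0.60]
toy_success = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0]

intercept, slope = fit_logistic(toy_leverage, toy_success)
print("slope on leverage is negative:", slope < 0)
```

A negative slope means the fitted success probability falls as leverage rises. A statistics package would also report a standard error for the slope, which is what a p-value like the one quoted above is based on.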
Using Leverage to Value Timeouts
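We can also use our findings on leverage to put a reasonable bound on the value of the average timeout. For a rational coach to challenge a play, he must think that the expected WP gained from the challenge is at least as large as the expected cost of the timeout he risks losing.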
Leverage and probability of success can obviously change from challenge to challenge, and the value of a team’s last timeout may be significantly different from the value of its first timeout. However, given an average leverage and average success rate, coaches (if they are behaving rationally) value their timeouts, on average, at 0.03 WP at most.
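One way to make this bound concrete is to assume that a failed challenge costs exactly one timeout and a successful one costs nothing. Under that assumption, a coach with success probability p facing leverage L should challenge when p × L ≥ (1 − p) × V, where V is the value of a timeout, so V can be at most p × L / (1 − p). The sketch below uses that assumed form; the function name and the example inputs are hypothetical and are not the 2012 averages.

```python
def max_timeout_value(p_success, leverage):
    """Largest timeout value (in WP) consistent with a rational challenge.

    Assumes the break-even condition p * L >= (1 - p) * V, i.e. the timeout
    is lost only when the challenge fails.
    """
    if not 0.0 <= p_success < 1.0:
        raise ValueError("p_success must be in [0, 1)")
    return p_success * leverage / (1.0 - p_success)


# Hypothetical inputs: a 40% chance of success on a play with 0.06 WP of leverage.
bound = max_timeout_value(p_success=0.4, leverage=0.06)
print(round(bound, 3))  # prints 0.04
```

Under this form, the bound rises with both the success rate and the leverage, which is why it only makes sense as an average across many challenges.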
This post has covered a lot of ground: establishing what leverage means in the context of challenges, what its distribution looks like (at least in the 2012 season), how leverage affects the probability of a successful challenge, and putting a reasonable upper bound on the value of a timeout. While we’ve made progress on our original questions, this work can be both improved and expanded upon. As more challenges happen, we’ll get a better idea of what the distribution of leverage actually looks like, which will improve all of this analysis. Logical expansions from here include examining which kinds of challenges tend to have the highest (or lowest) leverage, whether certain teams or coaches are better at challenging high leverage plays, and any others you can think of. We also don’t know whether these are the plays coaches should be challenging, which is an interesting question for another time.