As I noted in my game commentary, if you need to call a timeout to think over your options, the situation is probably not far from the point of indifference, where the options are nearly equal in value. And timeouts have significant value, particularly in situations like this example--late in the game and trailing by less than a TD--because you'll very likely need to stop the clock in the end-game, either to get the ball back or during a final offensive drive. Would Carroll have been better off making a quick but suboptimal choice, rather than making the optimal choice but burning a timeout along the way?
Here's another common situation. A team trails by one score in the third quarter. It's 3rd and 1 near midfield and the play clock is near zero. Instead of taking the delay of game penalty and facing a 3rd and 6, the head coach or QB calls a timeout. Was that the best choice, or would the team be better off facing 3rd and 6 but keeping all of its timeouts?
Both questions hinge on the value of a timeout, which has been something of a white whale of mine for a while. Knowing the value of a timeout would help coaches make better game management decisions, including clock management and replay challenges.
In this article, I'll estimate the value of a timeout by looking at how often teams win based on how many timeouts they have remaining. It's an exceptionally complex problem, so I'll simplify things by looking at a cross section of game situations--3rd quarter, one-score lead, first down near midfield. First, I'll walk through a relatively crude but common-sense analysis, then I'll report the results of a more sophisticated method and see how the two approaches compare.
I essentially want to know the value of a timeout in terms of Win Probability (WP), which is the universal linear utility function of the sport. If we can measure the value of something in terms of WP, we can apply that knowledge in all kinds of ways.
To estimate a timeout's value, we just want a good estimate of how often a team wins in some situation with n timeouts compared to how often a team wins in the same situation with n-1 timeouts. The difference in how often a team can be expected to win in each case is our result--the WP value of the timeout.
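The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the numbers in this article: the function names and the toy sample are made up, and the input format (pairs of timeouts remaining and whether the team won) is an assumption.

```python
from collections import defaultdict


def win_rate_by_timeouts(plays):
    """Return {timeouts_remaining: win_rate} from (timeouts, won) pairs."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for timeouts, won in plays:
        total[timeouts] += 1
        wins[timeouts] += int(won)
    return {t: wins[t] / total[t] for t in total}


def timeout_value(plays, n):
    """Win rate with n timeouts minus win rate with n-1 timeouts."""
    rates = win_rate_by_timeouts(plays)
    return rates[n] - rates[n - 1]


# Made-up toy sample, NOT real NFL data: (offense timeouts left, offense won?)
toy_plays = (
    [(3, True)] * 7 + [(3, False)] * 3
    + [(2, True)] * 6 + [(2, False)] * 4
)
value = timeout_value(toy_plays, 3)  # win rate with 3 minus win rate with 2
```

With real data the two groups would need to be otherwise comparable situations, which is exactly the difficulty the rest of the article deals with.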
In football, where a season consists of only 16 games, there isn't as much data as we'd like to make reliable WP estimates just by counting how often a team wins for a given game state based on score, time, down, distance, and field position. And splitting the data up further according to timeouts remaining thins the data out far beyond the point of statistical reliability. So we need to aggregate, remove the noise, and then de-aggregate the data for reliable results.
We need to make a few assumptions to simplify the analysis. First, as the guy in the AT&T commercial might ask: is having more timeouts better, or is having fewer timeouts better? More! It's obvious, but it means that whenever we observe higher win rates associated with fewer timeouts, we can be confident there is significant noise. Second, timeouts are treated as essentially equivalent: the difference in value between having 3 timeouts and having 2 is assumed to be the same as the difference between having 2 and 1, and so on. We know this probably isn't true, since there are good reasons to think that the difference between 1 timeout and 0 timeouts is larger than the other differences, but the assumption simplifies the problem greatly.
To see the WP effect of a timeout, I started with common situations that we would be most interested in. I also tried to isolate as many variables as reasonable. I aggregated all situations where the offense was up by 0 through 7 points and had a first down between their own 40 and the opponent's 40 in the 3rd quarter.
Normally, I would not aggregate situations this way for many reasons. But in this analysis, I am less concerned with the actual WP estimate than I am with the difference in WP estimates in like situations based on timeouts remaining. Also, as you'll see later, we can account for bias created by the aggregation.
There are 4 timeout states for each team (0...3 timeouts remaining), which makes 16 potential combinations of timeout states. Here is a matrix of how often we see each combination of timeouts remaining for the offense and defense in the selected cross-section of data. The columns represent the offense's timeouts left, and the rows represent the defense's timeouts left. For example, there were 290 actual situations where the offense had all 3 timeouts and the defense had 2 timeouts.
|Def \ Off||0||1||2||3|
There were very few cases of 1 or fewer timeouts for either team, so for this primary analysis we'll rely on the difference in value between 3 and 2 timeouts. In fact, I'll delete the 0 row and column from here out.
Here are the raw average win rates for the offense (as opposed to the smoothed, modeled WP estimate) for each combination of timeout states. As we would expect, having more 2nd-half timeouts usually results in winning more often in a close game. The offense tends to win more often as the defense's number of timeouts decreases. For example, after the defense has used its first timeout, offenses go from winning 69% of the time to winning 73% of the time.
Raw Win Rates
|Def \ Off||1||2||3|
Likewise, offenses win less often as they use their timeouts. Going from 3 to 2 timeouts reduces the offense's raw win rate from 69% to 63%. Right off the bat, we can make a very rough approximation: the value of a timeout in similar situations is somewhere around 3 to 5% or so. I was honestly surprised at this result. I was expecting something smaller.
But we aggregated a lot of qualitatively different situations--large swaths of score differences, field position, and the entire 3rd quarter! Frankly, I'm not worried about the score difference problem and field position problem. It's not that they don't affect WP; it's that they don't correlate with TOs remaining (r=.017 for yd line and .005 for score difference), so they're not going to introduce bias into the analysis of average differences. It's true that there are more snaps near an offense's own 40 than its opponent's 40, so the midpoint of the swath of data won't be exactly midfield. But that's ok, at least for now.
What I am concerned about is bias caused by game time. It's not too difficult to understand that there tend to be fewer timeouts remaining as the half progresses (r=.302). Teams never gain timeouts. And as time goes on, the same lead in terms of points will bring an improved chance of winning. For example, a 3 point lead has a higher WP at the end of the 3rd quarter than it does at the beginning of the 3rd quarter. So there will be a slight illusion that makes it appear that having fewer timeouts, either for the offense or defense, means a higher WP for the offense.
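For readers who want to run the same kind of check on their own play-by-play data, here is a minimal sketch of a Pearson correlation between a situational variable and timeouts remaining. The helper name `pearson_r` and the sample values are hypothetical, chosen only to show the calculation; they are not the data behind the r values quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, NOT real NFL data: yard line and offense timeouts left
yard_lines = [40, 45, 50, 55, 60, 42, 58, 50]
timeouts_left = [3, 2, 3, 2, 3, 3, 2, 3]
r = pearson_r(yard_lines, timeouts_left)
```

A value of r near zero suggests the variable is unlikely to bias a comparison of averages across timeout states, which is the argument made above for yard line and score difference.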
Here is a similar matrix of timeouts remaining for defenses and offenses in the 3rd quarter that lists the average minutes remaining in the game. For example, there is an average of about 23 min left in the game (8 min in the quarter) when both teams have all 3 timeouts, and an average of about 20 min left when either the offense or defense has 2 timeouts left.
Average Game Time
|Def \ Off||1||2||3|
To account for the bias caused by the relationships between time, timeouts remaining, and WP, we can calculate the difference between the raw win rates and the average expected WPs for our subsample. The current WP does not (directly) consider timeouts. In effect, this is a crude multivariate regression. But instead of dumping a bunch of variables into regression software and pressing the 'I-believe-in-analytics' button, we can transparently see if what's going on makes sense.
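As a rough sketch of that adjustment, the snippet below groups plays by timeout state and subtracts the average expected WP from the raw win rate in each cell. The record layout (defense timeouts, offense timeouts, expected WP from a timeout-unaware model, whether the offense won) and the toy values are assumptions for illustration, not the article's actual data or code.

```python
from collections import defaultdict


def win_rate_above_expected(plays):
    """Return {(def_timeouts, off_timeouts): raw win rate - mean expected WP}.

    plays: iterable of (def_timeouts, off_timeouts, expected_wp, offense_won).
    """
    count = defaultdict(int)
    wins = defaultdict(int)
    wp_sum = defaultdict(float)
    for def_to, off_to, expected_wp, offense_won in plays:
        cell = (def_to, off_to)
        count[cell] += 1
        wins[cell] += int(offense_won)
        wp_sum[cell] += expected_wp
    return {
        cell: wins[cell] / count[cell] - wp_sum[cell] / count[cell]
        for cell in count
    }


# Made-up toy plays, NOT real NFL data
toy_plays = [
    (3, 3, 0.70, True), (3, 3, 0.70, True),
    (3, 3, 0.70, False), (3, 3, 0.70, True),
    (3, 2, 0.72, True), (3, 2, 0.72, False),
]
above = win_rate_above_expected(toy_plays)
```

Because the expected WP already reflects score, time, and field position, whatever is left over in each cell is a candidate for the effect of the timeout state itself, plus noise.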
The matrix below shows the average expected WP for each timeout state combination. For example, when both teams have all 3 timeouts, the average time remaining was 23.4 minutes, which equates to a .70 expected WP. As you can see, the WPs are relatively flat, meaning that they don't climb very much for a given lead through the 3rd quarter, and even the slight increase is fairly linear. The big divergence begins shortly into the 4th quarter. (That's one reason I chose the 3rd quarter to begin attacking the timeout value problem.)
Expected Win Probability
|Def \ Off||1||2||3|
We finally arrive at the results. The final matrix below lists the raw win rate above expected for each timeout combination. Keep in mind the very small sample sizes in everything other than the 3-3, 2-3, and 3-2 cells.
Win Rate above Expected
|Def \ Off||1||2||3|
If we focus only on the differences between the 3-3 cell and the 2-3 and 3-2 cells (defense timeouts listed first), we can get a rough approximation for the value of a 2nd-half timeout in the 3rd quarter when the offense is ahead by one score:
-When both teams have all 3 timeouts, the offense ends up winning 2% less often than expected.
-When the offense has all 3 timeouts but the defense has only 2, the offense wins 2% more often than expected.
-When the defense has all 3 timeouts but the offense has only 2, the offense wins 8% less often than expected.
When the defense uses a timeout, it costs:
wp(2,3) - wp(3,3) = .02 - (-.02) = .04
And when the offense uses a timeout, it costs:
wp(3,3) - wp(3,2) = -.02 - (-.08) = .06
Given the assumptions, a fair rough estimate would be that the value of the first timeout is .05 WP.
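The arithmetic can be checked directly from the three cells listed above. In this sketch, cells are keyed as (defense timeouts, offense timeouts). Taking the simple average of the two costs is my assumption about how to combine them into a single figure; it follows from treating offensive and defensive timeouts as equivalent.

```python
# Win rate above expected, keyed by (defense timeouts, offense timeouts),
# using the three values quoted in the text
above_expected = {
    (3, 3): -0.02,  # both teams have all 3 timeouts
    (2, 3): 0.02,   # defense has used one timeout
    (3, 2): -0.08,  # offense has used one timeout
}

# Cost to the defense of using a timeout (the offense's gain)
defense_cost = above_expected[(2, 3)] - above_expected[(3, 3)]

# Cost to the offense of using a timeout
offense_cost = above_expected[(3, 3)] - above_expected[(3, 2)]

# Assumption: combine the two with a simple average
timeout_value = (defense_cost + offense_cost) / 2
```

The two costs come out to .04 and .06, and their average is .05.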
Back to the NFC Championship Game situation. Down by 4 on the SF 37 with 14 minutes to play, SEA faced a 4th and 7. Going for it yields a .35 WP and punting yields a .32 WP, according to the general (timeout-unaware) model. But the Seahawks chose neither option. They chose the call-a-timeout-then-go-for-it option instead, which was worth .35 WP - .05 WP = .30 WP--worse than had they punted right away.
(Of course, this might be the worst example in the world, as they went on to score the go-ahead TD on that very next play.)
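The fourth-down comparison can be written out as a small decision table. The WP figures are the ones quoted above; the option labels and the helper `best_option` are just illustrative names.

```python
def best_option(options):
    """Return the label of the option with the highest WP."""
    return max(options, key=options.get)


TIMEOUT_COST = 0.05  # rough estimate from the 3rd-quarter analysis above

options = {
    "go for it immediately": 0.35,
    "punt immediately": 0.32,
    "timeout, then go for it": 0.35 - TIMEOUT_COST,
}
choice = best_option(options)
```

Keep in mind that the .05 figure was estimated from 3rd-quarter situations with the offense ahead, so applying it to a trailing team in the 4th quarter is a stretch that the second part of the analysis is meant to address.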
This was a very crude first approximation. In the second part of this article, I'll conduct a more sophisticated analysis using a regression model that will account for the full mix of variables including field position, score, and time that were aggregated in this first approximation. If the results of the more advanced method confirm the common sense estimate, we can have confidence in its approach, and ultimately produce a generalizable method for estimating the value of a timeout across all values of score, time and field position.