Fourth Downs in the New Overtime: First Possession

This may have been the most difficult, challenging analysis I've done. No joke. The new OT format is more complex than it seems. There are three distinct 'game states' in which a team can find itself:

1. The initial drive of the first possession (A TD wins, a turnover or punt triggers Sudden Death (SD), and a FG triggers State 2.)
2. The team down by 3 now has one possession to match the FG (triggering SD) or score a TD to win.
3. Sudden Death

The possibilities are illustrated in the event tree below, along with some back-of-the-napkin transition probabilities I made back when the new rules were first proposed. (State 1 is "1st Poss". State 2 is the branch under "2nd Poss" that follows a FG in the 1st Poss. Sudden death is self-explanatory and occurs after a no-score in the 1st Poss or after a FG is matched in the 2nd Poss.)


The problem is that State 2 has never existed before. It will be a decade before there are enough examples to make any sort of reliable, empirical WP estimates. In this state, a team has all four downs to get into FG range, and once there it can score a TD to win. Plus, there is no urgency in terms of the clock. It’s a very weird situation.

State 2 also affects State 1, because the WPs of State 2 must be considered when valuing the FG option in State 1. In the new OT format, a FG doesn’t win. It only triggers State 2.
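To make that dependency concrete, here is a minimal sketch of how the three states chain together. The function names and every number in it are placeholders invented for illustration, not the estimates used in this analysis. It also ignores ties and field position, treating sudden death as a single number for the team that gets the ball first.

```python
def wp_after_fg(p_td2, p_fg2, wp_sd_with_ball):
    """WP for the team that kicked a FG on the first possession (start of State 2).

    p_td2, p_fg2: chance the trailing team scores a TD / FG on its one possession.
    wp_sd_with_ball: WP in sudden death for the team that gets the ball first.
    All three inputs are hypothetical placeholders.
    """
    p_none = 1.0 - p_td2 - p_fg2
    # No score: the leading team wins outright.
    # Matching FG: the trailing team kicks off, so the leader has the ball first in SD.
    # Opponent TD: the leading team loses (contributes 0).
    return p_none * 1.0 + p_fg2 * wp_sd_with_ball


def wp_first_possession(p_td1, p_fg1, wp_fg, wp_sd_with_ball):
    """WP for the team with the opening possession (State 1)."""
    p_none = 1.0 - p_td1 - p_fg1
    # A no-score (punt, turnover, missed FG) hands the opponent the ball in SD.
    wp_sd_without_ball = 1.0 - wp_sd_with_ball
    return p_td1 * 1.0 + p_fg1 * wp_fg + p_none * wp_sd_without_ball


# Placeholder inputs, chosen only to show the mechanics.
wp_fg = wp_after_fg(p_td2=0.20, p_fg2=0.20, wp_sd_with_ball=0.60)
wp_state1 = wp_first_possession(p_td1=0.20, p_fg1=0.20, wp_fg=wp_fg, wp_sd_with_ball=0.60)
print(round(wp_fg, 3), round(wp_state1, 3))
```

The point of the sketch is only that the value of a FG in State 1 is not 1.0. It is whatever State 2 is worth to the team that just kicked it.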

One important note before we continue: This analysis ignores time and the associated possibility of a tie. This is necessary to make the computations manageable. Although this assumption is not ideal, it is realistic. Although ties are going to be more frequent under the new format, they will remain rare. Additionally, it is far from certain how teams will typically value a tie. (See Jacksonville's 4th and 10 attempt from the Houston 47 with 2:36 remaining in week 11.) Three outcome possibilities complicate things far more than you might think. It reminds me of the Three Body Problem in my space dynamics class back in college--computing the orbits of two objects, say the Earth and an object in orbit, is no problem. But throw in the Moon, and the math gets impossibly ugly.

So how do we crack the situation where a team needs a FG to survive, a TD to win, and there is no time pressure? We can look at the end-game of 4th quarters in which a team is down by 3. We can throw out all games that ended in expiration, and only look at combinations of field position and time remaining that realistically maximize the offense's chances of winning. For example, when we look at first down situations at a team’s own 20, at various times remaining, we choose the time at which a team roughly has the best chance of winning. That will be the point where time presses on them the least and the opponent does not have enough time to counter the drive. If we do this for all the various field positions, we can get a realistic estimate of how often teams in that kind of situation get a FG or score a TD.

Next we apply those FG and TD rates to estimate how often a team down by 3 points in OT will either win or trigger SD.

The most difficult analysis is to evaluate 4th down decisions in State 1, the initial possession of OT. This is because it must consider all the future possible game states. Using the standard 4th down conversion probabilities, punt distances, and FG success rates, we can generate the expected WP for going for it, punting, and FG attempts at each yd line and each to go distance. That analysis produces the chart below. The black lines are for FG attempts and punts. The colored lines are for conversion attempts at the various to-go distances. The blue line is 1 yd to go, and so on. Where the conversion lines are above the punt or FG lines, a team should go for it. Where either the punt or FG line is higher, a team should choose that option.
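The comparison at a single yard line and to-go distance can be sketched as follows. This is an illustration of the bookkeeping only: the function names are hypothetical, and the probabilities and WP values in the example call are made-up placeholders rather than the numbers behind the chart.

```python
def fourth_down_options(p_convert, wp_if_converted, wp_if_failed,
                        wp_after_punt,
                        p_fg_good, wp_after_fg_good, wp_after_fg_miss):
    """Expected WP of each 4th-down choice on the first possession of OT.

    A failed conversion, a punt, and a missed FG all trigger sudden death
    with the opponent on offense, so those WP inputs are sudden-death values
    at the resulting field position. A made FG triggers State 2.
    """
    return {
        "go for it": p_convert * wp_if_converted + (1.0 - p_convert) * wp_if_failed,
        "punt": wp_after_punt,
        "field goal": p_fg_good * wp_after_fg_good + (1.0 - p_fg_good) * wp_after_fg_miss,
    }


def best_option(options):
    """Return the name of the choice with the highest expected WP."""
    return max(options, key=options.get)


# Placeholder inputs for one hypothetical down-and-distance.
options = fourth_down_options(
    p_convert=0.50, wp_if_converted=0.70, wp_if_failed=0.20,
    wp_after_punt=0.35,
    p_fg_good=0.40, wp_after_fg_good=0.72, wp_after_fg_miss=0.25,
)
print(options, best_option(options))
```

Repeating this for every yard line and every to-go distance gives one curve per option, which is what the chart plots.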


You may have noticed a few oddities about the chart. I apologize for the jittery lines. That's due to the win probability estimates for SD, which I already had on hand. Unfortunately, I rounded those numbers to 2 decimal places some time ago. Note that the more likely SD is, the more jittery the line. For example, 4th and 1 is very smooth and 4th and 15 is the most jittery. The second oddity is the sudden drop-off in WP for 4th down attempts at the opponent's 20 yd-line and 10-yd line. Those discontinuities are due to the increasing difficulty of converting 4th downs inside the red zone. Obviously, the true estimates are going to be smooth and continuous. But it was easy to correct for that in the final analysis, which follows below.

Boiling down these numbers produces a handy cheat sheet. On or below the line a team should go for it.


There are two very interesting results. The first is that going for it gets better as a team is backed up in its own territory. This is because a punt triggers SD, which is a very bad situation to put yourself in while handing the opponent relatively good field position. Because it’s so bad, going for it at what would seem to be suicidal distances to go makes sense.

The second interesting result is that long FGs should not normally be attempted on the first possession. If a team misses a long attempt, which is obviously increasingly likely as distances get longer, SD is triggered with the opponent in relatively good field position. Oddly, the numbers suggest punting on 4th and 8+ is the right decision all the way up to the opponent’s 28-yd line! Try explaining that one in the post-game press conference.

What's left to be done? We still need to crunch the numbers for States 2 and 3. For example, down 3 points in OT (State 2), when should a team go for a conversion (or TD) instead of a FG? In SD, when should offenses pass up the long FG attempt in favor of a 4th down conversion? Actually, the numbers are already crunched. I just need to get around to writing everything up.


20 Responses to “Fourth Downs in the New Overtime: First Possession”

  1. Anonymous says:

    The second oddity is the sudden drop-off in WP for 4th down attempts at the opponent's 20 yd-line and 10-yd line. Those discontinuities are due to the increasing difficulty of converting 4th downs inside the red zone. Obviously, the true estimates are going to be smooth and continuous. But it was easy to correct for that in the final analysis, which follows below.

    I'm not sure that's what I see on the graph, can you please explain?

    Great post !

  2. Justin says:

    Well the drop-off you're seeing inside the six is due to the distance to gain being equal to the distance to goal (ex: there is no such thing as a 4th and 6 from the 4.)

    So the way I read that is you always go for it with goal-to-go from the 6 or closer. And you always go for it needing 6 yards or fewer out to around the 15 or so.

  3. David Avraamides says:

    Isn't one of the possible terminal events for the team with initial possession a safety and thus a loss? I know it is a small probability, but it would still be a distinct outcome in your event tree.

  4. MFLoGrasso says:

    Here's an analysis I would find interesting once you get a decent handle on the overall scenario. What would the WP analysis be for onside kicking to start OT? Based on the rules (and I just verified this on the NFL website):

    "A.R. 16.2 ONSIDE KICK
    On the opening kickoff of overtime, Team A legally recovers the ball at the A41.
    Ruling: A's ball, first-and-10 on A41. A kickoff is considered an opportunity to possess for the receiving team. Team B is considered to have had an opportunity to possess the ball."

    So, if the kicking team recovers, SD is triggered immediately. If not, the kicking team has surrendered about 25-35 yards of field position for the first possession, and the receiving team is still not immediately in FG range. The simple formula would be to onside kick if:
    P(recovery)*WP(recovery)+P(non-recovery)*WP(non-recovery) > WP(kick away)
    which simplifies to onside kick if:
    P(recovery) > (WP(kick away)-WP(non-recovery))/(WP(recovery)-WP(non-recovery))

    So, if kicking away to start OT gives the kicking team a 50% chance of winning, failing to recover an onside kick reduces this to 45%, but recovering the onside kick increases this to 65%. Then it would make sense to onside kick at the beginning of OT if you think you have a probability of recovering the kick higher than (0.50-0.45)/(0.65-0.45) = 0.05/0.20 = 0.25.
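    The break-even rule in this comment is easy to check with a few lines of code. This is only a sketch of the commenter's formula; the function name is made up, and the inputs are the commenter's own illustrative guesses, not measured win probabilities.

```python
def onside_break_even(wp_kick_away, wp_recovery, wp_non_recovery):
    """Minimum recovery probability at which an onside kick beats kicking away.

    Solves P*WP(recovery) + (1-P)*WP(non-recovery) > WP(kick away) for P.
    Assumes recovering is better than not recovering.
    """
    if wp_recovery <= wp_non_recovery:
        raise ValueError("recovery must be worth more than non-recovery")
    return (wp_kick_away - wp_non_recovery) / (wp_recovery - wp_non_recovery)


# The commenter's illustrative numbers: 50% kicking away, 45% after a
# failed onside kick, 65% after a recovered one.
threshold = onside_break_even(wp_kick_away=0.50, wp_recovery=0.65, wp_non_recovery=0.45)
print(round(threshold, 2))
```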

  5. Anonymous says:

    The safety falls into the turnover/punt > SD scenario (SD over at that moment) so it is accounted for...

  6. MarkP says:

    MFLoGrasso:

    Based on the event tree, WP(kick away) is actually only 0.444.

    Assuming that a failed onside gives the opposing team the ball at the 41, and that the Markov model still applies {P(FG) = 0.247; P(TD) = 0.342}, WP(non-recovery) = 0.25748.

    Based on the Markov model for recovering at your own 41 {P(FG) = .155, P(TD)=.234} WP(recovery) = 0.5446.

    Plugging that into your equation gives us:
    (0.444 - 0.25748) / (0.5446 - 0.25748) = 0.649

    That makes a recovery rate of roughly 65% the break-even point for onside kicks. Earlier analysis (http://www.advancednflstats.com/2009/09/onside-kicks.html) showed that onside kicks are successful only 26% of the time. They are successful 60% of the time when they are unexpected, but that is still short of the 65% threshold.

  7. MFLoGrasso says:

    MarkP:

    But the issue is that the numbers in the event tree are simply placeholders, if I understand correctly. Brian's point in this post is that he hasn't had enough data to accurately assess the situation.

    Also, I don't think there would ever be an obvious onside kick in OT. The kicking team would always line up to kick away, so there would be some surprise component if it goes onside. However, as time goes on, as certain teams show a tendency to onside kick, the receiving team (if coached well) will begin to put better hands guys as their front blockers, likely meaning they are not as capable blockers as the standard return unit. This should lead to poorer returns when the ball is kicked away, increasing the WP of kicking away (increasing the required recovery probability threshold) while also reducing the actual recovery probability.

    I see an eventuality where a well-coached team will look at the return team placed on the field to start OT and have someone on the field make the decision to kick away or go onside depending on the personnel.

  8. Keith Goldner says:

    One question I have is with the narrowing down of situations to where it maximizes the team's chance of winning as far as time remaining goes (for that opening drive).

    I think given the current conservative mindset of coaches, both at the end of regulation AND in overtime you may see teams kick field goals more -- and playing it safe for the field goal -- rather than making the complete effort to score a TD and win the game. Obviously this is counterintuitive, but falls in line with the idea of coaches playing not to lose rather than to win.

  9. MarkP says:

    MFLoGrasso:

    I agree that the estimates still aren't known, but I think that the current Markov states are a reasonable current stand in. If you run through the event tree assuming starting field position of own 22 for kicking away (and kicking away to start state 2) and using the Markov model (the only assumption I keep from the event tree is 60% chance of the team with the ball winning in state 3), you still get WP(kick away)= 0.433 and a break even of just over 60% recovery.

    When I first started running the numbers, I expected to lower the threshold for onside, but that isn't how they came out. I agree that we don't know the actual WP's, but I think that these estimates are likely closer to reality than your initial intuitions. I agree that an occasional onside kick might be worth it (especially in situations favoring high variance), but not at the frequency your initial estimate would have suggested.

  10. MFLoGrasso says:

    MarkP:

    I had no idea which numbers to use. I just wanted to use numbers that indicated a slight drop-off from kicking away if the kick wasn't recovered but a larger increase if it was recovered (which I suspect will occur).

  11. Brian Burke says:

    I ignored the possibility of a safety.

  12. mike s says:

    Brian, do you account for sampling bias in your work? Not just this post but in general? For example, it's entirely possible that the better offenses are on the field more because they convert more, so the 1st and 2nd down data is not representative of the average team but instead skewed slightly to over-represent the best offensive teams and under-represent the worst defensive teams, which can't get their offense on the field often... It's possible that either the best offenses have more 3rd down situations because they are on the field more, or FEWER because they convert on 1st and 2nd down more often and have fewer 3rd downs. I was glad to see you used 3rd down data rather than 4th down data to represent the decisions teams should make on 4th down, as I felt that teams like the Patriots would line up and attempt to draw the other team offsides. If they do, they get a free play. If they don't, they read the defensive matchups and only run the play if they have a favorable matchup; otherwise they take a time out or take the delay of game. Then they punt or kick a field goal. This ability to only hike the ball when the matchups are favorable is really only available on 4th down. I suppose, though, that the teams that are down by a ton because they are bad on offense may be more desperate and could skew things the other way. Nevertheless, even on 3rd down there may be a sampling bias because the data doesn't randomly and evenly represent the same number of downs for each team per year.

    I suggest each year counting the team with the least amount of 1st downs, the least amount of 2nd and the least amount of 3rd downs and randomly selecting just enough data so that you have an even representation of each particular down for each team and excluding the remaining. I am hesitant on whether or not to also suggest that you do that with each down and distance or even groups of down and distance (3rd and 7+ as one group, 3rd and 4-6 as another and 3rd and 3 or less as another)... Although there still may be sampling bias from teams that end up in 3rd and long because they are bad making the 3rd and long conversion rates potentially unfairly low, if you reduce the sample size too much per year it could increase the variance even if you give it a more accurate representative sample. Perhaps instead just removing or reducing the "outliers" or data (teams) with a large amount of a certain range of down and distances.

  13. Anonymous says:

    wait... what? on 10 yardline with 5 to go you go for it? I would imagine the fieldgoal is a virtual certainty, and your chance of winning would have to be extremely high. What's the odds your opponent gets 3 conversions or higher AND then kicks a fieldgoal? If you go for it, maybe you have a 40% chance of converting and winning right there and of course, here is why the go for it decision is correct, if you go for it and fail, your opponent STILL has a very small chance of putting together 3 conversions or whatever he needs AND making a fieldgoal. And with the additional 10 yard field position if you fail, this time a fieldgoal wins it automatically. Even though after the failure, the fieldgoal beats you, most likely it will result in you getting the ball back and having a chance to do it again and that's IF you fail your 40% conversion or whatever. The fieldgoal is close to a sure thing but isn't completely. If your opponent gets a big play on you regardless of what field position he has, it's over anyways. If you can prevent that and prevent the unlikelihood of allowing multiple conversions back to back and a fieldgoal, you probably win.

  14. MarkP says:

    MFLoGrasso:

    That's why it is an interesting question, and I wanted to run down the outcomes with various numbers. Also: I made a fairly large mistake the first time through. With the kickoff now at the 35, an onside recovery (or not) occurs at the 46 instead of the 41, which has surprisingly large implications here. The WP's are now WP(kick away) = 0.433; WP(non-recovery) = 0.333; WP(recovery) = 0.675.

    Plugging this into the equation now yields a break-even of 29% onside recovery, well within the success for unexpected onside attempts. It will be interesting to see how teams actually perform in overtime, and if any one starts attempting occasional onside kicks.

  15. Eric Moore says:

    What about an onside kick attempt at the onset of state #2? It would be pretty interesting to see a team try.

    Recovery by the kicking team guarantees a win, while a failure to recover gives (using MarkP's numbers above) P(FG)=0.247 and P(TD)=0.342, therefore P(no score)=0.411. After a second field goal, I use the assumption referred to above that the team with the ball at the start of state #3 has a 60% chance of winning.

    If R is the chance of recovering the onside kick this gives:
    WP = R*1 + (1-R)*0.411*1 + (1-R)*0.247*0.6

    For R=0.26, WP=0.67, and for R=0.6, WP=0.82.
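    The formula in this comment can be wrapped in a small function to try other recovery rates. This is a sketch under the comment's own assumptions: the function name is hypothetical, the default P(FG) and P(TD) are the figures quoted from MarkP above, and the 60% sudden-death figure is the event-tree assumption mentioned in the thread.

```python
def wp_onside_in_state2(r, p_fg=0.247, p_td=0.342, wp_sd_with_ball=0.60):
    """WP for the team leading by 3 that tries an onside kick to start State 2.

    r: probability the kicking team recovers (a recovery ends the game).
    p_fg, p_td: opponent's scoring chances after a failed onside kick.
    wp_sd_with_ball: WP for the team receiving the kickoff once SD begins.
    """
    p_no_score = 1.0 - p_fg - p_td
    # Recovery wins outright; otherwise the leader wins on a no-score and
    # gets the ball first in sudden death if the opponent matches the FG.
    return r * 1.0 + (1.0 - r) * (p_no_score * 1.0 + p_fg * wp_sd_with_ball)


for r in (0.0, 0.26, 0.6):
    print(r, round(wp_onside_in_state2(r), 2))
```

    Setting r to 0 gives the WP after a failed attempt under these assumptions.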

    Obviously this needs to be compared to the base odds of winning at the start of state #2 given a regular kickoff, but I find it interesting that a team that tries an onside kick and fails still has a >50% chance of winning the game.

  16. Ken R. says:

    Outstanding. Great analysis and discussion. Thanks Brian.

  17. Anonymous says:

    Sometimes a team gets a 15-yard penalty (for unsportsmanlike conduct, for example) that gets assessed on the kickoff. How about an onside kick now, either at the start of OT or at the onset of state 2? I don't know enough math to crunch the numbers, but I think it would be interesting to see.

  18. kloverr says:

    I happen to have been looking into this question myself recently. Based on my own work and a few sanity checks, I think some of your win probability curves might be wrong. (If you want to see my own write-up, it is available here: http://www.reddit.com/r/NFLstatheads/comments/141xqy/original_research_4th_down_decisions_under_the/)

    1. I used data from regulation punting situations occurring in the 1st or 3rd quarters to figure out how likely punting teams at a given yard line were to score first. (I justify the use of regulation data to approximate sudden death data in the link I provided above.) Because a punt initiates sudden death, team who scores first = winner. The resulting curve is here: http://i.imgur.com/HY8os.png. My estimate of the win probability for the punting team is significantly lower than yours on the opponent's side of the field.

    2. I think there might be issues with your go-for-it curves, too. To take an example: your curves say that a first possession team with a 4th-and-1 on their own 10 has about a 50% chance of winning if they go for it. But we know this number is too high because the conversion rate is something like 70% and a failed attempt is almost a certain loss. For the math to work out, you would have to say that a first possession team with a first down at their own ~11 has a 70% chance of winning, which is way too high (my estimate is 40%).

    I think that the way you derived your curves might be the source of our disagreeing values. If I understand your methodology correctly, you always chose the value of time that maximized WP according to your standard regulation WP algorithm. Because your WP values are consistently higher than mine (and in some places I think obviously too high even without reference to my curves), I worry that you may be unjustified in deriving your values in that way.

  19. MP says:

    mike s. -- That's probably the most important comment I have seen on this site. As you suggest, it has ramifications for all of Burke's situational work. It's unfortunate he didn't reply.

  20. Brian Burke says:

    Sorry I didn't reply to Mike S' comment. I think that's an important consideration. With more time on my hands I could do things like that. Like he mentioned, the effect of sampling bias could cut either way. The bias would affect conversion rates as well as basic EP and WP estimates.
