I've previously commented that using 3rd down percentage in an analysis of team strength or a game prediction model is not a good practice. I realize this is counterintuitive. 3rd down percentage is highly correlated with winning, and unlike total rushing yards, the direction of causation is clear: converting the always-critical 3rd down leads to winning. So why wouldn't it be a good stat?
Bill Parcells once barked, "You are what your record says you are." I prefer the saying that "you're only as good as your next game." A team's record may be what matters when deciding who goes to the playoffs, but a team can't really change its record--except by winning or losing its next game. So when we rate how good a team is, we're better off knowing how likely it is to win future games than dissecting past games into molecular detail. Many of those details are unique to the circumstances of the past, and a model built on them describes what already happened rather than what will happen next. Models like this are known as "over-fit."
Stats such as 3rd down percentage tell us more about what has happened to a team in the past than how well it will do in the future. In a recent article, I tested how well various stats endure through the season. If a team stat from the first half of the season does well predicting itself in the second half of the season, we have a good idea that it is an enduring and repeatable skill, and not primarily the result of randomness and non-repeating circumstances. The table below lists how well each team stat correlates with itself between the first and second half of a season.
|Stat||Self-Correlation (1st Half vs. 2nd Half)|
|O 3D Rate||0.43|
|D Int Rate||0.08|
|D Sack Rate||0.24|
|O Fumble Rate||0.48|
|O Int Rate||0.27|
|O Sack Rate||0.26|
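The split-half test behind the table above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the analysis: the `pearson_r` helper is a hypothetical name, and the six team values are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented for illustration only: each position is one
# team's 3rd down conversion rate (%) in the first and second half of a season.
first_half_3d = [44.1, 38.5, 41.0, 35.2, 47.3, 39.8]
second_half_3d = [41.7, 40.2, 38.9, 37.5, 43.0, 36.4]

# How well does the stat predict itself later in the same season?
self_correlation = pearson_r(first_half_3d, second_half_3d)
print(f"split-half self-correlation: {self_correlation:.2f}")
```

A value near 1 would suggest an enduring, repeatable skill; a value near 0 would suggest the stat mostly reflects luck and one-off circumstances.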
Offensive 3rd down rate endures fairly well within a season, with a correlation coefficient of 0.43. But what if I could predict a team's 3rd down percentage with a completely different stat better than past 3rd down percentage itself? What does that tell us about 3rd down percentage as a stat?
The table below lists other offensive efficiency stats as predictors of 3rd down percentage. In other words, these are the correlations between a team's other stats from the first half of a season and the team's 3rd down percentage from the second half of the same season.
|1st-Half Stat||Correlation with 2nd-Half O 3D Pct|
|O 3D Pct||0.43|
|O Sack Rate||-0.53|
|O Int Rate||-0.42|
We can actually predict a team's 3rd down percentage better with offensive pass efficiency, or with sack rate, than with the team's own to-date 3rd down percentage. And with the correlation with run efficiency at a very small 0.08, we see that the passing game has almost everything to do with 3rd down conversions. (Teams tend to pass on anything longer than 3rd and 1 these days.)
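The comparison above amounts to correlating each first-half stat with second-half 3rd down percentage and ranking the stats by the strength of that correlation, regardless of sign. A rough sketch, assuming made-up numbers and hypothetical helper names (`pearson_r`, `rank_predictors`):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_predictors(first_half_stats, second_half_target):
    """Return (stat name, r) pairs sorted by |r|, strongest predictor first."""
    scored = [(name, pearson_r(values, second_half_target))
              for name, values in first_half_stats.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical first-half stats for six teams (illustration only).
first_half_stats = {
    "O 3D Pct": [44.1, 38.5, 41.0, 35.2, 47.3, 39.8],
    "O Sack Rate": [5.1, 7.8, 6.2, 8.9, 4.3, 6.9],
    "O Int Rate": [2.4, 3.5, 2.9, 3.8, 2.1, 3.2],
}
# Hypothetical second-half 3rd down percentage for the same six teams.
second_half_3d = [41.7, 40.2, 38.9, 37.5, 43.0, 36.4]

for name, r in rank_predictors(first_half_stats, second_half_3d):
    print(f"{name:12s} r = {r:+.2f}")
```

Ranking by absolute value matters here because a strongly negative correlation, such as sack rate's, is just as useful for prediction as a strongly positive one.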
So why include 3rd down percentage in a rating of team strength or a win prediction model when passing stats are already included? It would only serve to add random noise. Instead of telling us how good a team is or will be, it would tell us more about the unique circumstances and random luck the team experienced in the past.
If we still want to use 3rd down percentage as a stat to predict how good a team will be, we can. After all, 3rd down success is critical in sustaining drives and scoring points. It correlates with team wins at about 0.49 and with points scored at 0.65, both relatively high. But instead of using to-date 3rd down percentage itself, we should estimate what the 3rd down percentage will be based on the stats we know to be predictive.
The table below is a regression model using passing stats to estimate future 3rd down percentage.
|Variable||Coefficient||p-value|
|O Sack Rate||-11.6||0.00|
|O Pass Efficiency||1.29||0.01|
|O Int Rate||-1.53||0.00|
The actual model coefficients aren't as important as the fact that the r-squared is 0.94. That means that we can predict a team's future 3rd down percentage with almost crystal ball-like accuracy using passing efficiency stats. And ironically, if we add previous 3rd down percentage itself to the model, it is the only non-significant variable (p=0.13) and r-squared is (strangely) reduced.
An r-squared of 0.94 is the equivalent of a correlation coefficient (r) of 0.97, since r is the square root of r-squared. Remember, this compares to the self-correlation of previous 3rd down percentage of only 0.43.
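A regression of this kind can be sketched as an ordinary least-squares fit. The code below is an illustration under stated assumptions, not the model from the table: the data for eight teams is invented, pass efficiency is assumed to be in yards per attempt, and `solve_linear_system` and `fit_ols` are hypothetical helpers. One caveat worth knowing: plain r-squared cannot fall when a predictor is added to a least-squares fit, so a reduction like the one described above would typically show up in adjusted r-squared. The sketch reports both; which measure the original analysis used is not stated.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares fit with an intercept.

    Returns (coefficients, r-squared, adjusted r-squared); the first
    coefficient is the intercept.
    """
    design = [[1.0] + list(row) for row in rows]
    n, k = len(y), len(design[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, r2, adj_r2

# Hypothetical first-half stats for eight teams (invented for illustration).
sack_rate = [5.1, 7.8, 6.2, 8.9, 4.3, 6.9, 5.6, 7.1]   # % of dropbacks
pass_eff = [6.8, 5.9, 6.4, 5.5, 7.2, 6.0, 6.6, 5.8]    # assumed yards per attempt
int_rate = [2.4, 3.5, 2.9, 3.8, 2.1, 3.2, 2.6, 3.0]    # % of attempts
prev_3d = [44.1, 38.5, 41.0, 35.2, 47.3, 39.8, 42.2, 37.9]
second_half_3d = [41.7, 37.2, 40.1, 34.5, 44.0, 38.4, 41.9, 37.0]

base_rows = list(zip(sack_rate, pass_eff, int_rate))
full_rows = list(zip(sack_rate, pass_eff, int_rate, prev_3d))

base_beta, base_r2, base_adj = fit_ols(base_rows, second_half_3d)
full_beta, full_r2, full_adj = fit_ols(full_rows, second_half_3d)

print(f"passing stats only:  r2={base_r2:.3f}  adj r2={base_adj:.3f}  "
      f"r={math.sqrt(base_r2):.3f}")
print(f"plus previous 3D %:  r2={full_r2:.3f}  adj r2={full_adj:.3f}")
```

If the adjusted r-squared drops when previous 3rd down percentage is added, the extra variable is contributing less than it costs in degrees of freedom, which is the pattern described above.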
So if we want to know a team's ability to convert 3rd downs, we're far better off looking at passing stats than previous 3rd down conversion rates. And a prediction model is far better off using those passing stats (pass efficiency, interception rate, sack rate) and excluding to-date 3rd down percentage.