Bill Parcells famously said "You are what your record says you are." Although that's undeniably true with regard to how the NFL selects playoff teams, and I wholeheartedly believe a leader needs to think that way, Parcells is only 58% correct. That's not a joke. It's 58%, and here's how something like that can be measured.
One staple of statistical analysis in sports is estimating how much of a given process is the result of skill and how much is the result of randomness or 'luck'. By luck, I’m not referring to leprechauns, fate, or anything superstitious. Real randomness is far more boring. Imagine flipping a perfectly fair coin 10 times. It would actually be uncommon for the coin to come out 5 heads and 5 tails. (In fact, it would only happen about 25% of the time.) But if you flipped the coin an infinite number of times, the rate of heads would be certain to approach 50%. The difference between what we actually observe over the short run and what we would observe over an infinite number of trials is known as sample error. No matter how many times you actually flip the coin, it’s only a sample of the infinitely many times the coin could be flipped.
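The chance of an exact 5-5 split can be checked directly with the binomial formula. Here is a minimal Python sketch; the function name `prob_exact_heads` is just an illustrative choice.

```python
from math import comb

def prob_exact_heads(flips: int, heads: int, p: float = 0.5) -> float:
    """Probability of exactly `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

p_five_of_ten = prob_exact_heads(10, 5)
print(f"{p_five_of_ten:.3f}")  # 0.246
```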
As a prime example, the NFL's short 16-game regular season schedule produces a great deal of sample error. To figure out how much randomness is involved in any one season, we can calculate the variance in team winning percentage that we would expect from a random binomial process, like coin flips. Then we can calculate the variance from the team records we actually observe. The difference is the variance due to true team ability.
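To get a feel for what a purely random season looks like, here is a rough simulation sketch in Python. It makes a simplifying assumption that is not in the analysis above: each team's 16 games are treated as independent fair coin flips, ignoring the fact that every real game has exactly one winner and one loser. The function name and the number of simulated teams are arbitrary illustrative choices.

```python
import random
import statistics

def simulate_coin_flip_league(teams: int, games: int, seed: int = 0) -> float:
    """Std. dev. of winning percentage when every game is a fair coin flip."""
    rng = random.Random(seed)
    win_pcts = [
        sum(rng.random() < 0.5 for _ in range(games)) / games
        for _ in range(teams)
    ]
    return statistics.pstdev(win_pcts)

simulated_sd = simulate_coin_flip_league(teams=20_000, games=16)
theoretical_sd = (0.5 * 0.5 / 16) ** 0.5  # binomial formula: sqrt(p * (1 - p) / n)
print(f"simulated {simulated_sd:.3f}, theoretical {theoretical_sd:.3f}")
```

The simulated spread should land very close to the theoretical value of 0.125, which is the standard deviation of winning percentage for 16 fair coin flips.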
The wider the observed distribution is compared with a purely random one, the more true ability matters. And variance is what measures the width of a distribution. Hypothetically, think of a 16-game round-robin football league where the better team always won, eliminating all sample error. There would be 1 team with 16 wins, 1 with 15 wins, and so on, down to 1 team with no wins. This would be a very wide and flat distribution.
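To put a number on how wide that no-luck distribution would be, here is a quick Python sketch. It assumes a 17-team league, which is what a round robin with 16 games per team implies.

```python
import statistics

games = 16
# 17 teams, each beating every weaker team: win totals run 0, 1, ..., 16.
win_pcts = [wins / games for wins in range(games + 1)]
no_luck_sd = statistics.pstdev(win_pcts)
print(f"{no_luck_sd:.3f}")  # 0.306
```

For comparison, 16 games decided by fair coin flips would give a standard deviation of only 0.125.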
Let’s compare the actual variance of team winning percentage with the purely random variance. The observed variance from the ten most recent NFL seasons is 0.194^2 (variance being the square of the standard deviation). The binomial variance over 16 trials (games) decided by a fair coin is always 0.125^2.
We can calculate that the ‘true’ variance of an NFL season is 0.148^2. Further, we can say that the proportion of luck in the outcomes we observe over a full season is:
var(random) / var(observed)
...which is 0.125^2/0.194^2 = 42%.
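The arithmetic behind those figures is easy to reproduce. Here is a short Python sketch using the two standard deviations quoted above; the variable names are my own.

```python
observed_sd = 0.194  # std. dev. of team winning percentage, as quoted above
random_sd = 0.125    # binomial std. dev. for 16 coin-flip games

observed_var = observed_sd ** 2
random_var = random_sd ** 2
true_var = observed_var - random_var  # var(observed) = var(true) + var(random)

true_sd = true_var ** 0.5
luck_share = random_var / observed_var
print(f"true sd {true_sd:.3f}, luck {luck_share:.0%}, skill {1 - luck_share:.0%}")
# true sd 0.148, luck 42%, skill 58%
```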
Put simply, 42% of an NFL team’s regular season record can be accounted for by randomness, otherwise known as sample error. The short 16-game season is too small a sample to provide much confidence that team records accurately reflect their ‘true’ level of ability. The more games in a season, the smaller the sample error, and the more certain we could be that the teams with the better records are truly the better teams. (Please note I am not advocating a longer season. The NFL's purpose is not to run a scientific experiment that clinically determines the best team.)
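To illustrate how season length drives the luck share, here is a Python sketch that recomputes it for a few hypothetical schedules. It assumes the spread of true team ability stays fixed at the value implied by the 16-game figures and that the random part behaves like fair coin flips. Both are simplifications, and the season lengths other than 16 are made up for illustration.

```python
TRUE_VAR = 0.194**2 - 0.125**2  # true-ability variance implied by the 16-game figures

def luck_share(games: int, true_var: float = TRUE_VAR) -> float:
    """Share of observed variance due to randomness in a season of `games` games."""
    random_var = 0.25 / games  # fair-coin binomial variance of winning percentage
    return random_var / (true_var + random_var)

for games in (8, 16, 32, 64):
    print(f"{games:>3} games: {luck_share(games):.0%} luck")
```

Under these assumptions the luck share falls steadily as games are added, but it never reaches zero for any finite season.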
Another way to look at the relationship between true ability and randomness is in terms of r-squared, which is the proportion of variance explained, and r, which is the correlation coefficient. The r-squared of randomness is 0.42, and the r-squared of true ability is 0.58. That makes the correlation coefficient (r) between observed team records and a team’s true ability the square root of 0.58, which is 0.76. This means that after a full season of 16 games, your best guess of a team's true strength should regress its actual record about 42% of the way back to the league-wide mean of .500, because the slope for predicting true ability from an observed record is the r-squared (0.58) rather than r. (I found very similar results using a more clumsy method a few years ago.)
So although the answer may not be precisely 58%, the effect of sample error helps explain why we see apparently ‘bad’ teams like this season’s Buccaneers have winning records and apparently ‘good’ teams like the Chargers have losing records. It’s also why the NFL is notoriously hard to predict, and why regression to the mean is so strong from year to year. The more random the process, the stronger regression will be.
The framework of var(observed) = var(true) + var(random) applies to more than just season records. It applies to any process, whether in sports or in real life. In a subsequent article, this method will be the basis of an examination of what proportion of a QB’s interceptions can be attributed to sample error.