Response to Luck and Belichick Article Comments

The traffic to this site has increased quite a bit lately. A lot of it is likely due to the interest in the playoffs, but much of it is from direct links from other sites. Most direct links are to my articles about Rating 'Gameday' Coaches and about Belichick Cheating Evidence. They have appeared on many message boards across the football world, and many of the comments and criticisms are outstanding. Allow me to address some of the comments here. The foundation of both articles deals with luck. Because there are so many new readers here, I'd like to clarify what I mean by luck, and how I use this definition when I apply statistics to make observations about coaching.


To me, a good example of luck is the "bunching" of successful events. In football, first downs are nice, but consecutive first downs are what allow touchdown drives. The number of first downs and the yardage gained in each should be ascribed to skill. Those are things the teams on the field control. But whether those first downs come in bunches or are interspersed is something different.

Perhaps a baseball analogy illustrates my point best. One single per inning gets a team zero runs after nine innings. But nine singles in one inning, followed by zero hits in the other eight innings, would usually yield about six runs. In football, think of a drive as an inning--a team usually needs consecutive successes to score. If players could control when successful plays occurred, sports would be very different. Batters would save their hits for when runners are on base, or when the game is on the line. Receivers would save their dropped passes for the 4th quarter of blowouts.

Here is a football example I've used before: Let's say PIT and CLE each get 12 1st downs in a game against each other. PIT's 1st downs come as 6 separate bunches of 2 consecutive 1st downs, each followed by a punt. CLE's 1st downs come as 2 bunches of 6 consecutive 1st downs resulting in 2 TDs. CLE's remaining drives are all 3-and-outs followed by a solid punt. Each team performed equally well: same yards, first downs, turnovers, kicking, etc. But the random "bunching" of successful events gave CLE a 14-0 shutout.

That's what I mean by luck. There are several other factors that could be considered random, but my theory is that the bunching effect could explain the bulk of the observed differences in in-game performance and ultimate outcomes.
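
The PIT/CLE example can be sketched in a few lines of Python. This is only a toy illustration, not the prediction model: it assumes a drive scores a touchdown only if it strings together six consecutive first downs, that every extra point is made, and that CLE had four three-and-out drives (the example above does not say how many). The names `FIRST_DOWNS_FOR_TD` and `score` are made up for the sketch.

```python
# Toy model (hypothetical): a drive scores a touchdown only if it strings
# together at least FIRST_DOWNS_FOR_TD consecutive first downs.
FIRST_DOWNS_FOR_TD = 6  # assumption borrowed from the CLE drives in the example
POINTS_PER_TD = 7       # assumption: every extra point is good


def score(drives):
    """Return total points for a list of per-drive first-down counts."""
    touchdowns = sum(1 for first_downs in drives if first_downs >= FIRST_DOWNS_FOR_TD)
    return touchdowns * POINTS_PER_TD


# PIT: six drives of two first downs each, every one ending in a punt.
pit_drives = [2, 2, 2, 2, 2, 2]
# CLE: two drives of six first downs; the four three-and-outs are an assumption.
cle_drives = [6, 6, 0, 0, 0, 0]

print(sum(pit_drives), sum(cle_drives))      # same number of first downs
print(score(pit_drives), score(cle_drives))  # very different scores
```

Both teams total 12 first downs, yet only the team whose first downs arrive in long bunches puts points on the board.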

Luck and the Model of Team Wins

So when I rank teams by luck, I am using my prediction model to estimate how many games a team would be expected to win given their on-field performance. If they win more than expected, I say they're lucky. If they win less, they're unlucky.

Admittedly, the model can't possibly account for every possible consideration on the field of play. There are too many moving parts and inter-dependencies in football to just say, "whatever I don't account for must be luck." There are other factors, such as weather or coaching tactics, some of which are unmeasurable.

But we do know how accurate the model is. We know it accounts for 80% of the variance in team win totals. And there are sound techniques showing that luck, or randomness, accounts for a very large part of the 20% that's left over. That's why I call the model's residual (the difference between estimated and actual wins) luck. Or at least the bulk of it is.

Coaching Tactics

Coaching tactics on gameday, such as clock management and whether to kick or go for a first down, are among the things the model does not account for. They are a small part of the residual. When I ranked coaches on their gameday tactics I used the residual of the model. But a critic would rightfully point out the obvious contradiction--how can the residual be considered luck in one case and 'coaching tactics' in the other?

The answer is that luck is random by definition. It does not correlate with anything. So if you average out enough years of performance, the luck part of the residual tends to cancel itself out, and what's left over is non-random considerations, including coaching tactics. The more years you have in the data, the more likely it is that the luck cancels out. When there is only one year of data, the residual will still contain the luck. This isn't my own personal theory, but one of the central tenets of inferential statistics.
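
A small simulation shows the cancelling-out effect. It is a sketch under made-up assumptions, not my actual data: each season's residual is modeled as a fixed non-random effect (`true_effect`) plus normally distributed luck with standard deviation `luck_sd`. Both values and all function names are hypothetical.

```python
import random
import statistics


def simulated_average_residual(true_effect, luck_sd, n_years, rng):
    """Average of n_years residuals, each one = fixed effect + random luck."""
    residuals = [true_effect + rng.gauss(0.0, luck_sd) for _ in range(n_years)]
    return statistics.mean(residuals)


def spread_of_averages(n_years, trials=2000, true_effect=1.0, luck_sd=1.5, seed=0):
    """Standard deviation of the multi-year average across many simulated careers."""
    rng = random.Random(seed)
    averages = [
        simulated_average_residual(true_effect, luck_sd, n_years, rng)
        for _ in range(trials)
    ]
    return statistics.stdev(averages)


# The more seasons averaged, the less the average is pushed around by luck.
for n in (1, 4, 16):
    print(n, round(spread_of_averages(n), 2))
```

With one season, the average is as noisy as a single residual. With more seasons the spread shrinks roughly in proportion to one over the square root of the number of years, so what remains is closer to the non-random part.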

Further, when you have enough years of data and divide up and analyze the data by coaches, and not by something else, you get a good estimate of that coach's gameday contribution to his teams' win totals. Essentially, I'm saying other coaches, given the same on-field capability of their players, would win X many games. Coach so-and-so won on average X+1.2 games per year, so he is credited with a +1.2 "wins added per year" score.
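
In code, the "wins added" idea amounts to averaging each coach's residuals. The sketch below uses invented seasons and invented expected-win figures purely for illustration. The coach names, the numbers, and the `wins_added` helper are all hypothetical, and the expected wins would really come from the prediction model.

```python
from collections import defaultdict
from statistics import mean

# (coach, season, expected_wins, actual_wins): every value here is made up.
seasons = [
    ("Coach A", 2003, 9.0, 11),
    ("Coach A", 2004, 10.0, 11),
    ("Coach A", 2005, 8.5, 10),
    ("Coach B", 2003, 11.0, 10),
    ("Coach B", 2004, 10.5, 10),
    ("Coach B", 2005, 9.5, 10),
]


def wins_added(rows):
    """Average residual (actual minus expected wins) per coach."""
    residuals = defaultdict(list)
    for coach, _season, expected, actual in rows:
        residuals[coach].append(actual - expected)
    return {coach: mean(values) for coach, values in residuals.items()}


for coach, added in wins_added(seasons).items():
    print(f"{coach}: {added:+.2f} wins added per year")
```

A coach who keeps beating the expectation shows a positive score. A coach who wins a lot only because the team's on-field stats already predict it shows a score near zero.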

A coach that takes a fantastically talented football team to a 10-6 record would not score high. But a coach that can consistently take an average team to a 10-6 record would be ranked at the top.

By the way, I call it 'gameday,' because the preparation and practice part of the coaching job would be reflected in the on-field performance stats and not in the residual. It's the 4th down decisions and such that aren't captured in the efficiency data I use.

Belichick and Cheating

After ranking all the coaches, I had expected to see Belichick at or near the top of the list. He was actually near the middle. So I split his ratings for his tenures at Cleveland and New England. His New England 'wins added' score was literally off the charts. It was 3 standard deviations beyond any other coach, and he never had any single year that wasn't off the chart itself. That would make him not only a once-in-a-lifetime type of tactician, but a once-in-a-millennium super-genius.

At first I thought, wow, he really is something special. But then the cheating revelations hit, and I thought this could be due to more than just genius. In fact, it makes a lot of sense given that we already know he is willing to break rules for a competitive edge. There were many other reports of cheating by the Patriots, beyond taping signals, such as exploiting QB helmet radio communications in various ways.

I'm not saying the Patriots aren't a great team or even that Belichick isn't a great coach. They obviously are. But both things can be true. They can be both great and cheating.

A solid criticism of my approach would be that I can't just chalk up the one team that breaks my model to cheating. I'm not. I was scratching my head wondering why this one team defies the statistical tendencies of the 31 other teams. Then several weeks later, it was revealed that that same team had been cheating.

By no means do I claim that my analysis is iron-tight. I think it's useful and interesting. Feel free to disagree.


10 Responses to “Response to Luck and Belichick Article Comments”

  1. Mike Molloy says:

    Few comments. First, this is a terrific site, and this issue in particular is really fascinating; thanks for the effort on this.

    1. I agree that the observation about the Pats' consistently outperforming their win expectancy is interesting and supports the cheating hypothesis. (It also supports various other hypotheses, e.g. Belichick is an exceptionally good coach, etc. For what it's worth, I'm a Pats fan.)

    2. In the "Rating Gameday Coaches" post, when you break down Joe Gibbs' record into his two stints with the Redskins, he comes out with 2.11 wins added above expectation in his first stint. In view of this, Belichick's 2.33 in the current stint with the Pats looks less out of whack. If, as you say, the 2.33 is a once-in-a-millennium number, it looks like there have been two of those in the last twenty years. (Though they have been in different millennia...) Unless 2.11 only counts as once in 500 years or something. I do wonder if your remark, that Belichick is 3 standard deviations above the mean, is based on the broken out version of Gibbs, or the unified Gibbs. Also, do you have a conjecture about the difference between the two Gibbs stints? Did he cheat in the '80s, then lose his cheating mojo for his current stint? (No offense, Skins fans; if there's evidence in the one case, there's evidence in both cases.)

    3. In the graph on the "Rating Gameday Coaches" post, 2000 and 2001, Belichick's first two years in NE, are omitted, but I would suppose those years are included in claims like the 3-standard-deviations one; could you confirm that please? (It does look like the graph would support a number less than 2.33, though it's a little hard to tell where exactly a couple of the plotted points lie.)

  2. Brian Burke says:

    Mike--Unfortunately, the available data before 2002 is not as robust as post 2002, thus the discontinuity. For example, the newer data identifies interceptions and fumbles, instead of simply 'net turnovers.'

    Regarding your points.
    2. The 3 SD observation regards data from 2002-2006, which is more detailed. The Gibbs 2.11 number and the Belichick 2.33 number are based on the pre-2002 standards. Even though I had more data for 2002-2006, I stuck with the same model to compare apples to apples.

    3. So no, I would confirm the opposite. The 3 SD claim regards 2002-2006 data, and not the first 2 years of his NE stint. I wish I had some more turnover data.

    Gibbs was obviously cheating. He has no character. Just kidding--he's actually a neighbor of mine. I think the difference between his two stints here in DC is due to two things. First, he was blessed with a talent advantage in the 80s/early 90s. Second (and this is what pertains to my research), in the 80s he was playing against teams who had not yet adapted to the new passing rules (o-line blocking/ interference/ etc.). So his old-school style of "run out the clock with Riggo as soon as you get a lead in the first half" didn't work against the modern Eagles and Cowboys.

    But that's a good point. The Gibbs 2.11 number in his first stint is, as you said, a once-in-500 yrs number or so. But there's an important difference between Gibbs and Belichick.

    Belichick was caught cheating. It's only a question of the extent. Gibbs has never been suspected of anything like that.

    The Gibbs 2.11 number might be a classic case of a Type I error. We see significance when there really is none. Unless we have a prior reason to suspect foul play, we can't make the same inference about Gibbs as we can about Belichick.

    Good luck on Saturday.

  3. mathphysto says:

    Doesn't the Pats' 2007 data showing them yet again outperforming your model's expectations contradict the cheating hypothesis? The 2.1 is very much in line with data from 2003 on.

    Also, I find it hard to believe that Gibbs' 2.11 with the 80s/90s Skins was due to (a) talent advantage and (b) new rule exploitation. If it's talent, why don't you see similar results for Seifert or Noll, for example? And you could obviously argue that Gibbs really didn't have much of a talent advantage (if any) compared to other NFC teams of the era on a year-in, year-out basis. Also, shouldn't a talent advantage be captured in their Expectancy? Do you mean to suggest a 'clutch talent' advantage? If so, good luck hunting that unicorn.

    Rule exploitation is a really lame hypothesis - the rules (and their enforcement) are always changing, so you're basically saying that of all the rule changes that have occurred, that's the only one where one coach (Gibbs) effectively exploited it while no one else did. I don't buy it.

    Hell, if you go that route you could say Belichick exploited the lack of enforcement of illegal contact (until 2004) and thereafter had a significant clutch talent advantage, particularly over divisional rivals.

    Side note: I'd like to see p-values calculated to account for sample size differences between coaches. It could be that Dungy's 1.73 over a larger sample size would be less statistically likely than Belichick's 2.33 with the Pats.

  4. Brian Burke says:

    1. Yes, NE has exceeded their expected wins again.

    2. Gibbs coached in two different eras. Changes to the passing rules have altered the balance between run and pass. Gibbs had a long discontinuity between his two stints. He was out of coaching completely for years. Belichick had a very short period between his two stints as a head coach, and both took place in roughly the same era. I'm not sure why you find that 'lame.'

    3. I agree Gibbs' number from the 80s is quite remarkable. Perhaps he was good and lucky. Belichick might be good and lucky and play games with QB helmet frequencies.

    4. Dungy has had some overperforming years mixed with underperforming years. It's the fact that Belichick has never had an underperforming year, i.e. it's the several consecutive overperformances by significant amounts that puts him in the 3SD range. Dungy's pattern isn't like that at all.

  5. mathphysto says:

    Re 1: So doesn't this indicate that their cheating had negligible effect? Therefore the 2.33 is almost entirely (if not completely) due to Belichick (according to your interpretation of the statistic). You could calculate how much is likely due to cheating by treating years 2003-2006 as control group data, 2007 as experiment group data.

    Re 2: What I find 'lame' is the probability that Gibbs' 2.11 was due to rule changes as you had suggested earlier. I wasn't speaking about the changes in their ratings between their stints, as you seemed to think.

    Re 3: See Re 1 above.

    Re 4: True, but you do need to quantitatively account for the sample size differences to say how much of a difference there really is between Dungy and Belichick.

    I really do think that your interpretation of this statistic is at least somewhat flawed, though. Anything that has Dan Reeves, Mike Martz, and Dave Wannstedt (!) above Bill Walsh has to be wrong.

  6. Vern says:

    While the difference from expected wins is significant, the cheating theory is probably the most fantastical of explanations, and also, a somewhat derivative one.

    That is, if you think "knowing what the other team will do" is significant, that suggests that the strategy PERIOD is significant to the result -- even if you know from some other legal means (tendency analysis of the other coach, etc.).

    The point isn't that the "knowing" is significant, but that changes in strategy that don't appear to be extreme can have such a dramatic effect on the results.

    For example, one of the other qualities that is well known about the Patriots is that they change their game strategy more than any other team from game to game and half to half, or even situation to situation. They do this in "subtle" ways as well, such as going all spread on offense all game long and rarely running it, or playing dime coverage all game, then switching to something totally different for the next opponent. They are not resorting to the kinds of extremes in strategy that many stats try to account for - such as never punting or kneel downs to eat the clock in end game situations.

    The real question this exposes is how much these more "subtle" variations in game strategy can impact the outcome of the game.

    In short, using your analogy on luck, it would appear that in Football, you CAN control how you bunch your performances (at least from game to game, bunching all your passes against a weak passing opponent, etc.)

    It's just this subtle adjustment of strategy that makes it seem like a good poker player is cheating - "bunching" his best hands with his highest bets.

  7. David says:

    Hm. Belichick's Patriots are second in the league this year, after going 2.1 over their expectation last year. Care to revisit this article's allegations in light of two seasons post-scandal whose results show a consistency with the team pre-scandal? I would think this would be worth your time.

  8. Piper says:


    I will, he's still cheating.

  9. Anonymous says:

    Stealing signals isn't cheating. Filming them with a camera from the sideline is cheating. You seriously think the Patriots' success can be explained by this fact? You are naive. Stick to number crunching.

  10. Brian Burke says:

    On the contrary, I think it's far more naive to believe whatever it was they were doing gave them no advantage.
