You might recall last season's dust-up between Harvard evolutionary psychologist Steven Pinker and popular-science author Malcolm Gladwell over whether teams really have any ability to predict which college QBs will pan out as good pros. You might be wondering what the heck a psychologist and a pop-science author have to do with NFL football.
In his book What the Dog Saw, Gladwell wrote about how hard it is for school administrators to discriminate the better teacher candidates from the lesser ones. Gladwell used the NFL draft to illustrate how difficult it is for anyone to predict human performance, even in a sport where there are ample performance metrics and every step, throw, and catch is videotaped from 12 different angles. Gladwell was referring to what economists Dave Berri and Rob Simmons reported as a "very weak" correlation between draft order and per-play performance by QBs.
In an exchange of letters following his critical review of What the Dog Saw, Pinker took issue with Gladwell's claim that there was "no connection" between when a QB is taken in the draft and his per-play performance. Pinker wrote that this is "simply not the case."
As has been pointed out previously, the problem with the weak correlation cited by Gladwell is that it excludes players who were not judged good enough by coaches during their development to warrant much, if any, playing time. At its core, the NFL draft is a process of selection, and we should expect selection bias to taint most attempts at analysis. Gladwell looked at the draft process and (correctly) said:
"Coaches and GMs turn out to be good decision-makers when it comes to drafting quarterbacks when you consider the fact that the quarterbacks who never played aren’t any good. And how do we know that the quarterbacks who never play aren’t any good? Because coaches and GMs are good decision-makers!”
But Gladwell's argument cuts both ways. The only way to conclude that coaches and GMs aren't any good at drafting QBs is to assume they're no good at choosing which QB on their roster to play in games!
In this post I'll attempt to settle the question of whether NFL scouts really have any ability to identify the better QBs. Do the QBs picked higher in the draft turn out to be better performers on a per-play basis? Is Pinker correct that they do, or is Gladwell correct that they do not?
A Deceptively Complicated Question
As I wrote in a recent post, it's a far more complicated question than it first appears. QBs with very few attempts can have very erratic per-play stats, which would distort the relationship between performance and draft order. We can impose a minimum qualifying cutoff of attempts to eliminate the small-sample problem, but doing so increases the selection bias. Plus, what do we do about the players who never played a snap?
Berri and Simmons' solution was to set aside the players with fewer than 100 attempts, along with those who never played, which is almost guaranteed to obscure any connection between draft order and performance. (Actually, Dave tells me he did not use a cutoff for one of his methods, which compared picks 1-10 with picks 11-50 and surprisingly showed that the 11-50 group outperformed the 1-10 group. More on that below.)
My original solution was to replace the non-qualifying QBs' stats with those of the 5th-percentile qualifying QB, which is quite low. My reasoning was that coaches had judged these QBs not good enough to play much, if at all, but that it would not be fair to assume they would be the worst of the worst. My choice of the 5th percentile was arbitrary but seemed reasonable at the time. Using this method, I found a very clear linear connection between overall draft order and per-play performance.
A few months ago, Dave Berri convinced me that the 5th percentile may be too low a choice. I think he also accepts that his own method is subject to a large selection problem. So the question becomes: what performance level is the best estimate of how the non-qualifying QBs would have fared had they been given more opportunity?
An Improved Approach
One way to estimate the expected level of performance from non-qualifiers is to temporarily set aside all the players who never attempted a pass, then aggregate all the remaining non-qualifiers' performance into one collective level. We can then assign that expected performance level to every non-qualifier, including the QBs who never attempted a pass. It seems fair to say that those QBs could not be expected to exceed the performance of the QBs who were at least given a few games' worth of attempts. This does not yet account for any improvement that inexperienced QBs would undoubtedly make, but I'll set that consideration aside for now. A rough sketch of the procedure is below.
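For the programmers, here's a minimal sketch of that replacement procedure. The record layout and field names are my own illustration, and both the performance metric and the qualifying cutoff are defined in the next two sections.

```python
from dataclasses import dataclass

@dataclass
class QB:
    attempts: int     # career pass attempts
    adj_yards: float  # career adjusted yards (metric defined in the next section)

CUTOFF = 200  # qualifying attempts (defined below)

def replacement_level(qbs: list[QB]) -> float:
    """Pool every non-qualifier who threw at least one pass into a single
    collective per-attempt level; that level is then assigned to all
    non-qualifiers, including QBs who never attempted a pass."""
    pool = [q for q in qbs if 0 < q.attempts < CUTOFF]
    return sum(q.adj_yards for q in pool) / sum(q.attempts for q in pool)

def career_performance(q: QB, replacement: float) -> float:
    """A qualifier keeps his own per-attempt stats; everyone else gets
    the pooled non-qualifier level."""
    return q.adj_yards / q.attempts if q.attempts >= CUTOFF else replacement
```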
Measuring Career Performance
To measure performance I used an era-adjusted career adjusted yards per attempt (AYPA). AYPA is yards gained per pass attempt, with a 45-yard penalty per interception and a 10-yard bonus for each TD pass. Average AYPA has increased steadily over recent years, so a small correction was made based on the midpoint of each QB's career. My sample included QBs drafted in the first seven rounds of each draft from 1980 through 2000. I stopped at 2000 to allow most QBs some time to establish a career performance level. 1980 was chosen to avoid the "dead pass" era of the 1970s, when passing stats were severely depressed. Before 1978, pass-blocking and receiver-contact rules gave defenses a strong upper hand against the pass. It was almost a different sport.
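In code, the metric looks something like the sketch below. I only applied "a small correction" keyed to each QB's career midpoint, so the linear trend term here is an assumed placeholder, not the actual adjustment.

```python
def aypa(pass_yards: float, tds: int, ints: int, attempts: int) -> float:
    """Adjusted yards per attempt: +10 yards per TD pass, -45 per INT."""
    return (pass_yards + 10 * tds - 45 * ints) / attempts

def era_adjusted_aypa(raw: float, career_midpoint: int,
                      base_year: int = 1990,
                      trend_per_year: float = 0.02) -> float:
    """Strip out league-wide passing inflation relative to a base year.
    The trend value is a made-up placeholder for illustration only."""
    return raw - trend_per_year * (career_midpoint - base_year)
```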
The attempt-weighted aggregate AYPA across all passes in the sample was 5.71. In other words, that is the central tendency among passes, not passers. The average AYPA among individual QBs in the sample was 4.46. The difference arises because there are more bad QBs than good ones, but the good QBs throw most of the passes. (This fact alone suggests coaches are good at identifying the better QBs on their teams.) The highest AYPA in the sample belongs to Peyton Manning, at 7.00.
Among qualified QBs, which I defined as players with at least 200 attempts (about half a season), the average was 5.17 AYPA. The aggregate non-qualifier average was 3.65 AYPA, which is based on over 3,100 passes. The standard deviation among qualifiers was 0.95 AYPA, so the non-qualifiers were, as a group, about 1.5 standard deviations below the average qualifier.
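Here's the arithmetic behind those two averages and the non-qualifier gap, using the summary numbers above; the per-QB records are assumed to be available as attempt/AYPA pairs.

```python
def pass_weighted_mean(qbs):
    """Mean AYPA over passes, each QB weighted by his attempts (5.71 here),
    versus the unweighted mean over passers (4.46 here)."""
    return sum(q.attempts * q.aypa for q in qbs) / sum(q.attempts for q in qbs)

qualifier_mean = 5.17       # AYPA among QBs with 200+ attempts
qualifier_sd = 0.95
nonqualifier_pooled = 3.65  # pooled over 3,100+ non-qualifier passes

z = (nonqualifier_pooled - qualifier_mean) / qualifier_sd
print(f"{z:.1f}")  # -1.6: roughly 1.5 SDs below the average qualifier
```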
(As it turns out, my original guess two years ago of the 5th percentile was almost spot on. 3.65 AYPA actually corresponds to the 6th percentile!)
The Relationship between Draft Order and Performance
After replacing the non-qualifying QBs' stats with their aggregate level, we get an estimate, free of selection bias, of how draft order relates to per-play performance. Correlation coefficients can be deceiving, so I'll first just plot the data with a linear best-fit line and let you see for yourself. The first plot shows the relationship between AYPA and overall draft order. Notice how the number of non-qualifiers increases as draft order increases. For every slot higher in the draft a QB is taken, he could be expected to add 0.006 AYPA.
The next graph plots AYPA by QB position order (1st QB taken, 2nd QB taken, and so on). Again, there is a clear relationship. The higher the pick, the better the performance. For every QB taken there is an average expected difference of 0.11 AYPA.
And lastly, here is a third way to look at the relationship, by draft round. For every round deeper into the draft, there is a drop-off of 0.19 AYPA.
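For the curious, the best-fit slopes in those three plots come from ordinary least squares. A sketch, assuming a table of the 1980-2000 drafted QBs with non-qualifiers already set to the 3.65 AYPA level (the column names are mine):

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares best-fit line of y on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# ols_slope(overall_pick, aypa)  # about -0.006 per slot (+0.006 per slot higher)
# ols_slope(qb_order, aypa)      # about -0.11 AYPA per QB taken
# ols_slope(draft_round, aypa)   # about -0.19 AYPA per round
```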
Improvement of Non-Qualifying QBs
Many readers might not buy the 3.65 AYPA replacement level for non-qualifying QBs. It's fair to argue that these players would have tended to improve over time had they gotten more opportunities after their poor initial outings. It's very difficult to say just how much they would improve, so let's attack the question from a different direction: how much would these non-qualifiers need to improve for there to be no connection between draft order and performance?
To find out, I gradually increased the replacement performance level from 3.65 AYPA until the linear best-fit line became horizontal, indicating no relationship. How much improvement did this require? 0.5 AYPA? 1.0 AYPA? Actually, these QBs would have needed to improve by a whopping 1.80 to an average of 5.45 AYPA! In other words, these non-qualifiers would have to improve by about two standard deviations--each.
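Here's a sketch of that search, under the same assumed data layout as before: nudge the replacement level upward until the best-fit slope of AYPA on overall pick flattens to zero.

```python
import numpy as np

def slope_at(level, pick, aypa, is_nonqual):
    """Best-fit slope of AYPA on overall pick, with non-qualifiers
    replaced by the trial level."""
    y = np.where(is_nonqual, level, np.asarray(aypa, float))
    x = np.asarray(pick, float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# level = 3.65
# while slope_at(level, pick, aypa, is_nonqual) < 0:  # slope starts negative
#     level += 0.01                                   # raise until it flattens
# print(level)  # comes out around 5.45 AYPA
```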
That's the same as improving expected performance by the equivalent of 9 draft rounds (if there were 9). I can't prove that all these QBs couldn't improve that much given more practice and time, but it takes a stretch of the imagination to even ponder such an improvement, not just for one QB but for dozens. And for every QB who doesn't improve, other guys would have to improve by even more. Below is what the relationship would have to look like. The "improved" non-qualifiers are the dense band between 5 and 6 AYPA.
Ok, I think I've made my point. To create a scenario where draft order is unconnected to performance, you'd need to believe that the non-qualifiers, who as a group performed severely worse than average, were actually slightly better-than-average passers.
A couple of other caveats. Reader Jim Glass pointed out that if we're going to credit non-qualifying players with improvement, then we'd also have to debit longer-playing QBs for their decline due to age. Reader Alchemist pointed out that the utility of the very top passers may not be linear with respect to their performance; the very top passers may be worth much more than the second tier. In fact, now that I think about it more, I'm sure that's the case. A QB who can improve his team's chances of converting a series by just a small amount can double his team's chance of scoring on any given drive. The rules and format of the sport dictate that the relationship between series conversion rate and probability of scoring is geometric (scroll down to 'A Simple Model' in the link).
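To see why the relationship is geometric: if a drive must convert roughly n consecutive series to reach scoring range, the chance of scoring is about the conversion rate raised to the nth power, so small edges compound. The n and p values below are illustrative, not taken from the linked model.

```python
def p_score(p_convert: float, n_series: int = 3) -> float:
    """Probability of scoring if a drive must convert n straight series."""
    return p_convert ** n_series

for p in (0.60, 0.65, 0.70, 0.75):
    print(f"p={p:.2f}  P(score) ~ {p_score(p):.3f}")
# 0.60 -> 0.216 and 0.75 -> 0.422: a modest per-series edge roughly
# doubles the chance of scoring on a given drive.
```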
I'm Lost. What Was the Question Again?
Still not convinced? Let's take a couple of steps back and recall what we're really trying to find out: do teams actually have any ability to tell the better QBs from the worse ones? What teams really care about are the first few QBs taken, and it's safe to say the top three QBs in a draft get ample opportunities to play at some point. There were only two 1st QBs, two 2nd QBs, and three 3rd QBs taken who did not qualify in my 21-year sample. Setting them aside, the 1st QB taken averages 5.45 AYPA and the 2nd QB taken averages 5.13 AYPA. The 3rd QB taken averages 4.84 AYPA, and from there it flattens out, likely due to selection effects. The difference between the 1st and 2nd QBs taken equates to about 0.44 wins/yr (based on a linear regression of team stats and team win totals). With a 16-game schedule, where the difference between perfectly average and playoff-bound can be 2 wins, half a win per year is considerable.
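As a sanity check on that figure, the conversion factor below is simply reverse-engineered from the numbers quoted above rather than taken from the underlying regression, so treat it as illustration, not the original model.

```python
WINS_PER_AYPA = 0.44 / (5.45 - 5.13)  # implied: ~1.4 wins/yr per point of AYPA

def extra_wins_per_year(aypa_gap: float) -> float:
    return WINS_PER_AYPA * aypa_gap

print(extra_wins_per_year(5.45 - 5.13))  # 0.44: 1st vs 2nd QB taken
print(extra_wins_per_year(5.13 - 4.84))  # ~0.40: 2nd vs 3rd QB taken
```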
I think position order may be the best way to examine teams' ability to identify the better players. The problem with using overall pick number to compare QBs is that it doesn't do a good job of capturing just how good coaches, scouts, and GMs think the player will be. Overall pick number can have as much to do with team need as with player ability. In 2008, Matt Ryan went #3 to Atlanta, which had a gaping hole at QB. Joe Flacco, the consensus second-best QB available, didn't go until pick 18, when the Ravens traded down and then back up to get him. Had the Ravens held the #4 pick that year and been unable to trade down, Flacco might have been the fourth overall pick. Had they not traded up to #18, it's conceivable Flacco would have dropped into the late 20s. Overall pick number is only partially a reflection of coaches' estimates of player ability, and that's one reason we shouldn't expect a particularly strong correlation between performance and overall pick number.
How Big a Correlation Should We Expect?
Although correlations can be tricky, I should note that after factoring in the aggregate replacement AYPA for non-qualifiers, the correlation coefficient between overall draft order and AYPA is 0.39. If we grant the non-qualifiers a full standard deviation of notional improvement, the correlation is still 0.27. But is that good? Is it high or low? What does it say about teams' ability to choose the better players?
Although some players are so good they compel teams to take them regardless of need, this isn't always the case. Some of the jockeying by teams trading picks on draft day has to do with the overall talent of the player they're targeting, but much of it is about getting just in front of another team with the same need. As in the Flacco example above, there is an important unaccounted-for factor in where a player is picked overall: team need. That adds variance to a QB's draft order that will not correlate with player ability, and it reduces the correlation coefficient between overall draft order and ability.
Overall draft order and career performance are both functions of many factors. A simple model might go like this:
Overall draft order = F(scout estimate of player ability, team need, importance of position, availability of great players at other positions, salary considerations)
Career performance = F(true player ability, offensive line, receivers, scheme, injury, sample error)
So the variances of those two outcomes reflect the same factors:
var(overall draft order) = var(scout estimates of ability) + var(team need) + ...
var(career performance) = var(true ability) + var(team factors) + ... + interaction effects of all the above
If scouts really were pretty good, how much of all that variance can we really expect to be accounted for by the overlap between true ability and scouts' estimates of it, considering all the other factors that intervene? That overlap is what the correlation coefficient captures, and it would be amazing if it were much higher than what we see here, about 0.27 or so. In the end, scouts are trying to predict human performance years into the future, performance that will be shaped by many more things than just player ability.
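A toy simulation makes the point. Every noise level below is an assumption chosen for illustration, and I ignore the sign convention that better players get smaller pick numbers, since only the correlation's magnitude matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

ability = rng.normal(0.0, 1.0, n)                 # true player ability
scout_est = ability + rng.normal(0.0, 0.5, n)     # good-but-noisy scouting
draft_slot = scout_est + rng.normal(0.0, 1.5, n)  # plus team need, position value...
performance = ability + rng.normal(0.0, 1.2, n)   # plus line, receivers, scheme, injury

print(np.corrcoef(draft_slot, performance)[0, 1])
# ~0.34 with these assumed noise levels: squarely in the observed
# 0.27-0.39 neighborhood, even though the simulated scouts genuinely
# track true ability.
```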
When we look at position order instead of overall order, we don't have to worry about much of what determines overall pick number: team need, importance of the position, and the availability of great players at other positions. Among the top three QBs taken, which is what really matters to most teams, the 1st guy usually does better than the 2nd, and the 2nd usually outperforms the 3rd. If we're interested in scouts' ability to discriminate, I think position order is very informative because it's independent of team need.
Lastly, a few points of preemptive rebuttal. Dave Berri reports that in his data set, QBs picked 1-10 overall are outperformed by those picked 11-50 overall (without applying a qualifying cutoff). My data (actually Pro-Football-Reference.com's data) show the opposite. Between 1980 and 2000, picks 1-10 average 4.95 AYPA, and picks 11-50 average 3.71 AYPA. But that includes Mike Jenkins, who threw two career passes, one complete for 5 yards and one intercepted, for a -20 career AYPA! Matt Blunden had -8 AYPA on 9 attempts. Excluding those two unfortunate guys, the 11-50 average is 4.55 AYPA, better but still considerably lower than the 1-10 group.
I'd also question the selection of the endpoints at the 10th and 50th picks. Is there some meaningful practical difference between players on either side of those fences? It may be that random fortune in Dave's data, which includes the 1970s, makes the results appear to favor one group or the other. In other words, if we moved the cutoffs from the 10th pick to, say, the 12th, and from the 50th to, say, the 45th, would Dave get completely different results?
Berri and Simmons used a larger data set, but not necessarily a better one. As mentioned above, including the 1970s "dead pass" era may confound their results.
I should also note that Berri and Simmons' measure of performance is different. They use a 'wins created' metric based on a linear regression of team stats. The only practical difference between our respective measures is that theirs includes running data, which mine does not. Could this make a difference? It might, but running is important to only a small fraction of QBs, and many of the recent running QBs, such as Vick, Culpepper, and McNair, were top picks. So even if running made much of a difference, it's not clear which side of the analysis it would favor.
In the end, I'm convinced that once the selection bias is accounted for, there is an unmistakable relationship between draft order and career per-play performance, even though we shouldn't expect the correlation to be especially large in the first place.
Addendum: Dave Berri responds here.