"Analytics" has unfortunately become a trendy buzzword in sports. I've found that many people who are only vaguely familiar with analytics, including some team executives, media members, and fans, have the wrong idea about what analytics is. Some think it's a panacea that can optimize the solution to any problem. Some think it's just statistical trivia or scientific minutia, like ESPN's Sports Science series (Dwight Howard's arm-span is as big as a 2-car garage!). Others think it's just Moneyball, a one-time talent arbitrage applicable to only one sport. So I thought I'd put my own thoughts down on what analytics is, at least as it applies to football.
I'm not the gatekeeper on what qualifies as analytics, and I'm not going to say what counts and what doesn't. But I think analytics comprises all or parts of four general processes:
1. Asking meaningful questions
2. Getting relevant data
3. Analyzing the data quantitatively
4. Presenting the results
Sports analytics is really no different from any other statistical field of inquiry. These four processes are essentially the classic scientific method. As in many other fields, we can't really create lab experiments. If we want to know whether calling a timeout to prevent a 3rd-quarter delay of game is worth it, we can't replay the same game 1,000 times with the timeout and 1,000 times without it, and compare which choice leads more often to winning. In sports, the 'natural' experiments have already been done for us, each one as a unique trial.
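To make the timeout example concrete, here is a minimal Python sketch of how those already-run natural experiments might be tallied. Everything in it is an assumption for illustration: the function name `win_rate_by_choice`, the record layout, and the toy records themselves, which are made-up placeholders and not real NFL data.

```python
from collections import defaultdict


def win_rate_by_choice(situations):
    """Tally wins and trials for each observed choice.

    `situations` is an iterable of (choice, team_won) pairs, one per
    historical game situation. Returns {choice: (wins, trials, win_rate)}.
    """
    tallies = defaultdict(lambda: [0, 0])  # choice -> [wins, trials]
    for choice, won in situations:
        tallies[choice][0] += int(won)
        tallies[choice][1] += 1
    return {c: (w, n, w / n) for c, (w, n) in tallies.items()}


# Made-up placeholder records, NOT real data: (choice, team_won).
# "timeout" = burned a timeout; "penalty" = took the delay of game.
toy_situations = [
    ("timeout", True), ("timeout", False), ("timeout", True), ("timeout", False),
    ("penalty", True), ("penalty", False), ("penalty", False), ("penalty", True),
]

for choice, (wins, n, rate) in win_rate_by_choice(toy_situations).items():
    print(f"{choice}: {wins}/{n} wins ({rate:.2f})")
```

A raw comparison like this is only a starting point. Because nobody assigned the choices at random, a real analysis would also have to account for score, time remaining, field position, and team quality before reading anything into the difference.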
The bottom line of analytics is that, if done well, it can provide concrete insight to make better decisions. Just like the Scientific Revolution we learned about in grade school, sports analytics is replacing what used to be often-mistaken intuitive judgments and conventional dogma with rigorous analysis.
Analytics differs from traditional statistical analysis in two ways. First, the digital age has made large amounts of data widely available. "Big Data" can fundamentally alter the scientific method. With a large enough set of data, a traditional hypothesis isn't always required, at least for many purposes. The digital age has also enabled the spread of novel ideas and methods far beyond what a book or magazine article could do. Second, computational technology has progressed to the point where relatively advanced machine learning algorithms are widely available and are nearly costless to apply. (Just compare Virgil Carter feeding punch-cards into a university time-share mainframe decades ago to a personal laptop that can compute the results of thousands of model specifications in seconds.)
I see three distinct branches of sports analytics, each with its own value and each overlapping the others.
Sports Analytics Venn Diagram
These sets aren't intended to be to scale. It's not even close, as the Things Interesting to Fans domain spans topics like fantasy football and gambling. Most readers probably identify this area with the kinds of things that popular sites like Football Outsiders or Football Perspective focus on and do very well: team rankings, "best of all time" comparisons, historical statistics, forecasts, and in-depth conventional analysis with a statistical bent. This area is often the entry-point for fans interested in better analysis than what they traditionally get from the media.
The Things Interesting to Science domain includes research on human performance, competition, cooperation, and decision-making under uncertainty. Many of the academic papers on sports are ostensibly aimed at learning something broader about a non-sports scientific topic. Sports can offer a natural laboratory setting with a reasonably controlled environment that researchers can't find elsewhere. A football game is bounded by the clock and end zones, with uniform rules and clear zero-sum objectives. It's also measured and recorded in all kinds of ways. You don't find that in many other natural human settings.
The Things Helpful to Teams domain is probably the hardest to crack. This is where I've been focused in recent years, and it's what I enjoy despite its challenges. This area is almost all overlap--there is probably very little that is helpful to teams that fans aren't also interested in, at least for the fans who are readers at ANS. This domain is all about helping teams make better decisions, on the field, on the sideline, or in personnel meetings.
I'd like to think that ANS is usually right in the middle of the diagram where all three sectors overlap. (Of course I do, I drew the diagram!) Sure, we do fun stuff that is of little use to teams like rankings, best-of comparisons, and game predictions. But I like to focus on the overlap. For example, the Win Probability model can help us decide if Kurt Warner should go into the Hall of Fame (something of interest only to fans), but it can also help coaches make better in-game decisions (something helpful to teams). It can also help tell us whether people are intuitively good at playing minimax games (something interesting to science). It all goes back to step one above: asking meaningful questions.
There's a fourth area of sports analytics I've left out, but it's probably much larger than everything else (except gambling): sports business analytics. This area is about marketing, ticket prices, TV ratings, branding, fan experience, when to have bobble-head night, and stuff like that. Although analytics has thoroughly penetrated this area, there's not much about it that's unique to sports. In the end, sports is just part of the entertainment industry, and at one level it's not terribly different from putting on a series of live concerts or Broadway plays. I'm primarily interested in the "sports analytics" that has to do with the competition on the field (including roster construction and associated financial considerations) rather than in general business analytics, despite the overlap in skills and tools required.
I was inspired to create the sports analytics diagram by one that author and analyst Drew Conway made to describe the skills needed in data science. (By the way, I recommend Drew's book Machine Learning for Hackers--but be warned: learn R first.) In any field, including sports, data science requires three types of skills: hacking skills (deep computer skills plus an instinct toward tinkering), math and pure statistical skills, and some amount of subject matter expertise. (I particularly like the Danger Zone warning... There are a couple of pieces of football research that immediately come to mind. However, I would argue the "machine learning" intersection is another danger zone in football analytics. But that's a whole different discussion.)
What makes "Analytics" so interesting is that it spans so many interesting topics and useful disciplines that it never gets boring. But that's also why it's hard to define.