The Elo rating system is a method of ranking players or teams in sports and games. It considers only wins and losses and ignores margin of victory. The system was created by Arpad Elo, a physics professor who was himself a master-level chess player, originally to rate international chess players.
In a nutshell, the system estimates the probability that one opponent will beat another. If a player wins more often than expected, his rating improves, and vice versa. The algorithm needs to start with a prior expectation of how good each player (or team) is. Then, as the players complete matches, their ratings are adjusted upwards or downwards based on who won. The size of each adjustment depends on how unexpected the result was. For example, if a grandmaster beats a novice, his rating would hardly budge, but if the novice beat the grandmaster, both ratings would move significantly.
The actual algorithm is based on the function below. EA is the expected win probability of player A. RA is player A's rating, and RB is player B's rating.
EA = 1 / (1 + 10^((RB - RA) / 400))
After a game between opponents A and B, player A's new rating (R'A) is revised as:
R'A = RA + K * (SA - EA)
where K is the maximum size of the adjustment, and SA is the actual result of the match (1 for a win, 0 for a loss). The K value has traditionally been 32 for chess, but it can be adjusted to tailor the system to various other games and sports. Ratings are typically set to have an average of 1500, but this is arbitrary and can also be changed.
For example, if player A's rating is 1655 and player B's rating is 1500, then according to Elo's function the probability A would beat B is about 0.71. If player A defeats player B, then the actual outcome is 1.00. Player A's new rating would be:
R'A = 1655 + 32 * (1.00 - 0.71) = 1664
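The update rule can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: it assumes the conventional Elo expected-score curve (base 10, scale of 400 points), and the function names and the grandmaster and novice ratings are made up for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A against B (conventional Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a: float, rating_b: float,
                  score_a: float, k: float = 32.0) -> float:
    """A's new rating after one game; score_a is 1.0 for a win, 0.0 for a loss."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Hypothetical ratings for a grandmaster and a novice.
    master, novice = 2400.0, 1400.0

    # Expected result: the grandmaster wins.
    gain_if_master_wins = update_rating(master, novice, 1.0) - master

    # Upset: the novice wins.
    gain_if_novice_wins = update_rating(novice, master, 1.0) - novice

    print(f"master's gain for the expected win: {gain_if_master_wins:.2f}")
    print(f"novice's gain for the upset:        {gain_if_novice_wins:.2f}")
```

With these hypothetical ratings, the grandmaster gains only about 0.1 points for the expected win, while an upset would shift each rating by almost the full K of 32 points.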
One interesting way to look at the ratings is to create a generic win probability. By using the Elo algorithm to compute the expected win probability against a notional average rating, we can get a sense of each team's expected winning percentage.
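As a rough sketch of that idea, the snippet below computes the expected win probability against a notional 1500-rated opponent. It assumes the conventional Elo curve (base 10, scale of 400 points); the function name, team names, and ratings are hypothetical and are not taken from any real ratings.

```python
def generic_win_probability(rating: float, average: float = 1500.0) -> float:
    """Expected win probability against a notional average-rated opponent."""
    return 1.0 / (1.0 + 10.0 ** ((average - rating) / 400.0))


if __name__ == "__main__":
    # Hypothetical teams and ratings, for illustration only.
    example_ratings = {"Team X": 1700.0, "Team Y": 1500.0, "Team Z": 1350.0}
    for team, rating in example_ratings.items():
        print(f"{team}: {generic_win_probability(rating):.3f}")
```

A team rated exactly at the average comes out at 0.500, and teams above or below it land correspondingly above or below an even winning percentage.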
Sagarin's Application of Elo
Jeff Sagarin uses a version of the Elo system to create NFL team ratings. He transforms the ratings so that they are predictive of a game's point spread: the difference between two opponents' ratings, plus an adjustment for home field advantage, predicts the margin of victory. Sagarin's adjustment is a straightforward linear transformation of the original Elo ratings, as you can tell from the graph below. (I suspect Sagarin may over-weight recent games, however.)
Using the method I described in my last post, we can mimic Elo ratings. That method computed team ratings based on the margin of victory in each game. Instead of using margin of victory, we can simply replace the score of each game with a 1 or 0 based on who won. Then we can solve for the ratings that best estimate the game outcomes. Because the ratings are linear, we can transform them into individual game probabilities or generic win probabilities using a logistic transformation.
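A logistic transformation of this kind might look like the sketch below. It assumes the fitted ratings are on a log-odds scale centered on zero, as a logistic regression would produce; that scale is an assumption, and the function names are illustrative rather than part of the method described above.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def game_probability(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B, given ratings on an assumed log-odds scale."""
    return logistic(rating_a - rating_b)


def generic_probability(rating: float, average: float = 0.0) -> float:
    """Probability of beating a notional average team (assumed to be rated 0)."""
    return logistic(rating - average)
```

If the ratings were on a different scale, the difference would need to be multiplied by a scale factor before being passed to the logistic function.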
These rating systems can be adapted for any type of game or sport. Recently, online games have been using similar algorithms to rank players. The primary advantage of this type of system is that it discounts victories over very weak opponents. That matters because players will often set up phony opponents to beat in order to inflate their own scores.
To get a sense of what these rankings would look like for the most recent (2007) NFL season, the table below lists several ratings for each team. The Elo column lists the ratings I derived from the actual Elo algorithm. The Sagarin column lists Jeff Sagarin's version of Elo (his final 2007 season ratings). Lastly, the win probability column lists the probability, based on the Elo algorithm, that each team would beat a league-average team on a neutral site. All ratings include results from the playoffs and Super Bowl.