To see whether any of the stats I've been using have a non-linear relationship with wins, I created new variables that are the squares of the passing/running/turnover efficiency stats. By including the squared variables alongside the original linear versions, we can see whether there is a significant curved (quadratic) relationship that might produce a model with a better fit.
For example, we already know that more defensive interceptions produce more wins. But if the coefficient of the squared variable for defensive interceptions is positive as well, then we'd know that each additional interception produces wins at a faster rate (assuming a direction of causation from interceptions to wins). If the coefficient of the squared variable is negative, then we'd know that although interceptions produce wins, there is a diminishing return when a team accumulates a lot of them.
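As a rough illustration of the squared-variable idea, here is a minimal sketch in plain Python that fits a constant, a linear term, and a squared term by ordinary least squares and then reads the sign of the squared coefficient. The data are synthetic and built from a known concave curve, the function names `solve_linear_system` and `fit_quadratic` are made up for this example, and it does not compute significance levels, which the regressions in this post do.

```python
# Sketch: fit WINS = b0 + b1*x + b2*x^2 by ordinary least squares.
# All data below are synthetic and purely illustrative.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Return (b0, b1, b2) minimizing squared error of b0 + b1*x + b2*x^2."""
    rows = [[1.0, x, x * x] for x in xs]  # constant, linear, squared term
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Synthetic stat values, with "wins" generated from a known concave curve.
stat = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
wins = [-10.0 + 4.0 * x - 0.2 * x * x for x in stat]

b0, b1, b2 = fit_quadratic(stat, wins)
shape = "diminishing returns" if b2 < 0 else "accelerating returns"
print(f"b0={b0:.2f} b1={b1:.2f} b2={b2:.2f} -> {shape}")
```

A negative `b2` means the curve bends downward (diminishing returns), and a positive `b2` means it bends upward. With real, noisy data the sign only matters if the coefficient is also statistically significant.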
Offensive passing efficiency is consistently the strongest factor in the season win models, so I tried a regression including its squared variable first. The results are listed below (r-squared = 0.74).
The complete model with all the efficiency stats, turnover stats, and penalties may be dividing up the variance of the dependent variable too finely, so I ran a simple model with TRUOPASS and sq_TRUOPASS only. The results are below (r-squared = 0.37).
Had the coefficients been significant, we could construct an equation to estimate wins based on the simple non-linear passing model. This would be represented as follows:
WINS = const + B1 * TRUOPASS + B2 * TRUOPASS^2
WINS = -11.9 + 4.4 * TRUOPASS - 0.18 * TRUOPASS^2
For example, on the strength of their passing offenses alone, the '06 Ravens would be estimated to have 9.01 wins, and the Super Bowl teams Chicago and Indy would have 8.4 and 11.3 wins respectively.
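The equation above is easy to evaluate directly. The sketch below plugs in the rounded coefficients quoted in this post and also computes the slope of the curve, which shows how the payoff from each extra unit of TRUOPASS shrinks as TRUOPASS grows. The input values 5.0, 6.0, and 7.0 are hypothetical and not any team's actual figures, and the function names are just for illustration.

```python
# Rounded coefficients quoted in the post for the simple passing model.
CONST, B1, B2 = -11.9, 4.4, -0.18


def estimated_wins(truopass):
    """WINS = const + B1 * TRUOPASS + B2 * TRUOPASS^2."""
    return CONST + B1 * truopass + B2 * truopass ** 2


def marginal_wins(truopass):
    """Slope of the curve: d(WINS)/d(TRUOPASS) = B1 + 2 * B2 * TRUOPASS."""
    return B1 + 2 * B2 * truopass


# Hypothetical TRUOPASS values, chosen only to illustrate the shape.
for value in (5.0, 6.0, 7.0):
    print(
        f"TRUOPASS={value:.1f}  "
        f"est. wins={estimated_wins(value):.2f}  "
        f"slope={marginal_wins(value):.2f}"
    )
```

Because B2 is negative, the slope falls as TRUOPASS rises, which is the diminishing-returns pattern. Since the coefficients were not significant, this is a description of the fitted curve rather than a reliable prediction.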
I repeated the process, inserting a squared variable for each efficiency stat in turn, and the results were very consistent: none were significant. In addition, I repeated the analysis using logarithmic versions of each variable. Again, they were not significant. The bottom line is that the model seems to be best (and simplest) when using strictly linear variables. Although the results are consistently non-significant, there tends to be a 'diminishing-returns' effect at the extremes of both increasing and decreasing performance stats. This makes sense, because a team's win total is bounded by 0 and 16.
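For the logarithmic versions, the idea is to replace a stat with its natural log and then fit an ordinary straight line against the transformed values. Here is a minimal sketch under that assumption, again with synthetic data and a hypothetical helper `fit_line`. The post does not say which log base or software was used, so treat this only as one way the transformation could be set up.

```python
import math


def fit_line(xs, ys):
    """Simple least-squares line: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic stat values, with "wins" generated from a known log curve.
stat = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
wins = [2.0 + 3.0 * math.log(x) for x in stat]

# Transform the stat, then fit WINS = intercept + slope * ln(stat).
log_stat = [math.log(x) for x in stat]
intercept, slope = fit_line(log_stat, wins)
print(f"intercept={intercept:.2f} slope={slope:.2f}")
```

A log term builds in diminishing returns by construction, so if it does not fit significantly better than the plain linear term, the simpler linear model is the one to keep.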
