Monday, June 16

College Football's Pythagorean Expectation

When we sat down to rank the 84 BCS teams, one of the more interesting tools Chris brought to the discussion was a team's Pythagorean Expectation. In baseball, this is widely used to derive a team's expected winning percentage based on their runs scored and runs allowed. It's frighteningly accurate - usually within 2 or 3 games of the team's actual win total.

Pythagorean Expectation of Winning Percentage = PointsScored^2 / (PointsScored^2 + PointsAllowed^2)
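For the curious, the formula is trivial to compute. Here's a minimal sketch (function and parameter names are mine, not from any particular library), with the exponent left adjustable since that's where the fun starts:

```python
def pythag(points_scored, points_allowed, exponent=2.0):
    """Pythagorean expected winning percentage, as a fraction of games won."""
    ps = points_scored ** exponent
    pa = points_allowed ** exponent
    return ps / (ps + pa)
```

A team that outscores its opposition 2-to-1 comes out around .800 with the classic exponent of 2.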

The folks at baseball-reference.com ran some optimization and determined that, for baseball, an exponent of 1.83 actually produces the most accurate results. In basketball, an exponent of 14 (!!) is optimal. High exponents tend to fatten the tails of the expected winning percentages (ie, spread the distribution), and low exponents tend to fatten the middle (shrink the spread).
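The tail-fattening effect is easy to see with a worked example. Take a team whose scored/allowed ratio is a modest 1.1 and vary only the exponent (this helper is just an illustration of the point above):

```python
def pythag_from_ratio(ratio, exponent):
    """Expected winning pct for a team with the given scored/allowed ratio."""
    r = ratio ** exponent
    return r / (r + 1.0)

# Same 10% scoring edge, three different exponents:
# baseball's 1.83 puts the team a bit over .540, the classic 2 barely
# higher, and basketball's 14 pushes the same team near .790.
for k in (1.83, 2.0, 14.0):
    print(f"exponent {k}: {pythag_from_ratio(1.1, k):.4f}")
```

Small scoring margins get stretched toward the extremes as the exponent grows, which is exactly what "fattening the tails" means.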

So this was something we looked at for the BCS teams when making those rankings, using both 2 and 1.83 as exponents. Of course it wasn't the be-all-end-all, nor even one of the most important factors considered. It wasn't a factor at all in the pre-groupings; it was only used to help order the teams within their smaller groups once those were created. Some of those early FSU teams received ridiculous scores by mercilessly blowing out opponents, and clearly we couldn't just put 00 FSU in the top 10, for example. There was the problem of how to weight games against I-AA opposition (we chose to count both points scored and points allowed at half their value in those games). And of course, MLB has a 162-game season in which there isn't a huge amount of fluctuation in the total runs scored in a game (so all games are weighted roughly equally). College football teams play 12-14 games a season and may win 10-6 one week followed by 42-13 the next. This means that a) the formula is going to be a little less useful here and b) it's probably going to need re-optimizing.

Nonetheless, interesting findings included 2001 Miami having the highest score of any BCS team, most of our bottom 10 teams in fact landing in the bottom 10 by this measure, and 2005 Texas and 2004 USC being separated by a razor-thin margin (advantage Texas). If nothing else, it sparked an interest in pursuing this further.

Being a statistician, my first thought was: why not find a least-squares estimate of the exponent? For the 84 BCS teams, with I-AA games at half-weight, the optimal exponent wound up being 2.87. However, this only tests a range of teams that had winning percentages above .600 - often above .800 - and "at-large" teams who had impressive seasons (probably blowing lots of people out). It's not a representative sample of what goes on in college football, so its end product is not the most useful tool.
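The least-squares fit itself can be done with a simple grid search: for each candidate exponent, sum the squared differences between the Pythagorean estimate and each team's actual winning percentage, and keep the exponent that minimizes it. A minimal sketch (the season data in the usage example below is made up, not the real BCS data):

```python
def pythag(ps, pa, k):
    return ps ** k / (ps ** k + pa ** k)

def best_exponent(seasons, lo=1.0, hi=4.0, step=0.01):
    """Grid-search the exponent minimizing squared prediction error.
    `seasons` is a list of (points_scored, points_allowed, actual_win_pct)."""
    candidates = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    return min(candidates,
               key=lambda k: sum((pythag(ps, pa, k) - wp) ** 2
                                 for ps, pa, wp in seasons))
```

With only 84 (or 357) data points and one parameter, a 0.01-step grid is plenty fine-grained; a proper numerical optimizer would give the same answer to that precision.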

So this weekend I finally got around to looking at this for the entire NCAA Division I-A. To save entry time, games against I-AA and below opposition are weighted equally with the other games (I'm not convinced it's going to greatly affect the exponent, and then I can just pull point totals off James Howell's page). I used all 119 teams for the 2005, 06, and 07 seasons for a total of 357 data points. Analysis on past data would, then, make the assumption that the distribution of scores is not changing significantly over time.

The final result is an optimal exponent of 2.15. This produces an estimate which, on average, predicts a team's winning percentage to be just half a percentage point higher than their true result! A t-test on the differentials showed that this is nowhere near significant - ie, the estimate is unbiased. The root mean square error was 9.223 percentage points - meaning that a lot of the differences between Pythagorean Expectations of similar-talent teams are not going to be able to prove anything statistically. Again that's due to the small sample size of each season, differences in schedule strengths, and vast disparities in team strengths (by that I mean, if 2001 Miami and that season's Duke team played 100 games, Miami would probably win all 100; Duke probably has less than a 1% chance of the upset. On the other hand, if the World Series champions played 100 games against the worst MLB team, I'd be shocked if the champs won more than 70% of them.) It's just not *as* good of a metric for this sport as it is for others.
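For reference, the bias check and RMSE above come from the residuals (predicted minus actual winning percentage, in percentage points). A sketch of those summary statistics, using only the standard library (the sample residuals in the test are illustrative, not the real data):

```python
import math
import statistics

def residual_summary(diffs):
    """Mean bias, one-sample t statistic (vs zero), and RMSE of residuals."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    t = mean / (statistics.stdev(diffs) / math.sqrt(n))  # H0: mean = 0
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mean, t, rmse
```

A |t| well under ~1.97 (the two-sided 5% cutoff with 356 degrees of freedom) is what "nowhere near significant" means here.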

Regarding distribution shape, the differentials had a negative skew of -.9696, far beyond the rejection criterion for normality in magnitude. There was a kurtosis of 6.0888 as well. Being an applied statistician, of course I know that the prescribed action is to ignore these and continue to treat the data as if it were normal, merely making a footnote that it is in fact not ;-)
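Those shape statistics are just the third and fourth standardized moments. A sketch, again stdlib-only (note this computes raw kurtosis, for which a normal distribution scores 3; whether the 6.0888 above is raw or excess isn't stated, so treat the convention here as an assumption):

```python
import statistics

def skew_kurtosis(xs):
    """Sample skewness and raw kurtosis via standardized moments."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    z = [(x - m) / s for x in xs]
    n = len(z)
    skew = sum(v ** 3 for v in z) / n
    kurt = sum(v ** 4 for v in z) / n  # raw kurtosis; normal data => ~3
    return skew, kurt
```

A skew near -1 with heavy tails is consistent with a handful of teams badly underperforming their point totals.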

Here are some sample results:
The 10 BCS champions, plus 2003 USC and 2004 Auburn, plus anybody else in our Top 10; then our Bottom 10, to show how badly they suck in comparison:
2001 Miami: 95.98%
----- This marks the line where, if they played the entire season over, they would still be expected to lose less than one game -----
2003 LSU: 91.84% (I don't buy #2, but are we collectively underrating these guys? Remember they also got in over USC based on a strong SOS.)
2005 Texas: 91.72%
2004 USC: 91.01%
2004 Auburn: 90.39%
2000 Miami: 89.15%
2000 Oklahoma: 87.57%
1998 Tennessee: 85.48%
1999 Florida State: 85.19%
2002 Ohio State: 85.00%
2003 USC: 84.92%
2006 Florida: 84.50%
2005 USC: 83.81%
2002 Miami: 83.48%
2007 LSU: 80.59%
2005 Notre Dame: 70.41%
2000 Purdue: 68.41%
2006 Wake Forest: 67.49%
----- This marks the line where winning 4 nonconference games and going 4-4 in conference play would land you -----
2001 Illinois: 66.24%
2000 Notre Dame: 64.57%
2005 Florida State: 64.30%
2006 Notre Dame: 63.90%
2007 Illinois: 62.93%
1999 Stanford: 57.38%
2004 Pittsburgh: 56.46%

Anyway, you'll probably hear this term thrown around in some upcoming posts about the massive all-time BCS rankings we did, or some mid/late-season analysis of teams in 2008 and beyond. We'll be using the 2.15 exponent unless that gets revisited with a larger data set. Again, it's a useful statistic for comparing teams that you already believe are similar in strength - a measure of how strongly they've dominated the opposition, which of course is only meaningful when considered alongside the quality of opposition they faced. It's certainly nothing definitive. It's also perhaps a measure of how "lucky" a team has been - if they should win 60% of their games but they're sitting pretty at 9-1, there's probably been some fortunate bounces/calls along the way.
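That "luck" reading is just actual wins minus Pythagorean-expected wins, which takes one line (the function name is mine):

```python
def luck(wins, games_played, expected_pct):
    """Actual wins minus Pythagorean-expected wins; positive = 'lucky'."""
    return wins - expected_pct * games_played
```

The 9-1 team that "should" be winning 60% of its games comes out three full wins ahead of expectation.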