Monday, May 25, 2009

SEC or Pac-10?

The SEC is generally regarded as the strongest conference in college football, but there are murmurs that the Pac-10 is as good, or maybe better when the entire depth of the conferences is considered. Running TopoRank over the data from the last 5 seasons gives us a list of team and conference rankings.

The SEC is in fact on top, but the ACC sneaks into 2nd place, followed closely by the Pac-10. The Big Ten, Big 12 and Big East are a step behind, and the non-BCS conferences are not even in the picture.

The overall standings of the conferences are mirrored by the better teams in each. The SEC has tremendous elite balance with Florida, Georgia, LSU and Auburn in the top 11, while the ACC places VT, FSU, Clemson and BC in the top 15. USC dominates the Pac-10 (and overall rankings), with ASU and Cal its closest competitors. The MWC tops out with one team, Utah, at #20, and TCU and BYU just sneak into the top 40. The WAC's top team is #23 BSU, while 2nd place Fresno State (#59) is a very average team.

The Hot Seat

A fun alternative use of TopoSort is assessing a team's improvement (or ruin) to judge more fairly whether a coach should be on the hot seat. For example, a team's record may not change much from one year to the next, but the quality of games might.

Mike Stoops at Arizona was often mentioned in hot seat discussions over the last three years, largely because Arizona hadn't made a bowl game in his 3rd and 4th year (2006 and 2007) before finally coming through in 2008. Looking at the ranking of the Wildcats over Stoops's tenure, we see them finish 66th in 2004, then 48th, 37th, 35th and 36th. The Cats were playing a lot of strong opponents tight and winning some of those games, a clear indication that their level of play had come a long way from the prior three years (70th, 77th, 95th) which ended with players walking out on coach John Mackovic. The numbers also support that Arizona got shafted out of a bowl in 2006, and the no-bowl discussion should never have happened.

NCAA Football Rankings

Every year in college football there's a big uproar over whether there should be a playoff, which teams are playing for the national championship, if a mid-major should get a BCS bowl (or national championship) bid, whether a dominant team like USC should be left out of the championship game because of one "bad" loss in mid-season, etc. The core flaw in the system is that the rankings are largely subjective and cryptic. What does it mean for one team to have a BCS rating of 0.98 and the other 0.95? What difference does that represent? What if a team beat another team that was thought to be good, but that team plummets through the rest of the season, or vice versa? Are teams getting penalized for not running up scores?

A fair ranking should obey several principles:
1. Every game is equally important
2. A single game can only directly compare the two teams playing in that game
3. Running up the score should not be rewarded

Given that each game is a comparison between two teams, we can then create an overall tiered ranking that most closely accommodates all the game results throughout the season. In its current incarnation, the ranking considers teams to be equivalent if the margin was within 3 points, 1 tier apart within 10 points, 2 tiers within 20, and capped at 3 tiers apart for a 3TD or larger spread. These tier differences are all compounded, so if team A is 2 better than team B, who in turn is 3 better than team C, we conclude that team A is 5 tiers above team C. If team A is 1 better than team B, who is 1 better than team C, who is 1 better than team A (each team goes 1-1), they will come out evenly ranked. Each week we add the newest scores into the mix and recalculate a new TopoRank for the season.
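
As a rough illustration of the tier rules above, here is a minimal Python sketch. The margin thresholds come straight from the text. The fitting step is an assumption: the post does not say how the ranking "most closely accommodates" the results, so this sketch substitutes a plain least-squares fit solved by repeated averaging. The names `tier_gap` and `fit_tiers` are made up for the example.

```python
from collections import defaultdict

def tier_gap(margin):
    """Map a winning margin in points to a tier gap (thresholds from the text)."""
    if margin <= 3:
        return 0
    if margin <= 10:
        return 1
    if margin <= 20:
        return 2
    return 3  # capped at 3 tiers for a spread of 21 points or more

def fit_tiers(games, sweeps=200):
    """games: list of (winner, loser, margin). Returns {team: tier}, centred on 0.

    ASSUMPTION: a least-squares fit, not necessarily the real TopoRank method.
    Each sweep sets a team's tier to the average of (opponent tier + observed gap).
    """
    results = defaultdict(list)  # team -> [(opponent, gap in this team's favour)]
    for winner, loser, margin in games:
        gap = tier_gap(margin)
        results[winner].append((loser, gap))
        results[loser].append((winner, -gap))
    tiers = {team: 0.0 for team in results}
    for _ in range(sweeps):
        for team, played in results.items():
            tiers[team] = sum(tiers[opp] + gap for opp, gap in played) / len(played)
    mean = sum(tiers.values()) / len(tiers)
    return {team: value - mean for team, value in tiers.items()}

# A beats B by 14 (2 tiers), B beats C by 28 (3 tiers): A ends up 5 tiers above C.
chain = fit_tiers([("A", "B", 14), ("B", "C", 28)])
# A > B > C > A, each by 7 points: everyone goes 1-1 and comes out level.
cycle = fit_tiers([("A", "B", 7), ("B", "C", 7), ("C", "A", 7)])
```

Under these assumptions the two cases from the paragraph above fall out directly: the chain compounds to a 5-tier gap, and the three-team cycle leaves all teams on the same tier.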

Without further ado, here are the results for the last 5 seasons. In each season, the TopoRank correctly identifies over- and under-rated teams, whose performances in the major bowl games corroborate the findings. Here are some highlights:

Was Oklahoma the right choice from the Big 12? Yes. They were better than Texas over the season, and definitely better than Texas Tech. I think this was clear to anyone who watched these teams play.

Was USC over-penalized for its loss to Oregon State? Yes. USC's national title dreams were sunk with a mid-season loss at the Beavers, who ended up #31 in the 2008 rankings. Florida lost to #11 Mississippi, and was the better team regardless. In voters' eyes, Oklahoma's lone loss to #4 Texas was a smaller penalty; however, the lower quality of their wins was not factored in. USC ranked higher than Oklahoma, but was relegated to smacking around the Big Ten again.

Was the committee right to not award Utah the national championship game? Yes. Utah played a great game against #5 Alabama in its Sugar Bowl win, but their body of work throughout the season was only good enough to push them to #14. They were clearly hurt here by their conference strength of schedule, and close wins against [usually strong] Michigan (who finished #74) out of conference and against #22 TCU didn't give them the push they needed to get into the elite picture.

Ohio State vs LSU or Georgia? Perhaps it should have been Virginia Tech or Florida? VT was never really in the discussion due to getting slaughtered at LSU and then being on the wrong end of Matt Ryan's last second heroics. Florida had 3 regular season losses. LSU and Georgia's losses were mostly forgiven (while other contenders like Missouri and Kansas were derailed) because they were in SEC play. There was no head-to-head result between LSU and Georgia, and the decision was essentially random. 2007 had no clear top team, a fact that the ratings bear out. Years like this make the most compelling case for some kind of playoff system.

Was Hawaii a good choice for the Sugar Bowl? Anyone watching that game saw two teams on completely different levels, which is entirely consistent with #46 Hawaii being 1.6 tiers below Georgia. Hawaii outgunned weak teams all season long, but never belonged on the stage with the Bulldogs.

Was Illinois a reasonable choice for the Rose Bowl? Opinions were split over whether Illinois was sufficiently strong to compete in the Rose Bowl against USC (the answer was clearly no). Illinois was the 3rd best team in the Big Ten, and was chosen to keep the Pac-10 v Big Ten tradition of the game. The credibility of the game would have been better served if Cincinnati or Clemson were chosen instead, though the difference was not drastic.

Should Ohio State and Michigan play for the national title? Ohio State had defeated Michigan in an epic 42-39 showdown at the end of the season to earn their spot atop the Big Ten. Debates raged over whether Michigan was the 2nd best team, and if the championship should be a rematch. In a last second ranking miracle, Florida overtook Michigan, quieted doubters by destroying the Buckeyes and really stirred up the Big-Ten-is-overrated discussion. Our numbers vote in favor of this. USC and Florida were the two best teams in the country (though USC sank themselves with a late-season loss to UCLA), but Ohio State would not be denied with a perfect record.

Did Notre Dame deserve a BCS bowl game? Absolutely not. The Fighting Irish finished 24th in our rankings, right in line with the 41-14 beating they received from LSU. Many considered the Irish to be floating on hype, and this bears out in the numbers and in the reality of facing top opponents.

How big an upset was the greatest bowl game ever? It wasn't. Boise State finished the season #12 while Oklahoma was #21 (another questionable BCS choice). The Broncos were well on their way to a strong victory when they had a meltdown, necessitating the heroics. #7 West Virginia, #13 Virginia Tech, #14 Rutgers or #15 Georgia would all have been better choices, but Oklahoma rode an 8-game win streak past all of them into the Fiesta Bowl. Again, an argument for why all games should be weighted equally.

No qualms about the USC vs Vince Young championship game. Again we see some teams having stellar seasons degraded by untimely losses (Virginia Tech).

2004 is compelling because it's the only time in the last 5 years that a mid-major really did get snubbed out of the national championship game. Utah dismantled all opponents on the way to a perfect season, but didn't get their chance against USC. Instead they clobbered an overrated (#33) Pittsburgh team while the Trojans had their way with Oklahoma. This was a situation where a mid-major had the quality of victories (strength of schedule and dominant victories) to be rightfully placed in the championship game, but the general voter perception of the Mountain West clouded their judgement. Their situation was clearly different from Hawaii '07, and I can't imagine they would have done worse than Oklahoma.

Thursday, May 14, 2009

Running Back Rating

The true value of a good running back is that he can consistently gain yards, get first downs, and thus keep drives going. It's not as simple as looking at a back's yards per carry though. Many backs average over 4.0 ypc, yet clearly their team isn't able to just hand it off to them play after play. Granted, no team can attack a defense by constantly giving the ball to one player, yet certain backs are clearly more able to churn out the yards when it matters. So, how can this ability be measured?

Consistently getting first downs requires that a back gain 10 yards every 3 downs (on the assumption that teams rarely go for it on 4th). Considering that the yardage gained is only really valuable if there's a first down (or touchdown) involved, it makes sense to award bonus yards for each one. To compensate, and keep the final numbers close to "real yards", we will offset the additional yards for firsts and touchdowns by removing the minimum yards needed per down (3.33) to keep gaining firsts. Furthermore, adjustments should be made for fumbles. Thus is born the Running Back Points Per Carry:

RBPPC = (Adjusted Yards - 3.33*(Attempts) + 10*(1st downs + TD)) / Attempts

Charting Percentile v RBPPC for featured backs** since 2003 shows that about one third of them combine rushing average with enough scores and fresh downs to keep their offense on the move. 3.33 is, by definition, the borderline score for a strong every-down runner.
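
To see why 3.33 is the borderline, here is a small Python sketch of the RBPPC formula applied to a hypothetical back who gains exactly 3.33 yards per carry and moves the chains every third carry. The function name `rbppc` and the sample numbers are invented for illustration and are not taken from the complete list.

```python
def rbppc(adjusted_yards, attempts, first_downs, touchdowns):
    """Running Back Points Per Carry, as defined in the formula above."""
    bonus = 10 * (first_downs + touchdowns)   # 10 bonus yards per first down or TD
    baseline = 3.33 * attempts                # yards needed just to keep pace
    return (adjusted_yards - baseline + bonus) / attempts

# Hypothetical back: 300 carries at 3.33 adjusted yards each (999 yards),
# a first down on every third carry (100), and no touchdowns.
borderline = rbppc(adjusted_yards=999, attempts=300, first_downs=100, touchdowns=0)
print(round(borderline, 2))
```

The yardage and the baseline cancel out, leaving only the first-down bonus of 10 yards per 3 carries, which is where the 3.33 threshold comes from.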

View the complete list here.

LaDainian Tomlinson has been highly effective through 2007, but had a very low year in 2008.

Clinton Portis was stellar in Denver (5.03), but has been far less effective, despite huge yardage numbers, since moving to the Redskins (2.22, 3.20, 2.36, 3.28).

Cincinnati probably made a terrible decision to give Cedric Benson a big contract. He was borderline in 2006, but has been around 1.7 the last two years. That's just not going to cut it.

Michael Turner put the Falcons' running game on his back with a rating of 3.70 and over 23 carries a game.

The Giants didn't skip a beat transitioning from Tiki Barber to the Brandon Jacobs/Derrick Ward duo (in fact, they got even better).

Edgerrin James's decline after being a strong option in Indy (3.41, 3.36, 3.70) was so sudden (1.84, 1.99) that it pretty much confirms a weak Cardinals' offensive line. He wasn't a breakaway threat for either team, and was still one of the hardest guys to bring down. Think Emmitt Smith's 1.72 in Arizona was a coincidence?

** - 150 season rush attempts

Tuesday, May 12, 2009

Rushing Yardage: What is a "Good Year"?

Usually 1000 yards or 1200 yards are the benchmarks for a good rushing season in the NFL. To see just how well this holds (i.e., what percentage of RBs actually attain these goals) I plotted how the most active 59 RBs (i.e. top RBs in attempts) stacked up.

The figure above shows the cumulative distribution functions of yards (and adjusted yards) for the 2008 season. The median is around 727 yards (629.9 adjusted yards), while the 3rd quartile (75th percentile) is around 1014 yards (932.5 adjusted yards). The 90th percentile, meanwhile, sits at 1246.8 yards (1216.21 adjusted yards).

In summary, 1000 and 1200 yard seasons are good estimates of 'extraordinary' rushing seasons.
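
As a rough cross-check, the share of backs reaching each benchmark can be counted directly. The sketch below takes the sixteen 1000-yard seasons listed in the 2008 rushing leaders table in the May 10 post and divides by the 59 backs in the sample. It assumes that every 1000-yard rusher is among those 59 most active backs. The name `share_reaching` is made up for the example.

```python
N_BACKS = 59  # size of the sample described above

# Raw 2008 rushing yards of every back at 1000+ in the rushing leaders table.
# ASSUMPTION: all of them are among the 59 most active backs.
YARDS_1000_PLUS = [1760, 1699, 1515, 1487, 1312, 1282, 1238, 1228,
                   1203, 1110, 1089, 1042, 1036, 1036, 1025, 1002]

def share_reaching(threshold):
    """Fraction of the 59-back sample with at least `threshold` rushing yards."""
    return sum(yards >= threshold for yards in YARDS_1000_PLUS) / N_BACKS

print(f"1000+ yards: {share_reaching(1000):.0%}")
print(f"1200+ yards: {share_reaching(1200):.0%}")
```

Under that assumption, a bit over a quarter of the sample reaches 1000 yards and roughly one in seven reaches 1200, which lines up with the quartile and 90th-percentile figures quoted above.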

Sunday, May 10, 2009

Rethinking the rushing crown

The 'rushing crown' is awarded to the player who gains the most yardage on the ground each year, and it is often used to determine who is the best RB in the league. Yet when a RB fumbles the ball after a 20 yard gain on 3rd down, we don't cheer for the 20 yards that were 'gained'; we lament the roughly 20 yards of field position lost relative to the punt that would have followed had he fallen down at the line of scrimmage. If we take fumbles into account, who is the top rusher in the league?

The difficulty here is figuring out how much to penalize a player for a fumble. A fumble on a first down is clearly worse than a fumble on 4th down (assuming equal return yards), but we don't have access to down-by-down data, so we have to approximate. We start by deducting what the average carry would have brought us (~4.2 yards). We then tack on the field position we would have gained had we simply punted the ball instead, which is roughly 39.5 yards.

Counting every fumble would also inflate the penalty, given that recovering a fumble is roughly a coin flip. Thus instead of taking the total number of fumbles, we take 50% of the total number of fumbles.

We calculate adjusted yards (A.YDS) as follows:
  • A.YDS = YDS - FUM * 0.5 * (4.2 + 39.5)

Adjusted yards per carry is then A.YDS divided by attempts. Using this formula we produce a new 2008 rushing leaders list:

2008 Rushing Leaders

Rk  Player                    ATT   YDS  A.YDS  YPC  A.YPC  TD  FUM  LOST
 1  M. Turner RB, ATL         376  1699   1633  4.5   4.34  17    3     2
 2  A. Peterson RB, MIN       363  1760   1563  4.8   4.31  10    9     4
 3  D. Williams RB, CAR       273  1515   1515  5.5   5.55  18    0     0
 4  C. Portis RB, WAS         342  1487   1421  4.3   4.16   9    3     3
 5  Thomas Jones RB, NYJ      290  1312   1268  4.5   4.37  13    2     1
 6  Steve Slaton RB, HOU      268  1282   1238  4.8   4.62   9    2     1
 7  Matt Forte RB, CHI        316  1238   1216  3.9   3.85   8    1     1
 8  Chris Johnson RB, TEN     251  1228   1206  4.9   4.81   9    1     1
 9  Ryan Grant RB, GNB        312  1203   1116  3.9   3.58   4    4     3
10  L. Tomlinson RB, SDG      292  1110   1110  3.8   3.80  11    0     0
11  B. Jacobs RB, NYG         219  1089   1023  5.0   4.67  15    3     1
12  M. Lynch RB, BUF          250  1036    992  4.1   3.97   6    2     1
13  Derrick Ward RB, NYG      182  1025    981  5.6   5.39   2    2     0
14  Jamal Lewis RB, CLE       279  1002    980  3.6   3.51   4    1     0
15  S. Jackson RB, STL        253  1042    933  4.1   3.69   7    5     3
16  Kevin Smith RB, DET       238   976    932  4.1   3.92   8    2     1
17  B. Westbrook RB, PHI      233   936    914  4.0   3.92   9    1     0
18  Frank Gore RB, SFO        240  1036    905  4.3   3.77   8    6     3
19  Ronnie Brown RB, MIA      214   916    894  4.3   4.18  10    1     1
20  L. McClain FB, BAL        232   902    836  3.9   3.61  10    3     0
21  Marion Barber RB, DAL     238   885    819  3.7   3.44   7    3     1
22  J. Stewart RB, CAR        184   836    792  4.5   4.31  10    2     1
23  Willie Parker RB, PIT     210   791    791  3.8   3.77   5    0     0
24  Justin Fargas RB, OAK     218   853    787  3.9   3.61   1    3     1
25  Warrick Dunn RB, TAM      186   786    786  4.2   4.23   2    0     0
26  LenDale White RB, TEN     200   773    773  3.9   3.87  15    0     0
27  Larry Johnson RB, KAN     193   874    765  4.5   3.96   5    5     1
28  M. Jones-Drew RB, JAC     197   824    737  4.2   3.74  12    4     2
29  Sammy Morris RB, NWE      156   727    705  4.7   4.52   7    1     1
30  Cedric Benson RB, CIN     214   747    703  3.5   3.29   2    2     1
31  W. McGahee RB, BAL        170   671    627  3.9   3.69   7    2     2
32  Julius Jones RB, SEA      158   698    611  4.4   3.86   2    4     2
33  Pierre Thomas RB, NOR     129   625    603  4.8   4.68   9    1     1
34  Mewelde Moore RB, PIT     140   588    588  4.2   4.20   5    0     0
35  R. Williams RB, MIA       160   659    572  4.1   3.57   4    4     2
36  Fred Taylor RB, JAC       143   556    556  3.9   3.89   1    0     0
37  M. Morris RB, SEA         132   574    552  4.3   4.18   0    1     1
38  Fred Jackson RB, BUF      130   571    549  4.4   4.22   3    1     1
39  D. Rhodes RB, IND         152   538    538  3.5   3.54   6    0     0
40  Joseph Addai RB, IND      155   544    522  3.5   3.37   5    1     1
41  E. Graham RB, TAM         132   563    519  4.3   3.93   4    2     2
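
To make the adjustment concrete, here is a short Python sketch that recomputes the top three rows of the list from raw yards, attempts and fumbles. The constants are the ones given in the formula. The function and constant names are invented for the example.

```python
AVG_CARRY = 4.2       # yards an average carry would have gained
PUNT_NET = 39.5       # field position a punt would have gained instead
RECOVERY_RATE = 0.5   # a fumble is treated as a coin flip to recover

def adjusted_yards(yards, fumbles):
    """A.YDS = YDS - FUM * 0.5 * (4.2 + 39.5)."""
    return yards - fumbles * RECOVERY_RATE * (AVG_CARRY + PUNT_NET)

# (player, attempts, yards, fumbles) taken from the 2008 list above
rows = [
    ("M. Turner", 376, 1699, 3),
    ("A. Peterson", 363, 1760, 9),
    ("D. Williams", 273, 1515, 0),
]

for player, attempts, yards, fumbles in rows:
    a_yds = adjusted_yards(yards, fumbles)
    print(f"{player}: {a_yds:.0f} adjusted yards, {a_yds / attempts:.2f} per carry")
```

Each fumble costs 21.85 yards under these constants, so Peterson's nine fumbles wipe out nearly 200 yards while Williams, with none, keeps his full total.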

The list ordering does not appear to change too drastically, but this is somewhat expected since most players fumbled about the same number of times (~60% of the players on this list did not fumble more than twice and ~80% did not fumble more than 3 times).

There are some noteworthy movers, however. Michael Turner is the 2008 rushing king and not Adrian Peterson, whose yards per carry drops from an excellent 4.8 to an average 4.31. NFC West running backs fared poorly as Steven Jackson drops from 12th to 15th and Frank Gore drops from 13th to 18th.

On a positive note, running backs like DeAngelo Williams are now graded much more closely to Adrian Peterson. Instead of them being ~250 yards different, they are separated by ~50 yards. This makes sense to me at least. If the turnover battle is as big to winning football games as people say it is, we should take that into consideration before naming Peterson the league's best rusher.

The data used above was taken from a 'rushing leaders' list, meaning that players who were ranked below 41st on the original list are not listed here.