Advanced NFL Stats: Drunkards, Light Posts, and the Myth of 370

Jul 27, 2008

Drunkards, Light Posts, and the Myth of 370

Running back overuse has been a hot topic in the NFL lately, partly because of Football Outsiders' promotion of their "Curse of 370" theory. In several articles in several outlets, including their annual Prospectus tome, they make the case that there is statistical proof that running backs suffer significant setbacks in the year following a season of very high carries. But a close examination reveals a different story. Is there really a curse of 370? Do running backs really suffer from overuse?

Football Outsiders says:

"A running back with 370 or more carries during the regular season will usually suffer either a major injury or a loss of effectiveness the following year, unless he is named Eric Dickerson.

Terrell Davis, Jamal Anderson, and Edgerrin James all blew out their knees. Earl Campbell, Jamal Lewis, and Eddie George went from legendary powerhouses to plodding, replacement-level players. Shaun Alexander struggled with foot injuries, and Curtis Martin had to retire. This is what happens when a running back is overworked to the point of having at least 370 carries during the regular season."

While it's true that RBs with over 370 carries will probably suffer either an injury or a significant decline in performance the following year, the reason is not connected to overuse. What Football Outsiders calls the 'Curse of 370' is really due to:

Normal RB injury rates
Natural regression to the mean
A statistical trick known as multiple endpoints
(And this should go without saying, but the "unless he is named Eric Dickerson" constraint is silliness.)

Injury Rate Comparison

In the 25 RB seasons consisting of 370 or more carries between the years of 1980 and 2005, several of the RBs suffered injuries the following year. Only 14 of the 25 returned to start 14 or more games the following season. In their high carry year (which I'll call "year Y") the RBs averaged 15.8 game appearances, and 15.8 games started. But in the following year ("year Y+1"), they averaged only 13.0 appearances and 12.2 starts. That must be significant, right?

The question is, significant compared to what? What if that's the normal expected injury rate for all starting RBs? If you think about it, to reach 370+ carries, a RB must be healthy all season. Even without any overuse effect, we would naturally expect to see an increase in injury rates in the following year.

In retrospect, comparing starts or appearances in a year that was selected for near-perfect health to any other year would distort any evaluation. This is what's known in statistics as selection bias, and in this case it could be very significant.
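To see how large this selection effect can be with no overuse at all, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name `simulate_selection_bias`, the 8% per-game chance of missing a game, and the idea that every game is an independent coin flip. It is not fitted to the real RB data.

```python
import random

def simulate_selection_bias(n_players=20000, miss_prob=0.08,
                            season_games=16, seed=1):
    """Give every player two seasons with the SAME per-game chance of
    missing a game, then look only at players who played a full year Y."""
    rng = random.Random(seed)

    def games_played():
        # Each game is missed independently with probability miss_prob
        # (a deliberately crude, assumed injury model).
        return sum(1 for _ in range(season_games) if rng.random() > miss_prob)

    seasons = [(games_played(), games_played()) for _ in range(n_players)]

    # Selection step: keep only players who were healthy all of year Y,
    # which is roughly what a 370+ carry season requires.
    selected = [(y, y1) for y, y1 in seasons if y == season_games]

    mean_y = sum(y for y, _ in selected) / len(selected)
    mean_y1 = sum(y1 for _, y1 in selected) / len(selected)
    return mean_y, mean_y1

mean_y, mean_y1 = simulate_selection_bias()
print(f"Year Y:   {mean_y:.1f} games")
print(f"Year Y+1: {mean_y1:.1f} games")
```

The selected players drop in year Y+1 even though nothing about year Y changed their risk. The drop comes entirely from how the group was picked.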

We can still perform a valid statistical analysis, however. We just need to compare the 370+ carry RBs with a control group. The comparison group was all 31 RBs who had a season of 344-369 carries between 1980 and 2005. (The lower limit of 344 carries was chosen because it produced the same number of cases as the 370+ group as of 2004. Since then there have been several more, which were included in this analysis.)

Fortunately there is a statistical test well suited to comparing the observed differences between the two groups of RBs. Based on sample sizes, the difference between means, and the standard deviation within each sample, the t-test estimates how likely it is that a difference at least as large as the one observed would arise by random chance if the two groups did not really differ. (That probability is the p-value. A p-value below 0.05 is conventionally considered statistically significant, while a high p-value means the data cannot distinguish the difference from chance.) The table below lists each group's average games, games started, and the resulting p-values in their high-carry year and subsequent year.

Comparison of Games Played and Started for High-Carry RBs

	G Year Y	G Year Y+1	GS Year Y	GS Year Y+1
370+ Group	15.8	13.0	15.8	12.2
344-369 Group	15.8	14.0	15.4	12.6
P-Value	0.62		0.68

The differences are neither statistically significant nor practically significant. In other words, even if the sample sizes were enlarged and the differences became statistically significant, the gap between the two groups in year Y+1 is only 0.4 starts and 1.0 appearances. RBs with 370 or more carries do not suffer any significant increase in injuries in the following year when compared to other starting RBs.
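For readers who want to try this kind of two-group comparison themselves, here is a rough sketch. It computes Welch's t statistic, then gets the p-value from a permutation test on the difference in means rather than from a t-distribution lookup; I swapped that in only to keep the example dependency-free. The two lists of games started are HYPOTHETICAL placeholders, not the actual RB data behind the table.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = mean(a), mean(b)
    var_a = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / (var_a / len(a) + var_b / len(b)) ** 0.5

def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Two-sided p-value: the share of random relabelings whose absolute
    difference in means is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_shuffles

# HYPOTHETICAL games started in year Y+1 (made-up numbers for illustration)
starts_370_plus = [16, 16, 14, 3, 16, 12, 16, 9, 15, 16, 13, 16]
starts_344_369 = [16, 15, 16, 8, 16, 14, 10, 16, 16, 12, 16, 13, 16]

print(f"t = {welch_t(starts_370_plus, starts_344_369):.2f}")
print(f"p = {permutation_p_value(starts_370_plus, starts_344_369):.2f}")
```

A large p-value here would say the same thing the table does: the gap between groups is no bigger than what shuffling the group labels at random tends to produce.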

Regression to the Mean

The 370+ carry group of RBs declined in yards per carry (YPC) by an average of 0.5 YPC compared to a decline of 0.2 YPC by the 344-369 group. This is an apparently statistically significant difference, but is it due to overuse?

Consider why a RB is asked to carry the ball over 370 times. It's fairly uncommon, so several factors are probably contributing simultaneously. First, the RB himself was having a career year. He was probably performing at his athletic peak, and coaches were wisely calling his number often. His offensive line was very healthy and stacked with top blockers. Next, his team as a whole, including the defense, was likely having a very good year. Being ahead at the end of games means that running is a very attractive option because there is no risk of interception and it burns time off the clock. Additionally, his team's passing game might not have been one of the best, making running that much more appealing. And lastly, opposing run defenses were likely weaker than average. Many, if not all of these factors may contribute to peak carries and peak yardage by a RB.

What are the chances that those factors would all conspire in consecutive years? Linemen come and go, or get injured. Opponents change. Defenses change. Circumstances change. Why would we expect a RB to sustain two consecutive years of outlier performance? The answer is we shouldn't. A running back with a very high YPC will get lots of carries, but the factors that helped produce his high YPC stats are not permanent, and are far more likely to decline than improve.

If I'm right, we should see a regression to the mean in YPC for all RBs with peak seasons, not just very-high-carry RBs. The higher the peak, the larger the decline the following year. And that's exactly what we see in the data.

The graph above plots RB YPC in the high-carry year against the subsequent change in YPC. The blue points are the high-carry group, and the yellow points are the very-high-carry group. Note that there is in fact a very strong tendency for high YPC RBs to decline the following year, regardless of whether a RB exceeded 370 carries.

Very-high-carry RBs tend to have very high YPC stats, and they naturally suffer bigger declines the following season. 370+ carry RBs decline so much the following year simply because they peaked so high. This phenomenon is purely expected and not caused by overuse.
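Regression to the mean is easy to reproduce with made-up numbers. In the sketch below, each simulated back has a fixed "true" YPC plus independent year-to-year luck, and carries play no role at all. The talent and luck spreads (0.3 and 0.4 YPC) and the function names are assumptions chosen for illustration, not estimates from the real data.

```python
import random

def simulate_ypc(n_backs=5000, seed=2):
    """Return (year Y YPC, change in YPC the next year) for simulated backs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_backs):
        talent = rng.gauss(4.0, 0.3)         # assumed stable "true" YPC
        ypc_y = talent + rng.gauss(0, 0.4)   # year Y: talent plus luck
        ypc_y1 = talent + rng.gauss(0, 0.4)  # year Y+1: same talent, new luck
        rows.append((ypc_y, ypc_y1 - ypc_y))
    return rows

def mean_change_by_tercile(rows):
    """Average next-year change for the lowest, middle, and highest thirds
    of backs, ranked by their year Y YPC."""
    ranked = sorted(rows, key=lambda r: r[0])
    third = len(ranked) // 3
    chunks = (ranked[:third], ranked[third:2 * third], ranked[2 * third:])
    return [sum(change for _, change in c) / len(c) for c in chunks]

low, middle, high = mean_change_by_tercile(simulate_ypc())
print(f"lowest third of year Y YPC:  {low:+.2f} next year")
print(f"middle third of year Y YPC:  {middle:+.2f} next year")
print(f"highest third of year Y YPC: {high:+.2f} next year")
```

The backs with the best year Y decline the most, purely because part of what made the year great was luck that does not repeat.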

Statistical Trickery

Why did Football Outsiders pick 370 as the cutoff? I'll show you why in a moment, but for now I'm going to illustrate a common statistical trick sometimes known as multiple endpoints by proving a statistically significant relationship between two completely unrelated things. I picked an NFL stat as obscure and random as I could think of--% of punts out of bounds (%OOB).

Let's say I want to show how alphabetical order is directly related to this stat. I'll call my theory the "Curse of A through C" because punters whose first names start with an A, B, or C tend to kick the ball out of bounds far more often than other punters. In 2007 the A - C punters averaged 15% of their kicks out of bounds compared to only 10% for D - Z punters. In fact, the relationship is statistically significant (at p=0.02) despite the small sample size. So alphabetical order is clearly related to punting out of bounds!

Actually, what I did was sort the list of punters in alphabetical order, and then scanned down the column of %OOB. I picked the spot on the list that was most favorable to my argument, then divided the sample there. This trick is called multiple endpoints because there are any number of places where I could draw the dividing line (endpoints), but chose the most favorable one after looking at the data. Football Outsiders used this very same trick, and I'll show exactly how and why.
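Here is a small sketch of how much this trick inflates false positives. It generates pure noise for 32 hypothetical punters, so there is nothing real to find, and compares an analyst who fixes the dividing line in advance with one who tries every dividing line and keeps the best. The threshold of 2.04 is only a rough stand-in for the 5% two-sided cutoff of a t-test at this sample size, and all names and parameters are my own assumptions.

```python
import random

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / (var_a / len(a) + var_b / len(b)) ** 0.5

def false_positive_rates(n_trials=1000, n_punters=32, t_crit=2.04, seed=3):
    """Return (rate with a pre-chosen cut, rate with the best cut)."""
    rng = random.Random(seed)
    fixed_hits = best_hits = 0
    for _ in range(n_trials):
        # Pure noise: the stat is unrelated to position in the sorted list.
        values = [rng.gauss(10, 4) for _ in range(n_punters)]

        # Honest analyst: the cut (the midpoint) is chosen before looking.
        mid = n_punters // 2
        if abs(welch_t(values[:mid], values[mid:])) > t_crit:
            fixed_hits += 1

        # Multiple endpoints: scan every cut that leaves at least 4 punters
        # on each side, and keep whichever looks most impressive.
        best = max(abs(welch_t(values[:k], values[k:]))
                   for k in range(4, n_punters - 3))
        if best > t_crit:
            best_hits += 1
    return fixed_hits / n_trials, best_hits / n_trials

fixed_rate, best_rate = false_positive_rates()
print(f"'significant' with a pre-chosen cut: {fixed_rate:.0%}")
print(f"'significant' with the best cut:     {best_rate:.0%}")
```

Because the best-cut search always includes the pre-chosen cut, it can only find "significance" more often, never less.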

The graph below plots the change in yards per carry (YPC) against the number of carries in each RB's high-carry year. You can read it to say, a RB who had X carries improved or declined by Y yards per carry the following year. The vertical line is at the 370 carry mark.

Note the cluster of RBs highlighted in the top ellipse with 368 or 369 carries. They improved the following year. Now note the cluster of RBs highlighted in the bottom ellipse. They had 370-373 carries and declined the next year.

If we moved the dividing line leftward to 368, then the very-high-carry group would improve significantly. And if we moved the line rightward to 373, then the non-high-carry group would decline. Either way, the relationship between high carries and decline in YPC disappears. There is one and only one place to draw the dividing line and have the "Curse" appear to hold water.
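A quick way to check any cutoff-based claim is to sweep the cutoff and watch whether the effect survives. The sketch below does that on TOY data that I made up to mimic the pattern described above (a cluster at 368-369 that improved and a cluster at 370-373 that declined). These are not the real RB seasons, and `mean_change_by_cutoff` is a hypothetical helper. The cutoff is inclusive, so 374 is the first line that puts the whole declining cluster in the lower group.

```python
def mean_change_by_cutoff(seasons, cutoffs):
    """For each cutoff, return (mean YPC change at or above the cutoff,
    mean YPC change below it)."""
    results = {}
    for cutoff in cutoffs:
        above = [change for carries, change in seasons if carries >= cutoff]
        below = [change for carries, change in seasons if carries < cutoff]
        results[cutoff] = (sum(above) / len(above), sum(below) / len(below))
    return results

# TOY data, invented for illustration: (carries in year Y, YPC change in Y+1)
toy_seasons = [
    (345, -0.2), (350, -0.3), (352, 0.1), (358, -0.4), (361, -0.1),
    (368, 0.5), (369, 0.4), (369, 0.6),                # cluster that improved
    (370, -0.8), (371, -0.7), (372, -0.9), (373, -0.6),  # cluster that declined
    (380, 0.1), (390, -0.2), (397, 0.2), (404, -0.3), (410, 0.1),
]

results = mean_change_by_cutoff(toy_seasons, [368, 370, 374])
for cutoff, (above, below) in results.items():
    print(f"cutoff {cutoff}: at/above {above:+.2f}, below {below:+.2f}, "
          f"gap {above - below:+.2f}")
```

If an effect is real, nudging the dividing line by a few carries should not erase it. In this toy set, the gap exists only at 370.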

To be fair to Football Outsiders, they have recently admitted there is nothing magical about 370. A RB isn't just fine at 369 carries, and then on his 370th his legs will fall off. But unfortunately, that's the only interpretation of the data that supports the overuse hypothesis. If you make it 371 or 369, the relationship between carries and decline crumbles. It's circular to say that 370 proves overuse is real, then claim that 370 is only shorthand for the proven effect of overuse. It's pretty clear from the graph above that they assumed overuse was real, then sought an analysis to support their claim.

As Mark Twain (reportedly) once said, "Beware of those who use statistics like a drunkard uses a light post, for support rather than illumination."

Ideas, data, quotes, and definitions from Doug Drinen, PFR, Maurlie Tremblay, and Brian Jaura.

21 comments:

Doug said...: Great stuff. I can't usually get my head wrapped around your statistical stuff but that was good.; Monday, July 28, 2008
coldbikemessenger said...: great job; Tuesday, July 29, 2008
Jim Glass said...: Excellent.; Wednesday, July 30, 2008
Jerry Tsai said...: Nice post. Aaron Schatz and the Football Outsiders should be credited with bringing better metrics to measure the quality of football play, but they do sometimes stray into small-sample over-interpretation. They have to walk the line between provoking commentary and unbiased analysis. As you have shown, their "370" theory is more commentary than analysis.

Your last figure ably demonstrates the peril of picking a cutpoint based on anecdote.; Friday, August 01, 2008
Anonymous said...: Bro, you are awesome. How am I just now finding this site.; Thursday, August 28, 2008
Eddo said...: Very good analysis.

I will say, however, that Football Outsiders usually promotes this theory to project that a RB will not produce as well the next year. It seems like they actually use it properly (basically implying regression to the mean), but communicate it poorly (stating it as some kind of magic number).

It's especially useful for fantasy football, where most "experts" will rate heavily-used backs very highly just because they had a lot of success the previous year, whereas FO is likely to advise you to avoid drafting such players, as their production is sure to decrease and lower their value dramatically.; Wednesday, September 03, 2008
Kevin said...: You make some good points, but for someone pointing out invalid statistics, you make too many loose, uncalled for, or downright wrong assumptions and conclusions.

First, the curse of 370 was obviously a little tongue in cheek. Representing it otherwise is intellectually dishonest. You could have completely missed the point, but you seem to be smart, so I doubt it.

Second, FO has never said that 370 was completely magical or that someone with a high number of carries will definitely blow out a knee the next year. They just suggest that unless you're superhuman, it's more likely for a downturn the next year if your carry load is extreme. References to 370 as a jinx are, as noted above, a joke.

Third, FO was very upfront with their use of selection bias. That factored into the curse joke.

Also, upping the cutoff does not skew the results greatly. You might notice that the 4 immediate downturns are paired with 3 upturns. The "Curse of overuse" isn't nearly as catchy as the "Curse of 370."

Fourth, your re-addition of Dickerson is not in good faith. All statistics occasionally have outliers, and those outliers can skew the data. FO was upfront that if a back is extremely well put together (like Dickerson), the curse of 370 does not apply. If Brandon Jacobs goes over 370 carries, this caveat would definitely be brought up. Dickerson makes for 5 of the top carry season that did not have significant decline. If you go by player, and not by season, then your graph is considerably changed. If you are looking at the curse objectively, the only reason not to look at the curse that way is that it hurts your argument.; Tuesday, October 21, 2008
Brian Burke said...: Kevin, that's exactly the point. There is no "curse of overuse." There is only a "curse of 370"--and not 371 or 369. It's a classic example of multiple endpoints, ignorance of regression to the mean, and bad methodology--Dickerson included or not.

The Dickerson exclusion does not satisfy any scientific standard. Good methodology doesn't exclude outlier 'people,' it excludes outlier 'cases.'

Besides, to exclude an outlier requires more than just saying 'if we include him our whole point goes out the window.' You need an a priori reason. Also, if you want to throw out his years of bucking the "curse," you have to throw out the same number of cases on the other end of the scale.

What FO did was cherry-pick the data. That's what I'd call bad-faith, loose, and downright wrong.

Regarding the tongue-in-cheek nature you mention, I don't buy that. Yes, they've tried to backtrack quite a bit recently, but as I mentioned in the article you can't say that 370 proves overuse is real, then claim that 370 is only shorthand for the proven effect of overuse. They're still peddling this stuff too. Just today, in fact, they published an article about how the "curse" might catch up with Clinton Portis.

PS What the heck does "well put-together" mean? -- RBs that sometimes defy the "curse?" So we're supposed to define well put-together by how well they defy the curse, then exclude them from the analysis that proves there is a curse? That's like saying 'all cars are blue'...excluding the cars that aren't blue.; Tuesday, October 21, 2008
Anonymous said...: Yeah, take that Kevin!; Tuesday, October 21, 2008
Patrick said...: I did a bunch of research on this a couple years ago when it first came out because the Packers were considering adding Larry Johnson and I was curious how accurate it was. I found the "370 curse" to be full of holes and typed up my findings which were sent back to them via a friend. Don't know that my data ever saw the light of day after that but there are multiple problems with their theory.; Sunday, October 26, 2008
Ben Stuplisberger said...: Excellent analysis. I learned quite a bit about statistics, especially multiple endpoints.

Would the analysis differ if the running back was less effective in their high carry year? For instance, Larry Johnson averaged 5.2 ypc in 336 carries in 2005, while averaging 4.3 ypc in his 400+ carry year. He declined significantly since then. What does the data say about backs in these situations?

Also, excuse my ignorance of statistics, but explain why it is invalid to exclude Dickerson? If he was the only back to consistently escape the curse, then why is it unfair to exclude him?; Saturday, January 10, 2009
Brian Burke said...: Ben-All I can say about the Larry Johnson example is that backs with unusually high stats will tend to fall to earth regardless of how many carries they had. It just happens that backs with very good seasons are going to be fed the ball a lot. This gives the appearance of high carries "causing" the decline simply because they precede it.

Regarding Dickerson--Dickerson is a person; he is not a 'case' in the statistical sense. It's fine to throw out outlier cases in studies if they will bias the data. But you can't pick a person and throw out all his cases simply because they belong to him. And if you do throw out outlier cases, you have to balance things by excluding an equal number of cases on the other side of the coin.

For example, if you throw out 5 cases of RB-seasons that defy the "curse," then good methodology dictates you also exclude 5 cases that substantiate the curse. You can't just look at the data, and throw out a guy because he proves your case wrong.

As I said in an earlier comment--It's as misleading as saying "all cars are blue...except all the cars that aren't blue." That is, "all backs with >370 carries decline the following year...except the ones that don't."; Sunday, January 11, 2009
Ben Stuplisberger said...: Thanks for the explanation.

Has FO heard about this article? Commented on it?; Wednesday, January 14, 2009
Ben Stuplisberger said...: I'll take that as a no...; Tuesday, January 20, 2009
Brian Burke said...: Sorry...

I'm not sure. It's been posted on their comment boards many times, but I can't say for certain. I don't think they really care about the research. I think they're just trying to find a hook to hype book sales.; Tuesday, January 20, 2009
Ben Stuplisberger said...: Yeah, or sell out to ESPN.; Thursday, January 22, 2009
Kevin said...: "The Dickerson exclusion does not satisfy any scientific standard. Good methodology doesn't throw exclude outlier 'people,' it excludes outlier 'cases.' "

Good methodology throws out outlier cases when each case is independent. That's not what we have here. One person with multiple results that buck an otherwise existing trend IS significant. If you don't want to see that, then you're intentionally blinding yourself. It is you that is committing an error by taking each of these seasons as an independent case. If each season was put up by a different individual, that would be valid, but they aren't. Each season is not an independent case. As I said before, for someone who is so good at finding flaws in the methodologies of others, it is highly suspect that you can not see your own errors.

In no way am I claiming that the 'Curse of 370' is something set in stone. Much like you, I just don't like seeing intellectual dishonesty.

As for the curse, you apparently do not have a basic knowledge of sports history. The head of FO is from Boston. Boston is the irrational baseball curse capital of the world. Even in Baltimore, the Curse of the Bambino, as stupid as it is, was talked about by sportswriters as real. It was clear to anyone with a sense of irony that the 'Curse of 370' was a not so subtle skewering of Boston sports media in general, and one guy in particular.

As for the well put together comment, I chose my words poorly, and you were right to jump on that loose definition. Let me reclarify. At the time he played, Dickerson was larger than most LBs in the league. How many other running backs does that apply to? Brandon Jacobs is bigger than most LBs in the league now, hence my comparison. When talking about the physical effects of running the ball, I think taking into account the size of the runner might be important. Getting hit by people smaller than you is different from getting hit by people larger than you.

There are any number of factors that go into the wear and tear of a RB. I agree that the 'Curse of 370' is not, by any stretch, complete, but it works as a warning against overuse.; Friday, February 13, 2009
Anonymous said...: Kevin, I'm not sure what you're talking about. I've written FO multiple times, and NOT ONCE have they replied that their "curse" is a "joke." They defend their analysis and say "look at the history." (I want to shout back in my best John Cleese fake accent "look at the bones!")

The fact of the matter is, they have made a killing getting paid by ESPN to do half-baked pieces on 370 for one reason: people read them. Many of their analyses--see today's NFC West segment, it's amazing--are terrific. So why, when they're capable of producing something excellent, they trot out the Curse of 370 every year, is beyond me.

They sound like a politician backed in a corner...they already made their statement (370 or more carries is bad for a running back) and rather than saying "oops, we were not clear enough, we meant X" they continue to defend a hypothesis that CLEARLY does not bear out. You can defend them all you want, but analysis like the one above refutes their work in this instance.

On a side note, I've run something similar to the above without ever having found it...not as elegant, but it also found that there was not a statistically significant link for a set cut-off. Someday I'll find a forum to post it. ~Greg; Friday, February 13, 2009
Danish! said...: I haven't had time to read all the comments yet, so my quibble might have been mentioned.

It seems weird to evaluate RB production solely on YPC. I know that you examine their starts and appearances, but you forget those later on, especially in the multiple-endpoints part. Let's say a RB has 380 carries and 4.6 YPC in year Y, and then in Y+1 has 84 carries for a 5.2 YPC. That will seem like a great improvement and a bucking of the "curse," although in reality it is an injury-plagued and somewhat irrelevant season. It isn't unreasonable to think that the combination of a high YPC and a lot of injuries in Y+1 is common in the 370+ group. After all, the backs are very talented, thus the high YPC, but may be worn down from the overuse in year Y.

If I'm totally wrong, excuse me - I have very little experience with t-tests and only limited knowledge of the finer points of standard deviation.
I'm not trolling at all, and for all I know the FO analysis might have the same issues.; Monday, February 16, 2009
Brian Burke said...: Danish-That's a pretty good point. The only example of what you suggest is Terrell Davis, the Broncos RB who destroyed his knee trying to make a tackle after an interception. He had a phenomenal >370 carry season only to have a horrible YPC average in the few games he played the following year prior to his injury.

He's actually the data point way down on the -2 YPC line in the last graph in the article--the strongest single case for the "Curse". But his injury was so traumatic it would have crippled any player, regardless of the number of carries the year before.

All the other cases in the data set had a fairly sizable number of carries.; Tuesday, February 17, 2009
Mark said...: I have a master degree in statisics and you're and idiot. Using a rb a lot, clearly quickens the end of his career. Unless he's named Eric Dickerson.; Wednesday, February 25, 2009