Advanced NFL Stats: February 2008

Feb 29, 2008

Going for It on Fourth Down

It's 4th down and goal from the 2-yard line in the first quarter. What would most coaches do? Easy, they'd kick the field goal, a virtually certain 3 points.

But a 4th and goal from the 2 succeeds about 3 times out of 7, which yields the same expected points, on average, as the field goal. Plus, if the touchdown attempt fails, the opponent is left with the ball on the 2 or even the 1 yard line, whereas after a successful field goal the opponent returns a kickoff that usually leaves them around the 28-yard line. It should be obvious that, on balance, going for the touchdown is the better decision.
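The arithmetic behind that claim can be checked directly. The sketch below is only an illustration: it assumes a touchdown is worth 7 points (the extra point is treated as automatic, which the post does not state) and treats the field goal as a certain 3 points. The 3-in-7 success rate is the figure quoted above.

```python
from fractions import Fraction

# Success rate for 4th-and-goal from the 2, as quoted in the post.
P_TOUCHDOWN = Fraction(3, 7)

# Assumption: a touchdown is worth 7 points (extra point treated as automatic).
TOUCHDOWN_POINTS = 7
# The post treats the short field goal as a virtually certain 3 points.
FIELD_GOAL_POINTS = 3

expected_go = P_TOUCHDOWN * TOUCHDOWN_POINTS   # expected points from going for it
expected_kick = Fraction(FIELD_GOAL_POINTS)    # expected points from kicking

print(f"go for it: {expected_go} points, kick: {expected_kick} points")
```

Because the two options tie on immediate points, the case for going for it rests on what happens next: the opponent starts pinned at the goal line after a failed attempt, but around the 28 after a made kick.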

That's the case made by economist David Romer, author of a 2005 paper called "Do Firms Maximize? Evidence from Professional Football." Romer's paper is an analysis of 4th down situations in the NFL. It is quite possibly the most definitive proof that coaches are too timid on 4th down. Romer's theory is that coaches don't try to maximize their team's chances of winning games as much as they maximize their job security.

Coaches know that if they follow conventional wisdom and kick--oh well, the players just didn't make it happen. But if they take a risk and lose, even if it is on balance the better decision, they'll be Monday morning quarterbacked to death. Or at least their job security will be put in question.

In case anyone doubts how much coaches are concerned about Monday morning criticism, just take their word for it. Down by 3 points very late in the 4th quarter against the winless and fatigued Dolphin defense, former Ravens coach Brian Billick chose to kick a field goal on 4th and goal from one foot from the end zone. The Dolphins went on to score a touchdown in overtime. Billick's explanation at his Monday press conference was, "Had we done that [gone for it] after what we had done to get down there and [not scored a touchdown], I can imagine what the critique would have been today about the play call." Billick, a nine-year veteran head coach and Super Bowl winner, was more concerned about criticism from Baltimore Sun columnists than the actual outcome of the game. He'd rather escape criticism than give his team the best chance to win.

Romer's paper considers data from 3 years of games. To avoid the complications of particular "end-game" scenarios with time expiring in the 2nd or 4th quarters, he considers only plays from the 1st quarter of games. So his recommendations should be considered a general baseline for the typical drive, and not a prescription for every situation.

Romer's bottom line is the graph below. The x-axis is field position, and the y-axis is the yards-to-go on 4th down. The solid line represents when it is advisable for a team to attempt the first down rather than kick. According to the analysis, it's almost always worth it to go for it with less than 4 yards to go. The recommendation peaks at 4th and 10 from an opponent's 33 yard-line.

Romer basically measures the expected value of the next score. Say it's 4th and 2 from the 35 yd line. He compares the value of attempting a field goal from the 35 with the point value of a 1st and 10 from the 33 (multiplied by the probability of actually making the first down). He also recognizes that a field goal isn't always worth 3 points, and a touchdown isn't always worth at least 6. The ensuing kickoff gives an expected point value to the opponent. There is a point value to having a 1st and 10 from one's own 25 yard line.
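The comparison described above can be sketched as two small expected-value functions. This is not Romer's actual model: the function names are hypothetical, and every number in the usage lines is a made-up placeholder chosen only to show the mechanics, not an estimate from his paper.

```python
def go_for_it_value(p_convert, ev_if_converted, ev_if_failed):
    """Expected points from attempting the first down."""
    return p_convert * ev_if_converted + (1 - p_convert) * ev_if_failed


def field_goal_value(p_make, ev_opponent_after_kickoff, ev_if_missed):
    """Expected points from kicking.

    A made kick is worth 3 points minus what the opponent expects to
    gain from receiving the ensuing kickoff.
    """
    made = 3 - ev_opponent_after_kickoff
    return p_make * made + (1 - p_make) * ev_if_missed


# Placeholder inputs for illustration only (NOT Romer's estimates).
go = go_for_it_value(p_convert=0.6, ev_if_converted=3.0, ev_if_failed=-1.0)
kick = field_goal_value(p_make=0.6, ev_opponent_after_kickoff=0.6, ev_if_missed=-1.2)
decision = "go for it" if go > kick else "kick"
print(f"go: {go:.2f}, kick: {kick:.2f} -> {decision}")
```

Negative values stand for situations where the opponent is the team more likely to score next, such as after a failed conversion or a missed kick.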

One weakness of the paper is that it dismisses the concept of risk as unimportant. Romer says that long-term point optimization should be the only goal, so coaches should always be risk neutral. But if the level of risk aversion were actually considered, we might find that coaches are more rational than he concludes.

But the paper makes a very strong case that coaches should go for it on 4th down far more often than they currently do. Job security for coaches seems to be the primary reason why they don't. At a meeting with some researchers making the case for more aggressive 4th down decision making, Bengals coach Marvin Lewis responded, "You guys might very well be right that we're calling something too conservative in that situation. But what you don’t understand is that if I make a call that's viewed to be controversial by the fans and by the owner, and I fail, I lose my job."

It would be great if a coach came along and rarely kicked. It would be a gamble, but if Romer and others are right, chances are the coach would be successful. And the rest of the NFL would have to adapt. It might only take one brave coach.

Feb 24, 2008

"Expert" Predictions

Gregg Easterbrook of ESPN.com writes a yearly column poking fun at all the terrible predictions from the previous NFL season. Here is his latest--it's long but highly entertaining. Unfortunately, it also makes a pretty good case that people like me with complicated mathematical models for predicting games are wasting our time. And the "experts" out there are doing even worse.

Predictions are Usually Terrible

His best line is "Just before the season starts, every sports page and sports-news outlet offers season predictions -- and hopes you don't copy them down." Unfortunately for them, he does.

Easterbrook's examples of horrible predictions underscore the fact that pre-season NFL predictions are completely worthless. Before the 2007 season I made the same point by showing that guessing an 8-8 record for every team is just as accurate as, or more accurate than, the "best" pre-season expert predictions or even the Vegas consensus. (Pay no attention to my own predictions attempt last June before I realized how futile it is.)

Unlike Easterbrook, most of us don't write our predictions down. It's easy to forget how wrong we were and how overconfident we were. So many of us go on making bold predictions every year.

Proof I'm (Almost) Wasting My Time

The most interesting part of the column might be the "Isaacson-Tarbell Algorithm." It's a system suggested by two of Easterbrook's readers last summer for predicting individual games. Just pick the team with the better record, and if both teams have the same record, pick the home team. According to Easterbrook, the Isaacson-Tarbell system would have been correct 67% of the time, about the same as the consensus Vegas favorites. Although devilishly simple, it requires no fancy computer models or expert knowledge and it would have beaten almost every human "expert" with a newspaper column, tv show, or website.
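The rule is simple enough to write down in a few lines. The sketch below is one possible reading of it: "better record" is taken to mean higher winning percentage, tie games are ignored, and the function and tuple layout are hypothetical choices rather than anything specified in Easterbrook's column.

```python
def win_pct(wins, losses):
    """Winning percentage; a team with no games played counts as .500."""
    games = wins + losses
    return wins / games if games else 0.5


def isaacson_tarbell_pick(home, away):
    """Pick the team with the better record; on equal records, pick the home team.

    Each team is a (name, wins, losses) tuple.
    """
    home_name, home_w, home_l = home
    away_name, away_w, away_l = away
    if win_pct(away_w, away_l) > win_pct(home_w, home_l):
        return away_name
    return home_name  # home team has the better record, or the records are equal


# The visitor has the better record here, so the visitor is the pick.
print(isaacson_tarbell_pick(("Home", 5, 3), ("Away", 6, 2)))
```

In week 1 every team is 0-0, so under this reading the rule simply picks every home team.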

(Actually, I'm going to give credit for inventing the algorithm to my then 6-year old son who is an avid football fan (wonder why?). He devised that very same system during the 2006 season in a contest with my regression model and his grandfather in a weekly pick 'em contest. I'm sure many young fans have followed the same principle over the years.)

The model I built was accurate about 71% of the time last year. Is the extra 4% accuracy (10 games) worth all the trouble? Probably not (for a sane person) but I'll keep doing it anyway. Actually, I think 4% is better than it sounds. Why? Well, a monkey could be 50% correct, and a monkey who understood home field advantage could be 57% correct. The question is how far above 57% a prediction system can get.

And there are upsets. No system, human or computer-based, could predict 100% accurately. A system can only identify the correct favorite, and sometimes the better team loses. From my own and others' research, it looks like the best model could only be right about 75-80% of the time. So the real challenge is now "how far above 57% and how close to 80% can a system get?" There are only 23 percentage points of range between zero predictive ability and perfect predictive ability. Within that range, 4% is quite significant.
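One way to make "within that range" concrete is to rescale accuracy so that 57% maps to zero and 80% maps to one. The helper below is a hypothetical illustration using the floor and ceiling quoted above (taking the top of the 75-80% estimate as the ceiling); it is not a standard named metric.

```python
def share_of_range(accuracy, floor=0.57, ceiling=0.80):
    """Fraction of the achievable range (floor to ceiling) that a system captures."""
    return (accuracy - floor) / (ceiling - floor)


model = share_of_range(0.71)    # the regression model's accuracy quoted above
simple = share_of_range(0.67)   # the Isaacson-Tarbell rule's accuracy quoted above

print(f"model: {model:.0%} of the range, simple rule: {simple:.0%} of the range")
print(f"the 4-point gap is {model - simple:.0%} of the range")
```

On this scale the 4-point edge covers roughly a sixth of the whole distance between home-field guessing and the best achievable model.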

Better Ways to Grade Predictions

Phil Birnbaum of the Sabermetric Research blog makes the point that experts should not be evaluated on straight-up predictions but on predictions against the spread. I'm not sure that's a good idea, and I think I have a better suggestion.

Phil's point is that there are very few games in which a true expert would have enough insight to correctly pick against the consensus. Therefore, there aren't enough games to distinguish the real experts from the pretenders. His solution is to always pick against the spread.

I don't agree. The actual final point difference of a game has as much to do with the random circumstances of "trash time" as with any true difference in team ability. A better alternative may be to have experts weight their confidence in each game as a way to compare their true knowledge.

Consider a hypothetical example Phil Birnbaum cited about an .800 team facing a .300 team. The true .800 team vs. true .300 team match-up is actually fairly rare. As Phil has eloquently pointed out previously, the .800 team may just be a .600 team that's been a little lucky, and the .300 team could really be a .500 team that's been a little unlucky. There are many more "true" .500 and .600 teams than .300 and .800 teams, so this kind of match-up is far more common than you'd expect. And if the ".500" team has home field advantage, we're really talking about a near 50/50 match-up. Although the apparent ".800" team may still be the true favorite, a good expert can recognize games like this and set his confidence levels appropriately.

Computer Models vs. "Experts"

Game predictions are especially difficult early in the season, before we really know which teams are good. Over the past 2 years of running a prediction model, I've noticed that math-based prediction models (that account for opponent strength) do better than expert predictions in about weeks 3-8. The math models are free of the pre-season bias about how good teams "should" be. Teams like the Ravens and Bears, which won 13 games in 2006, were favored in games by experts far more than their early performance in 2007 warranted. Unbiased computer models could see just how bad they really would turn out to be.

But later in the season, the human experts come around to realizing which teams are actually any good. The computer models and humans do about equally well at this point. Then when teams lose star players due to injury, the human experts can usually outdo the math models which have difficulty quantifying sudden discontinuities in performance.

And in the last couple weeks, when the best teams have sewn up playoff spots and rest their starters, or when the "prospect" 2nd string QB gets his chance to show what he can do for his 4-10 team, the human experts have a clear advantage. By the end of the season, the math models appear to do only slightly better than experts, but that's only really due to the particularities of NFL playoff seedings.

In Defense of Human Experts

Humans making predictions are often in contests with several others (like the ESPN experts). By picking the favorite in every game, you are guaranteed to come in first...over a several-year contest. But in a single-season contest, you'd be guaranteed to come in 2nd or 3rd to the guy that got a little lucky.

The best strategy is to selectively pick some upsets and hope to be that lucky guy. Plus, toward the end of the year, players that are several games behind are forced to aggressively pick more and more upsets hoping to catch up. Both of those factors have the effect of reducing the overall accuracy of the human experts. The comparison between math models and experts can often be unfair.

In Defense of Mathematical Predictions

Lastly, in defense of the computer models, the vast majority of them aren't done well, and those give the rest a bad name. There is an enormous amount of data available on NFL teams, and people tend to take the kitchen-sink approach to prediction models. I started out doing that myself. But if you can identify what part of team performance is repeatable skill and what is due to randomness particular to non-repeating circumstances, you can build a very accurate model. I'm learning as I go along, and my model is already beating just about everything else. So I'm confident it can be even better next season.

Fumbles, Penalties, and Home Field Advantage

I had a theory that part of home field advantage may come from fumble recovery rates. Specifically, I was thinking of the kind of fumble that results in a pile of humanity fighting for the ball by doing things to each other that are otherwise done only in prisons. It seems that the officials often have no better way of determining possession than by guessing which player has more control of the ball than the other guy. Sometimes it seems like they have a system--pulling the players off the pile one by one until they can see the ball. But in the end, they're still relying on their own judgment. There are complicating factors. Where was the ball when the play was whistled dead? When was the original ball carrier down? Was it a fumble or incomplete pass? In many cases, the process is analogous to basketball referees determining possession of a "jump ball" by their judgment of which player has better grip, or which player ultimately ripped the ball loose.

Perhaps the influence of the crowd had an effect on the officials by biasing their judgment. It's plausible because there have been many academic studies documenting the psychological effect of a home crowd on officiating in several sports. Much of the research focuses on penalties and fouls called by the officials, but what about other matters of judgment? Fumble recoveries might shed some light.

If the fumble recovery rate of home teams is significantly greater than that of away teams, then we'd have evidence that NFL officials are favoring home teams. The table below lists home and visiting teams' fumbles and fumbles lost from the entire 2007 regular season, encompassing 256 games.

Team	Fumbles	Lost	Rate (%)
Visitor	409	189	46.2
Home	388	189	48.7

It appears that although visiting teams fumbled slightly more often, they lost possession less frequently. Neither difference is statistically significant, however, so the data give no evidence that officials are biased in that department.
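The significance claim for the lost-fumble rates can be checked with a standard two-proportion z-test on the counts in the table. The post does not say which test was used, so the choice of test here is an assumption, as is treating each fumble as an independent trial.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Fumbles lost out of total fumbles: home (189 of 388) vs. visitor (189 of 409).
z, p_value = two_proportion_z_test(189, 388, 189, 409)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

A z statistic well inside plus or minus 1.96 means the 2.5-point gap in lost-fumble rates is easily explained by chance at the usual 5% level.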

Although my fumble theory was a bust, what about penalties? Could the difference in penalties given to home and away teams be large enough to explain most of the home field advantage in the NFL? But if visiting teams are in fact penalized more, it wouldn't necessarily indicate officiating bias. It could be due to crowd noise or other factors.

The table below lists the visitor and home penalty and penalty yard averages for the 2006 regular season.

Team	Penalties/G	Pen Yards/G
Visitor	6.2	50.1
Home	5.8	48.1

I was very surprised by how small the difference is. On average, visiting teams only have 0.4 more penalties called (and accepted) on them than home teams for a difference of only 2 yards. I would expect the difference to be greater because of false start and delay of game penalties due to crowd noise.

In 2006, home teams won 55.6% of regular season games. According to the in-game model at Football Prediction Network, the difference of 2 penalty yards can only account for about 0.9% of the 5.6% home field advantage.

It appears that neither fumble recoveries nor penalties account for much of home field advantage in the NFL. Other factors such as travel fatigue or motivation are likely to be much more important. So I came up empty handed in the research...or so I thought until I came across some gems at Referee Chat Blog when doing some background research.

The author tracks officiating data from week to week, crew by crew. One of the most interesting things he's found is that crews don't tend to consistently favor home teams more than visiting teams across seasons (correlation = -0.04). Contrary to what was found in the study of officiating in British Premier League soccer I linked to above, NFL officials do not indicate a susceptibility to home crowd influence.

Many of the author's conclusions are based on differences in very small sample sizes (and he seems to realize this), but the data there are sound. Rex definitely knows his refs.

Feb 23, 2008

More Spygate Revelations

Without a doubt, the most popular and controversial article here at NFL Stats was one from last fall titled "Belichick Cheating Evidence?" Since then, there have been more revelations of rule-breaking, including the most recent allegations that the Patriots have been illegally taping signals since the 2000 season. Count me as one guy who is not surprised.

Back on September 15th, shortly after the League blew the whistle on the Patriots' signal taping, I wrote:

If Belichick's Patriots exploited unfair advantages in stealing signs from opposing sidelines we would expect to see some sort of evidence that they won games "beyond their means." By means I am referring to the Patriots' passing and running performance on offense and defense.

By successfully exploiting stolen signs, we might expect the Patriots to choose to use that advantage on critical plays--3rd downs in the 4th quarter for example. These critical plays would heavily "leverage" performance on the field to be converted into wins. In other words, the Patriots would win more games than their on field stats would indicate.

This is exactly what we see in the data. Year-in and year-out, Belichick's Patriots have won about 2 more games than expected given their offensive and defensive efficiencies, including turnovers and penalties. No other modern team has even come close to the Patriots in consistently winning more games than their stats indicate.

My research was based on an explanatory regression model of team wins that considers offensive and defensive passing and running efficiencies, turnovers, and penalties. It was actually conducted before the first revelation of taping at the Jets game, so I was not looking for evidence of cheating.

The model estimated how many wins a team would be expected to have each year based on its on-field abilities. The Patriots had won about 2 more games per year, every year, from 2002-2006 than their on-field performance would statistically indicate. In other words, other teams with similar performance stats win 2 fewer games in a season than Belichick's Patriots did. The graph below illustrates this trend.

I don't claim that the statistical model is perfect, but the odds that one team would over-perform so consistently and so strongly are astronomical. No other team had a pattern remotely like New England's.

My hunch is that not all of the over-performance is due to advantages gained from rule-breaking. I think the Patriots are focused intently on every last detail. Their scouting and research efforts are probably second to none. A team like that would squeeze every last advantage they could from every situation, and the taping was probably part of that larger effort which was partly legal and partly illegal.

Feb 19, 2008

2007 FG Kicker Ranking

There aren't many positions in team sports as lonely as the place kicker. Alone on the sidelines all game long, he's asked to make the game-winning field goal in overtime to send his team to the Super Bowl. Or maybe his head coach doesn't have enough faith in his leg to attempt a 48 yard try, and instead goes for it on 4th and 13 only to ultimately lose the Super Bowl by 3 points.

When we grade field goal kickers, we need to account for attempt distance and other factors. And attempt distance is complicated--it's non-linear. A 40 yard attempt is not twice as difficult as a 20 yd attempt. In fact, here is a graph of the average accuracy rates for field goals of various attempt distances.

So based on the analysis described here, I calculated the expected FG percentage for every NFL kicker based on his average attempt distance and home stadium environment. The difference between his actual FG% and his expected FG% can be considered his true performance given the difficulty of his attempts.

The table below lists all FG kickers from 2007 who had at least 10 attempts. It's sorted from best to worst.

Kicker	Team	Avg Yds Att	# Att	Actual FG%	Exp FG%	Act-Exp %
Feely		36.6	23	91	83	9
Bironas		36.7	39	90	81	9
Reed		34.1	25	92	84	8
Nedney		36.1	19	89	81	8
Graham		33.4	34	91	85	6
Longwell		41.2	24	83	77	6
Gould		37.9	36	86	80	6
Brown		38.3	29	86	80	6
Scobee		33.5	13	92	86	6
Lindell		35.0	27	89	83	6
Elam		36.4	31	87	81	6
Andersen		34.3	28	89	84	5
Kaeding		35.1	27	89	84	5
Kasay		35.9	28	86	82	4
Hanson		38.7	35	83	80	3
Dawson		34.1	30	87	84	3
Gostkowski		32.9	24	88	85	2
Brown		37.5	34	82	80	2
Suisham		36.8	35	83	81	2
Stover		35.2	32	84	83	2
Folk		36.1	31	84	82	1
Bryant		35.0	33	85	84	0
Tynes		33.5	27	85	85	0
Carney		32.9	14	86	85	0
Crosby		38.2	39	79	80	0
Nugent		35.5	36	81	82	-2
Janikowski		42.7	32	72	75	-3
Wilkins		38.7	32	75	80	-5
Akers		35.3	32	75	83	-8
Rackers		39.1	30	70	79	-9
Vinatieri		30.4	29	79	89	-10
Rayner		38.0	22	68	80	-11
Mare		34.7	19	53	84	-31
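The Act-Exp column is simply the actual rate minus the expected rate. Here is a minimal sketch using a few rows copied from the table above (the variable and function names are hypothetical). Because the table's percentages are already rounded, a difference recomputed from them can be a point off the published column, as it is for Feely.

```python
# (kicker, actual FG%, expected FG%) copied from a few rows of the table above.
rows = [
    ("Vinatieri", 79, 89),
    ("Feely", 91, 83),
    ("Mare", 53, 84),
    ("Longwell", 83, 77),
]


def over_expected(actual_pct, expected_pct):
    """Percentage points above (positive) or below (negative) expectation."""
    return actual_pct - expected_pct


# Rank from best to worst by performance relative to attempt difficulty.
ranked = sorted(rows, key=lambda r: over_expected(r[1], r[2]), reverse=True)
for name, actual, expected in ranked:
    print(f"{name:<10} {over_expected(actual, expected):+d}")
```

Longwell ranks ahead of Vinatieri here even though the table's Exp FG% marks Vinatieri's attempts as much easier (89% expected versus 77%).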

Congratulations to Jay Feely and the Dolphins, who at least have bragging rights to something in 2007. And jeez, what happened to Mare down in New Orleans? Only 19 attempts, but still that's significantly bad to say the least. Rackers seemed to have a down year. He was one of the top FG kickers over the last two years.

Except for Mare, kickers are mostly bunched together in performance. Although according to raw accuracy percentage they appear separated by a wide disparity, in reality the difference in performance among field goal kickers is not large. In my previous analysis, I estimated that one standard deviation in true accuracy is 7.7%. And for every standard deviation difference, a kicker would yield on average an additional 2.3 field goals worth 6.7 points in a season. Whether that's considered a lot or not depends on your perspective.

Feb 15, 2008

Coaches and Risk

Recently I've been looking at risk and reward in the NFL using financial portfolio theory, a branch of math that analyzes and optimizes various risk-reward strategies. I've been building on previous research that applied the utility function to analyze each team's run/pass balance. In the last post, I calculated what each team's risk level (α) was for the 2006 season.

Risk was calculated as a level of risk aversion (or tolerance) based on the relative expected yardage gains and volatility of a team's running plays and passing plays. This method considers not only the simple ratio between run plays and pass plays, but the variance of each as well. For example, it considers whether a passing game is a short, high-percentage game or an aggressive down field game. Positive α means a team was risk averse, and negative α means a team was risk tolerant.

But grading play callers as risk tolerant or averse is slightly more complicated. I noted that winning teams were often the more conservative teams, but that conservative play calling was likely the result of having a lead. In other words, winning leads to conservative play calling, not the other way around.

I also noticed a clearly linear relationship between team wins and risk level. Below is a graph of risk aversion vs. team wins. We can see that teams with a lot of wins generally are the teams that can afford to be conservative.

The upward sloping line is the regression best-fit line. It suggests the typical level of risk for each number of season wins. For example, a team with 12 wins should have a fairly conservative profile, an α of about 0.02. And a team with 8 wins should have been more aggressive, with an α of about 0.01.

The distance above or below the best-fit line could be considered the excess risk beyond that which is appropriate for each number of wins. This value is the "residual" of the regression. Note that I said "could be considered." Keep in mind that an 8-win team that appears "too risky" may really be a 6-win team that gambled often and got lucky. There is an unquantified part of the equation that is random luck.

Now we have a way to score teams and coaches as risk averse or risk tolerant. The table below ranks each coach in terms of his excess risk, from the most risky to the most conservative. (I excluded Atlanta from the analysis because it was a severe outlier in 2006. Vick's boom and bust scrambling style defied convention. The Falcons appeared to be over 20 times more aggressive than the next riskiest team due to the very high variance of their running game.)

Team	Coach	Wins	Risk	Excess Risk
	Fisher	8	-0.013	-0.0232
	Cowher	8	-0.011	-0.0206
	Shanahan	9	-0.001	-0.0137
	Smith	13	0.008	-0.0131
	Shell	2	-0.016	-0.0122
	Holmgren	9	0.002	-0.0101
	Coughlin	8	0.002	-0.0075
	Nolan	7	0.002	-0.0059
	Jauron	7	0.003	-0.0046
	Reid	10	0.011	-0.0040
	McCarthy	8	0.006	-0.0037
	Fox	8	0.007	-0.0034
	Billick	13	0.020	-0.0016
	Del Rio	8	0.009	-0.0013
	Gruden	4	0.000	-0.0006
	Mangini	10	0.016	0.0015
	Childress	6	0.008	0.0030
	Edwards	9	0.016	0.0039
	Schottenheimer	14	0.029	0.0049
	Kubiak	6	0.011	0.0057
	Saban	6	0.011	0.0059
	Crennel	4	0.008	0.0068
	Dungy	12	0.026	0.0071
	Parcells	9	0.019	0.0072
	Linehan	8	0.018	0.0077
	Gibbs	5	0.011	0.0078
	Lewis	8	0.019	0.0090
	Green	5	0.013	0.0094
	Marinelli	3	0.009	0.0107
	Payton	10	0.030	0.0151
	Belichick	12	0.039	0.0197
	Mora	7	-0.438	na
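The excess-risk column can be approximately reproduced by fitting an ordinary least-squares line of α on wins and taking each team's residual. The sketch below assumes that is how the best-fit line was obtained (the post only says "regression") and uses the wins and α values from the table above, with Atlanta left out. Since those α values are rounded to three decimals, the recomputed residuals differ slightly from the published ones.

```python
# (coach, wins, risk alpha) copied from the table above; Atlanta is excluded.
teams = [
    ("Fisher", 8, -0.013), ("Cowher", 8, -0.011), ("Shanahan", 9, -0.001),
    ("Smith", 13, 0.008), ("Shell", 2, -0.016), ("Holmgren", 9, 0.002),
    ("Coughlin", 8, 0.002), ("Nolan", 7, 0.002), ("Jauron", 7, 0.003),
    ("Reid", 10, 0.011), ("McCarthy", 8, 0.006), ("Fox", 8, 0.007),
    ("Billick", 13, 0.020), ("Del Rio", 8, 0.009), ("Gruden", 4, 0.000),
    ("Mangini", 10, 0.016), ("Childress", 6, 0.008), ("Edwards", 9, 0.016),
    ("Schottenheimer", 14, 0.029), ("Kubiak", 6, 0.011), ("Saban", 6, 0.011),
    ("Crennel", 4, 0.008), ("Dungy", 12, 0.026), ("Parcells", 9, 0.019),
    ("Linehan", 8, 0.018), ("Gibbs", 5, 0.011), ("Lewis", 8, 0.019),
    ("Green", 5, 0.013), ("Marinelli", 3, 0.009), ("Payton", 10, 0.030),
    ("Belichick", 12, 0.039),
]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


wins = [w for _, w, _ in teams]
alphas = [a for _, _, a in teams]
slope, intercept = fit_line(wins, alphas)

# Excess risk is the residual: actual alpha minus the alpha typical for that win total.
excess = {coach: a - (intercept + slope * w) for coach, w, a in teams}

print(f"typical alpha at 12 wins: {intercept + slope * 12:.3f}")
print(f"typical alpha at 8 wins:  {intercept + slope * 8:.3f}")
print(f"Belichick excess: {excess['Belichick']:.4f}")
```

The fitted values at 12 and 8 wins land near the 0.02 and 0.01 quoted earlier in the post.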

Another way of looking at excess risk is presented in the graph below. The teams are sorted from most to fewest wins.

Notice how many below-average teams were risk averse. Oakland was the only team to display a high degree of risk tolerance. But this is likely due to their incredibly inconsistent passing game and inability to protect their quarterbacks.

Also notice how many teams that are considered "pass first" teams, such as IND, STL, CIN, and NO, show up on the risk averse side. They aren't considered too risk averse because they run too much, but because their passing games were so consistent. This result suggests they should have thrown even more, or thrown deeper to riskier routes more often.

Of course we really could be talking about offensive coordinators rather than head coaches. But with few exceptions, it's the head coach that really sets his team's overall strategy. We're also only looking at one year--because it's the one year of data I have. It would be really interesting to see if some coaches consistently show the same level of risk aversion or tolerance over several seasons. But to get the data requires a play-by-play NFL database, something not readily available...yet.

Feb 12, 2008

Belichick's Belichick

Feb 9, 2008

The Passing Paradox Part 3

This is a continuation of an analysis of run/pass balance in the NFL. In part 1 of this article, I discussed the potential application of financial portfolio theory in football strategy. In part 2, I critiqued a recent study that makes a great stride toward applying economic and financial math to football. Here in the final part of this article, I present an alternative way of understanding risk and reward in the run/pass balance question.

To recap, two research papers came to opposite conclusions about the run/pass balance in the NFL. The Alamar paper "The Passing Premium" found that the expected gain for a pass is higher than the expected gain for a run, accounting for interceptions. He concluded that teams should pass more often. But the Rockerbie paper "Passing Premium Revisited" found the opposite, that teams pass too much. He applied an economic utility equation that accounts for risk and concluded that running more often helps teams win.

10 Yards for a First Down

Both papers simplify football into a yardage optimization game. Unlike financial investing where the goal is to maximize total return for certain acceptable levels of risk, football requires a minimum gain every 4 downs to maintain possession. At the end of each year no one takes most of your money away if your mutual funds don't earn at least 10%. If they did, and you hadn't made your 10% by November, your risk tolerance would dramatically increase for the final 2 months of the year.

And I think that's how we should model football. Every down and distance situation requires its own risk equation. On first and second down, teams can choose a balanced run/pass attack. But on 3rd down, risk tolerance needs to increase. The net effect would be to bias the offense towards the pass. Although not mathematically optimum in terms of total yardage gain, passing may be optimum when considering the added risk of having to punt.

Take a situation such as 3rd down and 5 yards to go. The table below is the cumulative distribution of yardage gained by running and passing. It lists the cumulative percent of each play that results in at least x yards gained. For example, a run play yields 5 yards or more 24.4% of the time, and a pass play yields 5 yards or more 45.7% of the time.

Yards Gained	Running	Passing
<0	87.1	93.2
0	79.4	59.2
1	69.2	57.6
2	54.6	55.8
3	41.9	53.5
4	31.9	49.9
5	24.4	45.7
6	19.2	41.2
7	15.3	37.1
8	12.2	33.3
9	9.0	28.7
10	7.9	27.3

So given the distribution above for a situation such as 3rd and 5, which type of play should be called? The chance of converting a first down by calling a pass is almost double that of calling a run. The run is the better choice only in situations requiring gains of less than 2 yards.
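The crossover between run and pass can be read straight off the table. The sketch below copies the "at least x yards" percentages for 1 through 10 yards and, as the post does, applies the same distribution to every distance to go. The names are hypothetical.

```python
# Percent of plays gaining at least the given yardage: yards -> (run, pass),
# copied from the table above.
AT_LEAST = {
    1: (69.2, 57.6), 2: (54.6, 55.8), 3: (41.9, 53.5), 4: (31.9, 49.9),
    5: (24.4, 45.7), 6: (19.2, 41.2), 7: (15.3, 37.1), 8: (12.2, 33.3),
    9: (9.0, 28.7), 10: (7.9, 27.3),
}


def better_call(yards_to_go):
    """Return the play type more likely to gain at least yards_to_go."""
    run_pct, pass_pct = AT_LEAST[yards_to_go]
    return "run" if run_pct > pass_pct else "pass"


for yards in sorted(AT_LEAST):
    run_pct, pass_pct = AT_LEAST[yards]
    print(f"{yards:>2} to go: run {run_pct:4.1f}%  pass {pass_pct:4.1f}%  -> {better_call(yards)}")
```

With these numbers the run wins only at 1 yard to go, and at 5 yards the pass converts about 1.9 times as often as the run.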

A coach can call plays with pure yardage optimization balance in mind until third down, commonly considered the do-or-die, make-or-break down. Then, he has to consider the risk of being forced to punt. The coach's decision is reduced to that single play and not an overall strategy. Because most 3rd down situations require more than 2 yards, the run/pass balance is biased toward the pass.

This is why play selection is a paradox. The worse an offense is at passing, the more often it needs to pass, and the higher its risk tolerance needs to be. Incomplete passes on either 1st or 2nd down typically lead to 3rd and long situations, requiring a pass. Teams with poor passing games would also tend to be behind towards the end of a game, which requires even more passing. Teams that don't pass well are therefore forced to play to their weakness. Thus, the passing paradox.

The inverse is also true. The better a team is at passing, the less often they need to do it. They would find themselves ahead in most games, allowing for a lower risk tolerance. Burning time off the clock by running the ball would be to their advantage.

Risk Aversion and Tolerance

In the "Revisited" paper, the author guessed at a perfect risk aversion coefficient for the NFL as a whole. He used the risk aversion (α) for the Chargers, because they had the best record in the year studied. Then he calculated what each team's run/pass ratio should be based on that league-wide perfect α.

I explained the reasons why this was a bad idea in my last post, notably that poor teams (that tend to be behind) must increase their risk tolerance if they hope to overcome a significant deficit in a game, particularly towards the end of the game. Further, the analysis in the "Revisited" paper failed to recognize that it's winning that often leads to running, rather than the other way around.

So instead of choosing a perfect α based on a single team and then applying it to the entire NFL, why not calculate what each team's actual α was based on their actual run/pass balance? If I'm right about how winning leads to running, teams with a lot of wins should have a risk-averse portfolio, and teams with a lot of losses should have a risk-tolerant portfolio.

So that's what I did. The equation below solves for risk aversion in the maximized utility equation, instead of run/pass ratio as the author of "Revisited" did.

where:
α = risk aversion (negative values are risk tolerant, zero is neutral)
γ = % of plays that are runs
μ_R = mean (expected) gain of runs
μ_P = mean (expected) gain of passes
σ_R = standard deviation of run gains
σ_P = standard deviation of pass gains
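
For readers who want to experiment, here is a minimal sketch of the idea of solving for α instead of γ. It assumes a textbook mean-variance utility with uncorrelated run and pass gains, U(γ) = γμ_R + (1-γ)μ_P - (α/2)[γ²σ_R² + (1-γ)²σ_P²]. That form is a stand-in chosen for illustration, not necessarily the one used in "Revisited" or in the equation referred to above, and the function names are made up. With New England's inputs from the table below it gives an α of roughly 0.048 rather than the listed 0.039, so treat it as an illustration of the method and not a reproduction of the numbers.

```python
def utility(gamma, alpha, mu_r, mu_p, sd_r, sd_p):
    """Assumed mean-variance utility of running a share gamma of all plays."""
    mean = gamma * mu_r + (1 - gamma) * mu_p
    # Assumption: run and pass gains are uncorrelated (no covariance term).
    variance = gamma**2 * sd_r**2 + (1 - gamma) ** 2 * sd_p**2
    return mean - 0.5 * alpha * variance


def implied_alpha(gamma, mu_r, mu_p, sd_r, sd_p):
    """Alpha that makes the observed run share a stationary point of utility().

    Setting dU/dgamma = 0 and solving for alpha gives
        alpha = (mu_r - mu_p) / (gamma * sd_r**2 - (1 - gamma) * sd_p**2)
    """
    denom = gamma * sd_r**2 - (1 - gamma) * sd_p**2
    if denom == 0:
        raise ValueError("alpha is undefined when the variance terms cancel")
    return (mu_r - mu_p) / denom


# New England's row from the table: run/pass averages and standard deviations.
ne = dict(mu_r=4.0, mu_p=6.0, sd_r=8.2, sd_p=11.6)
alpha_ne = implied_alpha(0.46, **ne)
print(f"NE implied alpha under the assumed utility: {alpha_ne:.3f}")
```

Under this assumed form, a positive α makes the stationary point a maximum. With a negative α the utility is convex in γ, so the first-order condition alone no longer picks out a best mix.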

The table below lists each team's running and passing stats (borrowed from "Revisited"), their actual play selection balance, their number of wins, and their calculated risk level according to the equation above. Keep in mind that positive α means risk aversion and negative α indicates risk tolerance. The list is sorted from most risk averse (conservative) offenses at top to the most risk tolerant (aggressive) at bottom.

Team	R Avg (μ_R)	P Avg (μ_P)	R SD (σ_R)	P SD (σ_P)	Actual (γ)	Wins	Risk (α)
NE	4.0	6.0	8.2	11.6	0.46	12	0.039
NO	3.1	7.3	9.0	15.7	0.39	10	0.030
SD	5.7	6.5	9.4	11.5	0.49	14	0.029
IND	4.5	6.6	5.9	12.4	0.45	12	0.026
BAL	3.8	5.2	6.2	12.2	0.45	13	0.020
DAL	4.5	6.3	5.8	13.7	0.49	9	0.019
CIN	4.1	6.5	6.0	15.0	0.43	8	0.019
STL	4.1	5.5	7.1	12.0	0.38	8	0.018
KC	4.5	5.6	7.0	13.5	0.52	9	0.016
NYJ	3.6	4.9	5.9	13.8	0.50	10	0.016
ARI	2.7	4.1	6.5	14.2	0.38	5	0.013
MIA	3.6	4.3	8.9	12.0	0.37	6	0.011
HOU	3.6	4.1	7.7	10.9	0.40	6	0.011
WAS	4.8	5.5	6.2	12.8	0.49	5	0.011
PHI	4.9	6.1	9.5	15.6	0.39	10	0.011
DET	3.9	4.8	9.1	13.0	0.32	3	0.009
JAX	4.8	5.2	6.9	11.9	0.51	8	0.009
MIN	3.9	4.4	8.9	12.7	0.42	6	0.008
CHI	3.9	4.8	5.3	15.1	0.48	13	0.008
CLE	2.7	3.4	8.0	13.8	0.40	4	0.008
CAR	4.2	4.7	7.6	13.1	0.42	8	0.007
GB	4.2	4.8	8.1	13.9	0.38	8	0.006
BUF	3.8	4.2	6.0	15.1	0.46	7	0.003
NYG	4.9	5.0	8.0	12.3	0.45	8	0.002
SEA	3.8	4.0	6.7	13.9	0.43	9	0.002
SF	4.4	4.5	10.9	14.2	0.48	7	0.002
TB	3.6	3.6	7.7	11.9	0.39	4	0.000
DEN	4.8	4.7	8.4	13.5	0.44	9	-0.001
PIT	4.7	3.6	9.5	14.7	0.36	8	-0.011
TEN	4.3	3.7	8.6	12.7	0.48	8	-0.013
OAK	3.6	2.5	8.4	13.7	0.44	2	-0.016
ATL	5.7	4.2	10.5	12.4	0.54	7	-0.438

Notice that most teams are very close to neutral risk (α = 0) but with one very large exception. Michael Vick's Falcons appear to be the biggest risk takers by far, with an eye-popping α = -0.438. But I think that result is due to the unique nature of Vick's offense. His runs were very boom and bust, with either a big gain or a deep sack. Those were often called pass plays in which Vick scrambled. Plus, his running ability on the outside often opened up running holes for conventional run plays on the inside.

Most other teams, however, tilted slightly positive, meaning they were slightly risk averse. The teams with the most wins tended to be the teams that were most risk averse. Teams such as NE, NO, SD, IND, and BAL top the list of the most conservative offenses. They were also the best teams of 2006, with one team missing.

The NFC champion Bears managed 13 wins with a relatively risky offensive balance. This is due to their boom and bust passing game (μ_P = 4.8, σ_P=15.1). This result suggests that in 2006 CHI rolled the dice often with deep pass plays and got lucky. 2007 wasn't so kind to them.

One interesting application of this kind of risk analysis would be to repeat these calculations for multiple years to see which coaches and/or coordinators really are the most conservative and who are the biggest gamblers. I was surprised to see Belichick as coach of the most risk averse team. He does have a reputation for running on 3rd and short more often than other teams, so perhaps that explains NE's placement on top of the list.

Below is a graph of risk aversion vs. team wins. We can see that teams with a lot of wins generally are the teams that can afford to be conservative.

One possible application of this graph is to measure the vertical distance between the best-fit line and each team's risk aversion score. This distance is the regression "residual" accounting for wins and losses. It basically says how risk averse/tolerant a team was accounting for its wins. A multi-variate regression would be even better, accounting for both team defensive ability and wins. And instead of wins, we could use "4th quarter leads," which would be what really drives deviations from optimum risk tolerance. This analysis has the potential to be a good measure of how well a coach understands the game and his team--not just their running ability and passing ability, but their defensive ability as well.
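
The residual idea can be sketched in a few lines. The wins and α values below are copied from the table above. Two choices here are assumptions of the sketch: an ordinary least-squares line of α on wins, and leaving out ATL because its α of -0.438 would dominate the fit. The function name is made up for illustration.

```python
# (team, wins, alpha) from the table above. ATL is omitted on purpose (an
# assumption of this sketch) because its alpha of -0.438 would swamp the fit.
TEAMS = [
    ("NE", 12, 0.039), ("NO", 10, 0.030), ("SD", 14, 0.029), ("IND", 12, 0.026),
    ("BAL", 13, 0.020), ("DAL", 9, 0.019), ("CIN", 8, 0.019), ("STL", 8, 0.018),
    ("KC", 9, 0.016), ("NYJ", 10, 0.016), ("ARI", 5, 0.013), ("MIA", 6, 0.011),
    ("HOU", 6, 0.011), ("WAS", 5, 0.011), ("PHI", 10, 0.011), ("DET", 3, 0.009),
    ("JAX", 8, 0.009), ("MIN", 6, 0.008), ("CHI", 13, 0.008), ("CLE", 4, 0.008),
    ("CAR", 8, 0.007), ("GB", 8, 0.006), ("BUF", 7, 0.003), ("NYG", 8, 0.002),
    ("SEA", 9, 0.002), ("SF", 7, 0.002), ("TB", 4, 0.000), ("DEN", 9, -0.001),
    ("PIT", 8, -0.011), ("TEN", 8, -0.013), ("OAK", 2, -0.016),
]


def fit_line(points):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line([(wins, alpha) for _, wins, alpha in TEAMS])

# Residual = actual alpha minus the alpha the best-fit line predicts for that
# win total. Positive: more conservative than the wins alone would suggest.
residuals = {team: alpha - (intercept + slope * wins) for team, wins, alpha in TEAMS}

for team in sorted(residuals, key=residuals.get):
    print(f"{team:4s} {residuals[team]:+.4f}")
```

In this fit the Bears come out with a negative residual, which lines up with the point above that they won 13 games with a relatively risky balance.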

There is tremendous potential for the application of portfolio theory in football. The "10 yards in 4 downs rule" complicates the analysis, however. Accordingly, each type of down and distance situation may require its own analysis. Plus, play-calling is not a simple pass or run binary decision. There are draws, screens, outs, hitches, flares, and all sorts of other unique plays. The risks and benefits of each type of play also require their own analysis.

To me, this is exciting because football may finally have a way of matching the depth of mathematical analysis pioneered by our sabermetrician friends in baseball. Baseball is simpler in many ways--run production is generally linearly additive and there are very few options for a team or player to increase or decrease risk as the situation requires. Unlike most other sports, risk in football is dynamic. Perhaps that's what makes it so exciting.

Feb 6, 2008

The Passing Paradox Part 2

This is a continuation of an analysis of run/pass balance in the NFL. In part 1 of this article, I discussed the potential application of financial portfolio theory in football strategy. In part 2 of this article, I critique a recent study that made a great stride in this effort.

Commenter JG referred us to a very interesting research paper by economist Duane Rockerbie called "The Passing Premium Revisited." The author applies portfolio theory to re-examine the run-pass balance in the NFL. He finds that teams pass too often. I think his approach is brilliant, but unfortunately his methodology has flaws similar to Alamar's original Passing Premium paper and his conclusions misinterpret his results.

In "Revisited" the author applies a version of the utility function (below) to find the optimal run/pass selection of all 32 NFL teams for the 2006 season. The optimal run/pass ratio is found by taking the derivative of the utility function and setting it equal to zero, thereby finding the curve's maximum. Each team's optimum run/pass ratio is based on the relative strength and variance of their running and passing games.
(The equation basically says that utility of a strategy (X) is a diminishing function of risk aversion/tolerance (alpha) and the expected return of the strategy (v).)

The author finds that most teams do not run as much as they should, and calls the difference between the optimum and actual run/pass ratio "run inefficiency." Run inefficiency is found to be linearly and convincingly correlated with losing. In other words, teams that run as much as they should won more than teams that passed too often. This would be very strong evidence that the run is underused in the NFL, and that the author has discovered a method for instructing coaches how often to run.

First, the computation of expected run yards and expected pass yards leaves out some considerations. Like Alamar, the author assigns a -45 yard assessment for each interception. But also like Alamar, he appears to leave out sacks. Sack yards should count against the average pass, and each sack should count as a pass attempt--although the ball was not thrown, a pass play was called. The effect would be bias in favor of the pass. It's also not clear if he factored in additional yardage bonuses on touchdown plays. He doesn't mention it, so I would think not. Since more TDs are from passes than runs, the effect would be bias against the pass.

Also, quarterback scrambles should count as pass yards, not as run yards. They are the result of pass plays, just as sacks are not negative run plays. This might have a large effect on the stats of teams such as Michael Vick's Falcons or Vince Young's Titans in 2006.
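
To make those adjustments concrete, here is a minimal sketch of a per-play passing average that charges 45 yards per interception, counts sacks as pass plays, and moves scrambles into the passing column. The function name and the season totals are hypothetical, chosen only to show the arithmetic. Touchdown bonuses and penalty yards are left out.

```python
INTERCEPTION_PENALTY_YARDS = 45  # the per-interception assessment cited above


def yards_per_pass_play(pass_yards, attempts, interceptions,
                        sacks=0, sack_yards=0, scrambles=0, scramble_yards=0):
    """Average gain per called pass play.

    Sacks and scrambles are counted as pass plays because a pass was called,
    even though the ball was never thrown.
    """
    net_yards = (pass_yards - sack_yards + scramble_yards
                 - INTERCEPTION_PENALTY_YARDS * interceptions)
    plays = attempts + sacks + scrambles
    return net_yards / plays


# Hypothetical season totals, for illustration only.
unadjusted = yards_per_pass_play(pass_yards=3500, attempts=500, interceptions=15)
adjusted = yards_per_pass_play(pass_yards=3500, attempts=500, interceptions=15,
                               sacks=30, sack_yards=200,
                               scrambles=25, scramble_yards=150)
print(f"ignoring sacks and scrambles: {unadjusted:.2f} yards per play")
print(f"with sacks and scrambles:     {adjusted:.2f} yards per play")
```

With these made-up totals the average drops from 5.65 to 5.00 yards per play, which is the direction of the pro-pass bias described above.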

But the author goes a step further, and better, than Alamar by excluding kneel downs and clock-stopping spikes from the data. He also factors in penalty yards, which may be important. If passes tended to result in interference calls against the defense, that would make passing look more attractive.

The data is used to calculate the expected (average) result and standard deviation of the run and pass for all 32 teams. From this, he uses the utility function above to estimate each team's optimum ratio of running and passing. But first, the author needs to select the optimum risk tolerance (alpha in the equation above). The optimum risk tolerance is chosen by stipulating that the Chargers' run/pass balance is optimum because they had the best record at 14-2 in 2006, the year from which the data was taken.

I believe this is an error. Recall that the Chargers went 14-2 largely on the back of LaDainian Tomlinson, who led San Diego to an epic 5.7 yds per carry average, and on their #2 ranked defense led by league sack leader Shawne Merriman. The author seems unaware of the "running causes winning" fallacy in which teams appear to win because they chose to run more often. In reality, teams that are ahead late in a game, and already very likely to win, choose to run almost exclusively because it is less risky and it burns time off the clock.

The Chargers won 14 of 16 games, presumably leading in almost all of them when they could feed the ball to their talented running back in the 4th quarter. By selecting a 14-win team as the "perfect alpha" team, the author guarantees that any other team that runs less often (accounting for relative strengths of their running and passing abilities) will appear to run less often than they "should."

In fact, I don't believe there is a single uniform alpha for the entire NFL. It changes from down to down and situation to situation. If my team is down by 4 with 2 minutes remaining my alpha would be very negative (very high risk tolerance).

But the most important factor in a team's risk tolerance may be its defense. With a very strong defense, a team's risk tolerance should be low. With a weak defense, a team will likely need to take additional risks to keep up with its opponent's easy scoring. This leads to my final point.

The author admits that although the selection of the optimum risk tolerance is arbitrary, the correlation of running a lot (accounting for relative strengths) and winning is still strong evidence supporting his utility-maximization analysis. The graph below shows all 32 teams' win totals for the 2006 regular season vs. run inefficiency. The lower the run inefficiency, the more wins a team tends to have.

What we see is that teams that run more often than their risk-reward utility indicates are teams that win. I suspect the direction of causation is that winning allows the running, not the reverse as the author implies.

The next chart is offensive points scored vs run inefficiency. There is a moderate and significant correlation, similar to team wins.

The final chart is points allowed vs. run inefficiency. It's clear that defensive ability explains much of each team's run inefficiency, i.e. run-pass imbalance.

Every team appears to have its own baseline alpha (risk tolerance) based on its defensive strength. Then, based on their respective running and passing abilities, they have an optimum run/pass ratio. But as game situations change in terms of leads, time remaining, and other situational variables, a team's risk tolerance should deviate from its baseline.

There isn't one optimum alpha for the NFL, and if there were, it certainly should not be based on the 2006 Chargers. What we see in the charts is that most NFL teams roughly run and pass about as often as they should, given their respective defensive, running, and passing strengths. But some teams do not.

But there's one more wrinkle--football isn't a simple game of yardage optimization. It's complicated by its first down rules. In the final part of this article, I'll examine how this requirement affects play selection, and why I call this concept the "passing paradox."

Continue reading part 3 of The Passing Paradox.

The Passing Paradox Part 1

Recently, I've been examining the "passing premium," the difference in expected gain between a pass play and a run play. After correcting some of the flaws in this paper, it appears that passing yields a better average gain than running, even after accounting for incompletions, sacks, and interceptions. This would suggest that NFL teams should pass more than they currently do because balance may indicate optimization.

Unfortunately, the optimum run and pass mix is more complicated than comparing average expected gains. Commenter "JG" pointed out that passing's comparatively high variance (passes are often incomplete, or result in sacks or turnovers) means that passing should have a higher expected payoff. It would not be worthwhile if it didn't. If a team could get the same expected gain by only running, why would it ever risk a pass?

This kind of analysis is based on financial portfolio theory, a branch of math that analyzes and weighs risks and rewards. We can think of passing and running as two investments, each with its own expected payoff and volatility. When a team calls a running play it invests in a run at the price of 1 down, hoping for a payoff in yards. Running would be like buying a share of GE. Passing would be more like buying a share of a tech startup. There is more upside for rapid gain, but there is also a decent chance you'll lose the kids' college fund.

The author of this site proposes several possible applications of the Sharpe Ratio in football. The Sharpe Ratio is a financial measure of expected returns per unit of variability. Specifically, it is the ratio of average returns of an investment over a risk-free alternative to the standard deviation of the investment's value.

By comparing the Sharpe Ratio of running and passing, we can see if there is a premium of one tactic over the other accounting for each tactic's risk. We could also compare two different passing strategies, a high risk/high reward passing offense or a high percentage "dink and dunk" offense.

Consider a simple fictitious example below. Team A is the high-risk/reward passing team and Team B is the higher percentage passing team. The table lists the results of several pass attempts of each team (order is not important in the Sharpe Ratio). Both teams average the same number of yards per attempt. Team A had more incompletions and sacks, yet had more yards per completion. For the zero-risk alternative, I'll use a zero-yard "QB flop" play. Each team had one interception, a -45 yard equivalent.

Pass	Team A	Team B
Pass 1	40	27
Pass 2	22	18
Pass 3	20	18
Pass 4	15	13
Pass 5	15	13
Pass 6	10	13
Pass 7	3	10
Pass 8	0	5
Pass 9	0	3
Pass 10	0	0
Pass 11	0	0
Pass 12	-5	0
Pass 13	-5	-5
Pass 14	-10	-10
Pass 15	-45	-45
Avg YPA	4.00	4.00
Std Dev	18.86	16.75
Sharpe Ratio	0.11	0.12

In this example, the Sharpe Ratio is higher for Team B's high percentage offense, suggesting its rewards are more worth its risks. We would get similar results for any comparison of higher-risk tactics vs. low risk tactics, assuming the average net gain is equal.
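
The calculation is easy to check. The sketch below recomputes the averages, standard deviations, and ratios from the fifteen listed attempts, assuming the sample (n-1) standard deviation and the zero-yard "QB flop" as the risk-free play. The function name is made up for illustration.

```python
from statistics import mean, stdev

# Yards gained on each pass attempt, copied from the table above.
TEAM_A = [40, 22, 20, 15, 15, 10, 3, 0, 0, 0, 0, -5, -5, -10, -45]
TEAM_B = [27, 18, 18, 13, 13, 13, 10, 5, 3, 0, 0, 0, -5, -10, -45]

RISK_FREE_YARDS = 0.0  # the zero-yard "QB flop" alternative


def sharpe_ratio(gains, risk_free=RISK_FREE_YARDS):
    """Average gain over the risk-free play, per unit of sample standard deviation."""
    return (mean(gains) - risk_free) / stdev(gains)


for name, gains in (("Team A", TEAM_A), ("Team B", TEAM_B)):
    print(f"{name}: avg {mean(gains):.2f}, sd {stdev(gains):.2f}, "
          f"Sharpe {sharpe_ratio(gains):.2f}")
```

This reproduces the 4.00 averages and the 18.86 and 16.75 standard deviations. Dividing 4.00 by each standard deviation gives roughly 0.21 and 0.24, about double the ratios printed in the table. The ordering, and so the conclusion, is the same either way.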

The potential for the application of the Sharpe Ratio and all of Portfolio Theory in football strategy is vast. We might finally answer the question of whether a boom/bust running back like Barry Sanders is better than a straight-ahead pounder like Jamal Lewis. We could analyze the merits of Mike Martz's high risk/reward passing doctrine. I'm sure I'll be pursuing such applications in future research. In the meantime, however, the next post will critique a very interesting research paper that makes great strides in applying portfolio theory directly to the passing premium issue.

Continue reading part 2 of The Passing Paradox.

Feb 3, 2008

Super Bowl XLII and Team Possessions

First of all, that was an amazing game, possibly the most entertaining Super Bowl ever. Eli Manning played a great game, but the big story to me is how the Giants defense was able to hold the Patriots to only 14 points.

New York's pass rush obviously had a lot to do with their success. Their secondary played amazingly well too. But one of the biggest factors will probably go unmentioned in all the conventional post-game analysis--the clock.

The research from this article suggested that a heavy underdog could see its chance of winning significantly increase when the number of possessions for each team is reduced. The more possessions for each team, the more likely the better team will eventually come out on top. The fewer the possessions, the more likely that luck or other factors can conspire to create opportunities for the underdog to win. Perhaps the easiest way to think of it is that fewer possessions probably means a lower score, and a single drive or play can cause an upset.

In Super Bowl XLII, each team only had 8 full possessions. (This does not count the Giants' 10-second possession at the end of the 2nd quarter and their 1-second possession at the end of the game.) Most games feature 10 to 13, the average being 11.5 full drives per game. The Patriots' final possession began with 35 seconds remaining, allowing time for only 3 desperation throws and a sack.

The table below illustrates each team's chance of winning a game with the given number of possessions based on a simulation of each team's historic scoring per possession rates. My original table did not even consider the possibility of 8, but I include it here.

Possessions	NE Wins	NYG Wins	Overtime
8	69.8	24.1	6.1
9	71.4	23.5	5.1
10	72.7	22.4	4.9
11	74.0	21.5	4.5
12	75.5	20.7	3.8
13	76.5	20.0	3.5
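
The effect is easy to see in a toy model. The sketch below is not the simulation behind the table. It assumes every possession is independent and ends in a touchdown (7), a field goal (3), or nothing, and the per-possession rates for the favorite and the underdog are invented for illustration. Because the model is so small, it enumerates the exact score distributions instead of simulating.

```python
from collections import defaultdict

# Invented per-possession (touchdown, field goal) rates, for illustration only.
FAVORITE = (0.35, 0.15)
UNDERDOG = (0.20, 0.15)


def score_distribution(p_td, p_fg, possessions):
    """Exact probability of each point total after a number of independent possessions."""
    outcomes = [(7, p_td), (3, p_fg), (0, 1.0 - p_td - p_fg)]
    dist = {0: 1.0}
    for _ in range(possessions):
        nxt = defaultdict(float)
        for points, prob in dist.items():
            for gain, p in outcomes:
                nxt[points + gain] += prob * p
        dist = dict(nxt)
    return dist


def matchup(favorite, underdog, possessions):
    """Return (favorite wins, underdog wins, tied after regulation) probabilities."""
    fav_dist = score_distribution(*favorite, possessions)
    dog_dist = score_distribution(*underdog, possessions)
    fav_win = dog_win = tie = 0.0
    for fav_pts, fav_p in fav_dist.items():
        for dog_pts, dog_p in dog_dist.items():
            joint = fav_p * dog_p
            if fav_pts > dog_pts:
                fav_win += joint
            elif fav_pts < dog_pts:
                dog_win += joint
            else:
                tie += joint
    return fav_win, dog_win, tie


results = {n: matchup(FAVORITE, UNDERDOG, n) for n in range(8, 14)}
for n, (fav_win, dog_win, tie) in results.items():
    print(f"{n:2d} possessions: favorite {fav_win:.1%}, underdog {dog_win:.1%}, tie {tie:.1%}")
```

With these made-up rates the underdog's share shrinks as the number of possessions grows, the same pattern as in the table above.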

How did the game yield so few possessions? Long drives with high 3rd down conversion percentages appear to be the biggest reason. The Giants started the game with an amazingly long 10 minute drive culminating in a field goal. The Patriots didn't even finish their first drive until the second quarter. The Patriots started the third quarter with a drive of over 8 minutes resulting in a turnover on downs.

If each team had 2 or 3 more possessions, New England may well have been able to overcome their 3-point deficit. It's hard to say that with any certainty because the Giants slightly outplayed the Patriots almost all night. Congratulations to the Giants and their fans.

Super Bowl XLII Prediction

NE Prob	Super Bowl XLII	NYG Prob
0.76	NE vs. NYG	0.24

More Patriots Cheating Allegations

Yesterday (the Saturday before Super Bowl XLII) the Boston Herald reported that a source alleged "a member of the [Patriots] video department filmed the Rams’ final walkthrough" before the 2002 Super Bowl in which the Patriots upset the heavily favored Rams. ESPN reported that the Rams stated the walkthrough primarily focused on plays they intended to run in the red zone.

Asked about the allegations yesterday, commissioner Roger Goodell answered, "I'm not aware of that." NFL Spokesman Greg Aiello added, "We have no information on that."

Then later, on the same or very next day, spokesman Aiello told the AP, "We were aware of the rumor months ago and looked into it. There was no evidence of it on the tapes or in the notes produced by the Patriots, and the Patriots told us it was not true." (emphasis mine)

Well, that clears that up. Imagine if the movie The Untouchables ended this way: Eliot Ness--"Your honor, there is no evidence of tax evasion in the documents provided by Mr. Capone, and he has told us the charges are not true." Judge--"Case dismissed."

Previous Research

I don't normally add my own opinions here at this site, but I'll make an exception here because of my previous look into possible statistical evidence that the Patriots benefited from unfair advantages. Specifically, the Patriots had won about 2 more games per year, every year, from 2002-2006 than their on-field performance would statistically indicate. In other words, other teams with similar performance stats win 2 fewer games in a season than Belichick's Patriots did. If the Patriots used knowledge of their opponents' play calls in primarily high-leverage situations (3rd downs or critical 4th quarter plays) we would see this kind of result.

My own interest in statistics began when I did my master's thesis, a research paper on midshipmen at the U.S. Naval Academy who violated its Honor Concept. It was basically research on cheaters at an elite institution in a highly competitive and stressful environment. Although not the primary focus of my research, along the way I learned that cheaters are recidivists. Once they are able to rationalize their behavior, they will continue to cheat. Additionally, those who are caught are rarely nabbed on their first attempt, most likely because they select methods and opportunities hard to detect. They go out of their way to hide their cheating activity. No surprise there.

(Coincidentally, it was at Annapolis where Belichick learned his football under his dad, an assistant coach for Navy.)

So when we saw first hand how the Patriots violated league rules, I surmised it was highly unlikely that their activities were limited to taping defensive signals in isolated games. I think it would be naive to believe otherwise. Several very smart commenters (with very good points) accused me of making "assumptions" about the Patriots' cheating. So if the recent allegations have any merit, I'd feel somewhat vindicated.

Super Bowl XXXVI

Then, just as I was thinking of writing this post this afternoon, I channel-surfed onto the NFL Films highlights of the Patriots-Rams 2002 Super Bowl on ESPN2. Immediately prior to the drive in which Ty Law jumped a quick out route to intercept a Kurt Warner pass and return it for a touchdown, there was a sideline shot of three Patriot defenders discussing signals. (My thanks to TiVo, by the way.)

In the shot, safety Lawyer Milloy runs up to cornerbacks Terrell Buckley and Terrance Shaw and says,
"Listen! Listen!
We got 'Sloop.' (makes a hand signal)
We got 'Move'...you know the move signal. (makes a different signal)
We got 'Marine 5'...'Marine.' (makes signal)
We got 'Seagull.' (another signal)
We got, this is...this is 'Double Out' right here. (making signal)"
Buckley and Shaw mimic the signals and nod each time.

My guess is these would not be their own signals--they would know them already, and Milloy's words "we got" and "you know" plus the names for each signal suggest the signals are somewhat but not entirely new. Additionally, "double out" sounds like an offensive call. They appear to be rehearsing the Rams' signals, although possibly Warner's QB signals and not sideline signals. But the main point is that knowledge of some of the Rams' offensive signals was widespread on the Patriots defense, it was a priority to them, and it apparently didn't hurt New England's performance.

On the other hand, I'd guess that all teams try to read QB signals, but if they could do it reliably well, offenses wouldn't use them. Offenses would also be able to use countermeasures or easily spoof a defense. The Patriots may just play this part of the game better than other teams. If so, we'd see the same statistical results I found in my earlier post. It also underscores that outsiders like myself really don't have any idea of what really goes on inside the film rooms and coordinator booths in the NFL.

Belichick's Focus

Perhaps Belichick had an intense focus, within the rules, on exploiting opponents' signals and deceiving them with his own. In the military we call this 'SigInt' for signals intelligence, a critically important part of modern warfare. The advantage from signal exploitation may have encouraged the Patriots to pursue it beyond permitted means. Jets coach and former Belichick assistant Eric Mangini would have been aware of the importance of this part of the game to the Patriots, so it's no surprise he was the one to blow the whistle.

A Real Investigation

The other point is that if the NFL really wanted to investigate these things, there is ample evidence in the NFL Films archive. There are probably hours upon hours of sideline film from just the Patriots' Super Bowls alone, not to mention playoff games or regular season games. An honest investigation would have taken weeks, not the couple of days the NFL took before destroying the evidence.

Don't get me wrong. I'm not a Belichick hater. I appreciate his cerebral approach to the game. I like how he goes for it on 4th down and focuses intensely on details, and I don't find him arrogant at all. But I do have a strong intolerance for cheating, and I believe these things deserve to be investigated.
