
Two-Point Conversion in the KC-DEN game

With 7:15 left in the 4th quarter against DEN, KC's Knile Davis ran for a 4-yard TD, narrowing the Broncos' lead to 21-16 pending the extra point or two-point conversion. Andy Reid elected for the extra point, and following the kick the Chiefs trailed by 4 points rather than the 3 or 5 that would have resulted from a two-point try.

NFL coaches typically adhere to what's known as the Vermeil Chart for making two-point decisions. The chart was created by Dick Vermeil when he was offensive coordinator for UCLA over 40 years ago. It's a simple chart that looks only at the score difference prior to the conversion attempt and does not consider time remaining, with one caveat: it applies only when the coach expects there to be three or fewer (meaningful) possessions left in the game.

With just over 7 minutes to play, there could be three possessions at most left, especially considering that at least one of those possessions would need to be a KC scoring drive for any of this to matter. (In actuality, there were only two possessions left, one for each team.) Even the tried-and-true Vermeil chart says go for two when trailing by 5. But it's not the 1970s any more and this isn't college ball, so let's apply the numbers and create a better way of analyzing go-for-two decisions.

With rare exceptions, I've resisted analyzing two-point conversion decisions with the Win Probability model because, as will become apparent, the analysis is particularly susceptible to noise. Now that we've got the new model, noise is extremely low, and I'm confident the model is more than up to the task.

First, let's walk through the possibilities for KC intuitively. If KC fails to score again or DEN gets a TD, none of this matters. Otherwise:
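Roughly, the comparison looks like this (a sketch with placeholder win probabilities, not the model's actual outputs):

```python
# Placeholder sketch of the go-for-two comparison. The conversion rates and
# the WP values below are illustrative assumptions, not the model's numbers.
P_XP = 0.99    # assumed extra-point success rate
P_2PT = 0.47   # assumed two-point conversion success rate

# Hypothetical win probabilities for KC depending on the deficit after the try:
wp_down_3 = 0.25   # a field goal ties
wp_down_4 = 0.15   # a field goal still loses; KC needs a TD
wp_down_5 = 0.14   # same boat as down 4

wp_kick = P_XP * wp_down_4 + (1 - P_XP) * wp_down_5
wp_go_for_two = P_2PT * wp_down_3 + (1 - P_2PT) * wp_down_5

print(f"Kick XP:    {wp_kick:.3f}")
print(f"Go for two: {wp_go_for_two:.3f}")
```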

Nick Foles and Interception Index Regression

Nick Foles and Josh McCown were two of last season's most pleasant surprises, emerging from obscurity to post two of the league's most efficient seasons. Both finished in the top 3 in Expected Points Added per Play, in large part because the two combined to throw just three interceptions.

With one week of the 2014 season in the books,  Foles and McCown have already matched that combined total.  While everyone should have expected both to regress from their remarkably turnover-free 2013 seasons, that does not tell us how far each should regress based on historical norms.

Simulating the Saints-Falcons Endgame

I was asked yesterday about the end of regulation of the Saints-Falcons game. With about a minute and a half remaining, NO was down by 4 but had a 1st & goal at the 1. With 2 timeouts left, should ATL have allowed the touchdown intentionally?

I previously examined intentional touchdown scenarios, but only considered situations when the offense was within 3 points. In this case NO needed a TD, which--needless to say--makes a big difference. Yet, because NO was on the 1, perhaps the go-ahead score was so likely that ATL would be better off down 3 with the ball than up 4 backed-up against their goal line.

This is a really, really hard analysis. There's a lot of what-ifs: What if NO scores on 1st down anyway? What if they don't score on 1st but on 2nd down? On 3rd down? On 4th down? Or what if they throw the ball? What if they stop the clock somehow, or commit a penalty? How likely is a turnover on each successive down? You can see that the situation quickly becomes an almost intractable problem without excessive assumptions.

That's where the WOPR comes in. The WOPR is the new game simulation model created this past off-season, designed and calibrated specifically for in-game analytics. It simulates a game from any starting point, play by play, yard by yard, and second by second. Play outcomes are randomly drawn from empirical distributions of actual plays that occurred in similar circumstances.
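For intuition, here's a stripped-down toy version of the simulation idea; the actual WOPR draws full play outcomes from empirical distributions keyed to the game state, which this sketch doesn't attempt:

```python
import random

# Toy play-by-play simulator (NOT the WOPR). One made-up outcome table stands
# in for the empirical, situation-specific distributions the real model uses.
PLAYS = [(0, 6, True), (0, 30, False), (4, 33, False), (8, 35, False), (20, 14, False)]
#        (yards, seconds elapsed, turnover?) -- illustrative values only

def estimate_wp(score_diff, yardline, seconds_left, n=20_000):
    """Estimate the offense's chance of winning by simulating the rest of the game."""
    wins = 0.0
    for _ in range(n):
        diff, yd, t, offense = score_diff, yardline, seconds_left, True
        down, togo = 1, 10
        while t > 0:
            yards, dt, turnover = random.choice(PLAYS)
            t -= dt
            yd += yards
            if yd >= 100:                                   # touchdown
                diff += 7 if offense else -7
                offense, yd, down, togo = not offense, 25, 1, 10
            elif turnover or (down == 4 and yards < togo):  # change of possession
                offense, yd, down, togo = not offense, 100 - yd, 1, 10
            elif yards >= togo:
                down, togo = 1, 10
            else:
                down, togo = down + 1, togo - yards
        wins += (diff > 0) + 0.5 * (diff == 0)              # treat a tie as a coin flip
    return wins / n

print(estimate_wp(score_diff=-4, yardline=99, seconds_left=90))  # e.g., down 4 at the opponent's 1
```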

If you're not familiar with how simulation models work, you're probably wondering So what? Dude, I can put my Madden on auto-play and do the same thing. Who cares who wins a dumb make-believe game? 

Analyzing Replay Challenges

The new WP model allows some nifty new applications. One of the more notable improvements is the consideration of timeouts. That, together with enhanced accuracy and precision, allows us to analyze replay challenge decisions. Here at AFA, we've tinkered with replay analysis before, and we've estimated the implicit value of a timeout based on how and when coaches challenge plays. But without a way to directly measure the value of a timeout, the analysis was only an exercise.

Most challenges are now replay assistant challenges--the automatic reviews for all scores and turnovers, plus particular plays inside two minutes of each half. Still, there are plenty of opportunities for coaches to challenge a call each week.

The cost of a challenge is two-fold. First, the coach (probably) loses one of his two challenges for the game. (He can recover one if he wins both challenges in a game.) Second, an unsuccessful challenge results in a charged timeout. The value of the first cost would be very hard to estimate, but thankfully the event that a coach runs out of challenges AND needs to use a third is exceptionally rare. I can't find even a single example since the automatic replay rules went into effect.

So I'm going to set that consideration aside for now. In the future, I may try to put a value on it, particularly if a coach had already used one challenge. But even then it would be very small and would diminish to zero as the game progresses toward its final 2 minutes. In any case, all the coaches' challenges from this week were first challenges, and none represented the final team timeout, so we're in safe waters for now.

Every replay situation is unique. We can't quantify the probability that a particular play will be overturned statistically, but we can determine the breakeven probability of success for a challenge to be worthwhile for any situation. If a coach believes the chance of overturning the call is above the breakeven level, he should challenge. Below the breakeven level, he should hold onto his red flag.
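The breakeven logic is simple enough to sketch; the WP gain and timeout cost below are placeholders for the model's estimates:

```python
def breakeven_challenge_prob(wp_if_overturned, wp_if_stands, wp_cost_of_timeout):
    """Breakeven success probability for a replay challenge.

    Challenging gains (wp_if_overturned - wp_if_stands) when it succeeds and
    costs a timeout (expressed here in WP terms) when it fails. The challenge
    is worthwhile when p * gain > (1 - p) * cost, i.e. p > cost / (gain + cost).
    """
    gain = wp_if_overturned - wp_if_stands
    return wp_cost_of_timeout / (gain + wp_cost_of_timeout)

# Illustrative numbers only -- the article's analysis uses model outputs:
print(breakeven_challenge_prob(0.55, 0.48, 0.02))   # ~0.22
```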

Sneak Peek at WP 2.0

I've just completed the development, validation, and testing of the next-generation Win Probability model. It took the better part of the past 6 months. Despite many heartaches and frustrating turns, I'm really thrilled with the results. But as excited as I am to have this new tool, I'm also somewhat humbled by how inadequate the original model is in some regards.

As a quick refresher, the WP model tells us the chance that a team will win a game in progress as a function of the game state--score, time, down, distance, etc. Although it's certainly interesting to have a good idea of how likely your favorite team is to win, the model's usefulness goes far beyond that.

WP is the ultimate measure of utility in football. As Herm once reminded us all, You play to win the game! Hello!, and WP measures how close or far you are from that single-minded goal. Its elegance lies in its perfectly linear proportions. Having a 40% chance at winning is exactly twice as good as having a 20% chance at winning, and an 80% chance is twice as good as 40%. You get the idea.

That feature allows analysts to use the model as a decision support tool. Simply put, any decision can be assessed on the following basis: Do the thing that gives you the best chance of winning. That's hardly controversial. The tough part is figuring out what the relevant chances of winning are for the decision-maker's various options, and that's what the WP model does. Thankfully, once the model is created, only fifth grade arithmetic is required for some very practical applications of interest to team decision-makers and to fans alike.

Implications of a 33-Yard XP

The NFL is experimenting with a longer XP this preseason. XPs have become so automatic (close to 99.5%) that there is no longer much rationale for including them in the game. The Competition Committee's experiment is to move the line of scrimmage of each XP to the 15-yard line, making each kick a 33-yard attempt.

Over the past five seasons, attempts from that distance are successful 91.5% of the time. That should put a bit of excitement and drama into XPs, especially late in close games, which is what the NFL wants. But it might also have another effect on the game.

Currently, two-point conversions are successful at just about half that rate, somewhere north of 45%. The actual rate is somewhat nebulous because of how fakes and aborted kick attempts that turn into two-point tries are counted.

It's likely the NFL chose the 15-yd line for a reason. The success rate for kicks from that distance is approximately twice the success rate for a 2-point attempt, making the entire extra point process "risk-neutral." In other words, going for two gives teams half the chance at twice the points.
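In expected-points terms, using the rates quoted above (the two-point rate is approximate):

```python
# Expected points of each conversion option under the proposed 33-yard XP.
xp_rate, two_pt_rate = 0.915, 0.46   # 2-pt rate is an approximation, per the text

print(1 * xp_rate)      # ~0.92 expected points from kicking
print(2 * two_pt_rate)  # ~0.92 expected points from going for two
```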

Win Values for the NFL

Jimmy Graham's contract values him at about 0.9 wins per season. Here's how I came to that estimate.

In 2013 the combined 32 NFL teams chased 256 regular season wins and spent $3.92 billion on player salary along the way. In simple terms, that would make the value of a win about $15 million. Unfortunately, things aren't so simple. To estimate the true relationship between salary and winning, we need to focus on wins above replacement.

Think of replacement level as the "intercept" term or constant in a regression. As a simple example think of the relationship between Celsius and Fahrenheit. There is a perfectly linear relationship between the two scales. To convert from deg C to deg F, multiply the Celsius temperature by 9/5. That's the slope or coefficient of the relationship. But because the zero point on the Celsius scale is 32 on the Fahrenheit scale, we need to add 32 when converting. That's the intercept. 32 degrees F is like the replacement level temperature.

No matter how teams spend their available salary, they need to have 53 guys on their roster. At a bare minimum, they need to spend 53 * $min salary just to open the season. We can consider that amount analogous to the 32-degrees of Fahrenheit. For 2013, the minimum salaries ranged from $420k for rookies to $940k for 10-year veterans. To field a purely replacement level squad, a franchise could enlist nothing but rookies. But to add a bit of realism, let's throw in a good number of 1, 2, and 3-year veterans in the mix for a weighted average min salary of $500k per year. The league-wide total of potential replacement salary comes to:
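Carrying out that arithmetic (a quick sketch using the numbers above):

```python
# League-wide replacement-level payroll under the stated assumptions.
teams, roster_size, avg_min_salary = 32, 53, 500_000
replacement_payroll = teams * roster_size * avg_min_salary
print(f"${replacement_payroll:,}")   # $848,000,000
```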

Using Probabilistic Distributions to Quantify NFL Combine Performance

Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.

Jadeveon Clowney is thought of as a "once-in-a-decade" or even "once-in-a-generation" pass rushing talent by many. Once the top-rated high school talent in the country, Clowney has retained that distinction through 3 years in college football's most dominant conference. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production is actually statistically anticipated. For all of the concerns over his work ethic, dedication, and professionalism, Clowney's athleticism and potential have never been called into question. But is his athleticism actually that rare? And is his talent worth gambling millions of dollars and the 1st overall pick on? This article aims to quantify exactly how rare Jadeveon Clowney's athleticism is in a historical sense.

Jadeveon Clowney set the NFL draft world on fire at this year's combine when he delivered one of the most talked-about combine performances in recent memory, primarily driven by his blistering 40-yard dash time of 4.53. Over the years, however, I recall players like Vernon Gholston, Mario Williams, and even Ziggy Ansah displaying mind-boggling athleticism in drills. But if a player displays unseen athleticism at the combine every year, who is really impressive enough to deserve the "once-in-a-decade" label?

Probability Ranking allows me to identify the probability of encountering an athlete’s measurable. For instance, I probability ranked NFL combine 40 yard dash times for 341 defensive ends from 1999-2014 (Table 1 shows the top 50). In this case, Jadeveon Clowney’s 40 time of 4.53 had a probability rank of 99.12, meaning his speed is in the 99th percentile of all DEs over this time span.
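The calculation itself is just a percentile rank against the historical pool; here's a minimal sketch with a hypothetical (not the actual 341-player) list of times:

```python
def probability_rank(value, pool, lower_is_better=True):
    """Percentile rank of a combine measurable against a historical pool."""
    beats = sum(v > value for v in pool) if lower_is_better else sum(v < value for v in pool)
    return 100.0 * beats / len(pool)

# Hypothetical 40-yard-dash pool for illustration only:
pool = [4.53, 4.60, 4.64, 4.70, 4.75, 4.81, 4.88, 4.95, 5.02]
print(probability_rank(4.53, pool))
```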

NFL Prospect Evaluation using Quantile Regression

Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.

Extraordinary amounts of data go into evaluating an NFL prospect. The NFL combine, pro days, college statistics, game tape breakdown, and even personality tests can all play a role in predicting a player's future in the NFL. Jadeveon Clowney is arguably the most discussed prospect in the 2014 NFL draft not named Johnny Manziel. He is certainly an elite prospect and potentially the best in this year's draft, but he doesn't appear to be a "once-in-a-decade" type of physical specimen based exclusively on historical combine performances. From the research I've done, only Mario Williams and JJ Watt can make such a claim. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production is actually statistically anticipated. All prospects have a "ceiling" and a "floor," which represent the highest and lowest potential a prospect could realize, respectively. But what does this "potential" mean, and does it hold any importance for actually predicting a prospect's success in the NFL? In this article I will show how Quantile Regression, a technique used by quantitative ecologists, can clarify what Clowney's proverbial "ceiling" and "floor" may be in the NFL.

Athletes are a collection of numerous measured and unmeasured descriptor variables. Figure 1 shows a single predictor (40-yard dash time) vs. a prospect's career NFL sacks + tackles for loss (TFL) per game.
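For readers who want to try this, here's a minimal quantile-regression sketch in the same spirit, with a hypothetical data frame standing in for the combine/production dataset:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data (assumed column names), not the article's actual dataset.
df = pd.DataFrame({
    "forty":         [4.53, 4.62, 4.70, 4.78, 4.85, 4.93, 5.01, 5.10],
    "prod_per_game": [1.10, 0.95, 0.80, 0.70, 0.55, 0.45, 0.30, 0.20],
})

ceiling = smf.quantreg("prod_per_game ~ forty", df).fit(q=0.90)  # the "ceiling"
floor   = smf.quantreg("prod_per_game ~ forty", df).fit(q=0.10)  # the "floor"
print(ceiling.params, floor.params, sep="\n")
```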

New Feature on the Draft Model

In my last job I worked with a team of software developers. The interfaces they designed didn't make much sense to me. The interfaces were always, at heart, a giant expanding tree of classes, objects, and properties. Huh? Lots of tiny plus and minus marks everywhere to expand and contract the accordion. Left click to view something. Right click to modify it. If you ever had to deal with the Windows registry, it was like that. Steve Jobs would not have been thrilled.

When I learned a little about object oriented programming, it all made sense. The software engineers were designing the interface for their own convenience, not for ease of use. It made sense from an efficiency standpoint...a programming efficiency standpoint. But from the perspective of the user, it wasn't so efficient. The least used feature was just as accessible as the most common feature, and all of them were hidden until you expanded the right portion of the tree.

Yesterday I realized I was doing the same thing with the draft model. From my point of view, it's easiest to think in terms of players and their probability to be selected at each pick number, because that's how the software that runs the model works. It goes down the list of prospects, player-by-player, looking at the probability he'll be selected pick#-by-pick#.

For the players and their agents, and for fans of particular players, this is ideal. They want to know where and when they'll go. But the user is probably thinking of things from a team's perspective. Whether the user is a team personnel guy or a fan of a team, he'd rather see things from the perspective of a pick #. Right now, a Vikings fan (or exec) would have to click through over a dozen or so of the top players to see who's likely to be available to them at pick #8. And if they were wondering about who'd be available if they trade up or down, that's another few dozen clicks. Scroll, click. Scroll, click...

Bayesian Draft Analysis Tool

This tool is intended to help decision-makers better assess the NFL draft market. Specifically, it estimates the probability each prospect will be available at each pick number. The estimates come from a Bayesian inference model built on consensus player rankings and projections from individual experts with a history of accuracy.

For details on how the model works, please refer to these write-ups:

 - A full description of the purpose and capabilities of the model
 - A discussion of the theoretical basis of Bayesian inference as applied to draft modeling
 - More details on the specific methodology

If you want to jump straight to the results, here they are. But I recommend reading a little further for a brief description of what you'll find.


The interface consists of a list of prospects and two primary charts. Selecting a prospect displays the probabilities of when he'll likely be taken. You can filter the selection list by overall ranking or position.

The top chart plots the probabilities the selected prospect will be taken at each pick #. I think this chart is pretty cool because it illustrates the Bayesian inference process. You can actually see the model 'learn' as it refines its estimates with the addition of each new projection. Where there is a firm consensus among experts, the probability distribution is tall and narrow, indicating high confidence. When there is disagreement, the distribution is low and wide, indicating low confidence.

The lower chart is the bottom line. It's the take-away. It depicts the cumulative probability that the selected prospect will remain available at each pick #. For example, currently there's an 82% chance safety Ha Ha Clinton-Dix is available at the #8 pick but only a 26% chance he's available at #14. A team with an eye on a specific player could use this information in deciding whether to trade up or down, and in understanding how far they'd need to trade.
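The lower chart follows directly from the top one; a minimal sketch of the relationship, with a hypothetical pick-by-pick distribution:

```python
# The chance a prospect is still available at pick k is one minus the chance
# he has already been taken. p_taken is a made-up distribution, not model output.
p_taken = {4: 0.05, 5: 0.10, 6: 0.15, 7: 0.20, 8: 0.20, 9: 0.10, 10: 0.10,
           11: 0.05, 12: 0.03, 13: 0.02}

def availability(pick, p_taken):
    return 1.0 - sum(p for k, p in p_taken.items() if k < pick)

print(availability(8, p_taken))    # chance he's still on the board at #8
print(availability(14, p_taken))
```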



Hovering your cursor over one of the bars on the chart provides some additional context, including which team has that pick and that team's primary needs (according to nfl.com).

The box in the upper right gives you the player's vitals - school, position, height, weight. The expert projections used as inputs to the model are also listed. Currently those include Kiper (ESPN), McShay (Scouts, Inc.), Pat Kirwan (CBS Sports), Daniel Jeremiah (former team scout, NFL Network), and Bucky Brooks (NFL Network). Experts were selected for their reputation, historical accuracy, and independence--that is, they don't all parrot the same projections. Not every prospect has a projection from each expert.

Link to the tool.

Bayesian Draft Model: More Methodology

Boomer, when you think about a guy like Thomas Bayes you think high motor, long arms, quick off the snap. Huge upside in any 3-4 scheme. Gets leverage on those tricky probability theorems right off the block. Game 1 starter for 90% of the teams out there. Writes proofs all the way through the end of the whistle. Definitely like him in the late first, early second round...

The new Bayesian draft model is nearly ready for prime time. Before I launch the full tool publicly, I need to finish describing how it works. Previously, I described its purpose and general approach. And my most recent post described the theoretical underpinnings of Bayesian inference as applied to draft projections. This post will provide more detail on the model's empirical basis.

To review, the purpose of the model is to provide support for decisions. Teams considering trades need the best estimates possible about the likelihood of specific player availability at each pick number. Knowing player availability also plays an important role in deciding which positions to focus on in each round. Plus, it's fun for fans who follow the draft to see which prospects will likely be available to their teams. Hopefully, this tool sits at the intersection of Things helpful to teams and Things interesting to fans.

Since I went over the math in the previous post, I'll dig right into how the probability distributions that comprise the 'priors' and 'likelihoods' were derived.

I collected three sets of data from the last four drafts--best player rankings, expert draft projections (mock drafts), and actual draft selections. In a nutshell, to produce the prior distribution, I compared how close each player's  consensus 'best-player' ranking was to his actual selection. And to produce the likelihood distributions I compared how close each player's actual selection was to the experts' mock projections.
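A simplified sketch of how those empirical distributions might be tabulated (the data here are made up):

```python
from collections import Counter

# For the prior: distribution of (actual pick - consensus rank).
# For a given expert's likelihood: distribution of (actual pick - projection).
history = [  # (consensus_rank, expert_projection, actual_pick) -- hypothetical
    (1, 1, 1), (2, 3, 2), (3, 2, 5), (4, 6, 4), (5, 4, 8), (6, 5, 3),
]

prior_deltas = Counter(actual - rank for rank, _, actual in history)
likelihood_deltas = Counter(actual - proj for _, proj, actual in history)

def to_distribution(counter):
    n = sum(counter.values())
    return {delta: count / n for delta, count in sorted(counter.items())}

print(to_distribution(prior_deltas))
print(to_distribution(likelihood_deltas))
```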

Theoretical Explanation of the Bayesian Draft Model

I recently introduced a model for estimating the probabilities of when prospects will be taken in the draft. This post will provide an overview of the principles that underpin it. A future post will go over some of the deeper details of how the inputs for the model were derived.

First, some terminology. P(A) means the "probability of event A," as in the probability it rains in Seattle tomorrow. Event A is 'it rains in Seattle tomorrow'. Likewise, we can define P(B) as the probability that it rains in Seattle today.

P(A|B) means "the probability of event A given event B occurs," as in the probability that it rains in Seattle tomorrow given that it rained there today. This is known as a conditional probability.

The probability it rains in Seattle today and tomorrow can be calculated by P(A|B) * P(B), which should be fairly intuitive. I hope I haven't lost anyone.

It's also intuitive that "raining in Seattle today and tomorrow" is equivalent to "raining in Seattle tomorrow and today." There's no difference at all between those two things, and so there's no difference in their probabilities.

We can write out that equivalence, like this:
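In the notation above, the equivalence is P(A|B) * P(B) = P(B|A) * P(A), and dividing both sides by P(B) gives Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).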

Bayesian Draft Prediction Model

Let's say you're a GM in need of a safety. You really like Ha Ha Clinton-Dix (FS Ala.) but are unsure if he'll still be on the board when you're on the clock. Do you need to trade up? How far? What if you're a GM with a high pick and would be willing to trade down if you're still assured of getting Clinton-Dix? How far down could you trade and still get your guy?

I've created a tool for predicting when players will come off the board. This isn't a simple average of projections. Instead, it's a complete model based on the concept of Bayesian inference. Bayesian models have an uncanny knack for accurate projections if done properly. I won't go into the details of how Bayesian inference works in this post and save that for another article. This post is intended to illustrate the potential of this decision support tool.

Bayesian models begin with a 'prior' probability distribution, used as a reasonable first guess. Then that guess is refined as we add new information. It works the same way your brain does (hopefully). As more information is added, your prior belief is either confirmed or revised to some degree. The degree to which it is refined is a function of how reliable the new information is. This draft projection model works the same way.

Draft Prospect Evaluation Using Principal Component Analysis

A guest post by W. Casan Scott, Baylor University.

As different as ecology and the NFL sound, they share quite similar problems. The environment is an infinitely complex system with many known and unknown variables. The NFL is a perpetually changing landscape with a revolving door of players and schemes. Predicting an athlete's performance pre-draft is complicated by a number of contributing variables, including combine results, college production, intangibles, or how well that player fits a certain NFL scheme. Perhaps techniques that ecologists use to discern confounding trends in nature may be suitable for such challenges as the NFL draft. This article aims to introduce an eco-statistical tool, Principal Component Analysis (PCA), and its potential utility to advanced NFL analytics.

My Ph.D. research area is aquatic eco-toxicology, where I primarily model chemical exposure hazards to fish. So essentially, I use the best available data and methods to quantify how much danger a fish may be in, in a given habitat. Chemical exposures occur in infinitely complex mixtures across many different environments, and distinguishing trends from such dynamic situations is difficult.

Prospective draftees are actually similar (in theory) in that they are always a unique combination of their college team, inherent athleticism, history, intangibles, and even the current landscape in the NFL. The myriad of variables present in the environment and the NFL, both static and changing, make it difficult to separate the noise from actual, observable trends.

In environmental science, we sometimes use non-traditional methods to help us visualize what previously could not be observed. Likewise, Advanced NFL Analytics tries to answer questions that traditional methods cannot. The goal of this article is to educate others about the utility of eco-statistical tools, namely Principal Component Analysis (PCA), in assessing NFL draft prospects.
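As a concrete starting point, here's a minimal sketch of PCA applied to a few (hypothetical) combine measurables:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: rows = prospects; columns = 40 time, vertical (in), bench reps.
combine = np.array([
    [4.53, 37.5, 21],
    [4.66, 35.0, 28],
    [4.81, 32.5, 33],
    [4.92, 30.0, 24],
    [5.05, 28.5, 30],
])

X = StandardScaler().fit_transform(combine)   # put measurables on one scale
pca = PCA(n_components=2).fit(X)

print(pca.explained_variance_ratio_)  # share of variance captured by each component
print(pca.components_)                # loadings: how each drill weights each component
```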

Wondering About the Wonderlic: Does It Predict Quarterback Performance?

By: Austin Tymins and Andrew Fraga
Published originally at Harvard Sports Analysis

During the 2014 NFL Draft, all 32 NFL teams will be on the clock to invest in the future of their franchises. Decision makers will feel immense pressure to secure a top-notch first round pick, find the next Tom Brady in the sixth round, and, most importantly, avoid selecting a bust. College stats, highlight reels, and NFL Combine results will all be evaluated. The draft, however, isn’t just about physical prowess; in addition to the 6 workouts at the NFL Combine, such as the 40-yard dash and bench press, draft prospects must also complete the Wonderlic Test, an examination designed to gauge mental aptitude.

Lacrosse Analytics

I'm a Baltimore guy, and aside from an affinity for steamed crabs and a regrettable taste for National Bohemian beer, the mid-Atlantic has given me an appreciation for the sport of lacrosse. To most North American sports fans, lacrosse must seem like some strange niche sport, like "jousting" or "baseball." But it's very entertaining and fun to watch. It's growing fast, particularly in the super-zips around DC where the ANS headquarters is.

For those not familiar with lacrosse, imagine hockey played on a football field but, you know, with cleats instead of skates. And instead of a flat puck and flat sticks, there's a round ball and the sticks have a small netted pocket to carry said ball. And instead of 3 periods, which must be some sort of weird French-Canadian socialist metric system thing, there's an even 4 quarters of play in lacrosse, just like God intended. But pretty much everything else is the same as hockey--face offs, goaltending, penalties & power plays. Lacrosse players tend to have more teeth though.

Because players carry the ball in their sticks rather than push it around on ice, possession tends to be more permanent than hockey. Lacrosse belongs to a class of sports I think of as "flow" sports. Soccer, hockey, lacrosse, field hockey, and to some degree basketball qualify. They are characterized by unbroken and continuous play, a ball loosely possessed by one team, and netted goals at either end of the field (or court). There are many variants of the basic team/ball/goal sport--for those of us old enough to remember the Goodwill Games of the 1980s, we have the dystopic sport of motoball burned into our brains. And for those of us (un)fortunate enough to attend the US Naval Academy (or the NY State penitentiary system) there's field ball. The interesting thing about these sports is that they can all be modeled the same way.

So with lacrosse season underway, I thought I'd take a detour from football work and make my contribution to lacrosse analytics. I built a parametric win probability model for lacrosse based on score, time, and possession. Here's how often a team can expect to win based on neutral possession--when there's a loose ball or immediately upon a faceoff following a previous score:
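One plausible parametric form looks like the sketch below; the coefficients are illustrative stand-ins, not the fitted values:

```python
import math

# A sketch of a parametric lacrosse WP function of score, time, and possession.
def lacrosse_wp(lead, minutes_left, possession=0, a=0.9, b=0.3):
    """possession: +1 ball, 0 neutral (faceoff/loose ball), -1 opponent ball.
    a and b are placeholder coefficients, not the model's estimates."""
    if minutes_left <= 0:
        return 1.0 if lead > 0 else (0.5 if lead == 0 else 0.0)
    z = a * lead / math.sqrt(minutes_left) + b * possession
    return 1.0 / (1.0 + math.exp(-z))

print(lacrosse_wp(lead=2, minutes_left=10))                    # neutral possession
print(lacrosse_wp(lead=-1, minutes_left=5, possession=1))
```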

When Coaches Use Timeouts

As I continue to work on the next generation WP model, I'm looking hard at how timeouts are used. Here are 2 charts that capture about as much information as can be squeezed into a graphic.

The charts need some explanation. They plot how many timeouts a team has left during the second half based on time and score. Each facet represents a score difference. For example the top left plot is for when the team with the ball is down by 21 points.  Each facet's horizontal axis represents game minutes remaining, from 30 to 0. The vertical axis is the average number of timeouts left. So as the half expires, teams obviously have fewer timeouts remaining.

The first chart shows the defense's number of timeouts left throughout the second half based on the offense's current lead. I realize that's a little confusing, but I always think of game state from the perspective of the offense. For example, the green facet titled "-7" is for a defense that's leading by 7. Notice that defenses that are ahead naturally use fewer timeouts than those that trail, as indicated by comparison to the "7" facet in blue. (Click to enlarge.)

What I'm Working On

It's been almost 6 years since I introduced the win probability model. It's been useful, to say the least. But it's also been a prisoner of the decisions I made back in 2008, long before I realized just how much it could help analyze the game. Imagine a building that serves its purpose adequately, but came to be as the result of many unplanned additions and modifications. That's essentially the current WP model, an ungainly algorithm with layers upon layers of features added on top of the original fit. It works, but it's more complicated than it needs to be, which makes upkeep a big problem.

Despite last season's improvements, it's long past time for an overhaul. Adding the new overtime rules, team strength adjustments, and coin flip considerations were big steps forward, but ultimately they were just more additions to the house.

The problem is that I'm invested in an architecture that wasn't planned to be used as a decision analysis tool. It must have been in 2007 when I heard some TV announcer say that Brian Billick was 500-1 (or whatever) when the Ravens had a lead of 14 points or more. I immediately thought, isn't that due more to Chris McAlister than Brian Billick? And, by the way, what is the chance a team will win given a certain lead and time remaining? When can I relax when my home team is up by 10 points? 13 points? 17 points?

That was the only purpose behind the original model. It didn't need a lot of precision or features. But soon I realized that if it were improved sufficiently, it could be much more. So I added field position. And then I added better statistical smoothing. And then I added down and distance. Then I added more and more features, but they were always modifications and overlays to the underlying model, all the while being tied to decisions I made years ago when I just wanted to satisfy my curiosity.

So I'm creating an all new model. Here's what it will include:

Thomas Bayes Would Approve of Seattle's Defensive Tactics

The following is a guest article by Gary Montry, a professional applied mathematician. Editor's note: Gary uses net yardage as the measure of utility, and while we might prefer something like EP or WP, I think the general point of the article stands, and its strength is in the construction and solution to the problem. It's also a great refresher on conditional probabilities and Bayes' theorem.

Last week a WSJ article about the Seahawks' defensive backs claimed that they "obstruct and foul opposing receivers on practically every play." I took a deeper look into the numbers and found that as long as referees are reluctant to throw flags on the defense in pass coverage (as claimed in the article), holding the receiver is a very efficient defensive strategy despite the risk of being penalized.

The following is an analysis using the concepts of expected utility, expected cost, and Bayesian statistics.

The reason defensive holding is an optimal strategy comes down to one word. Economics. The referee's reluctance to call penalties on the defensive secondary is analogous to a market inefficiency. The variance in talent on NFL rosters, coaching staffs, and front offices between the best and worst teams in the league is probably very small. Successful teams win within a small margin. Seattle has found a way to exploit a relaxation in marginal constraints within the way the game is called that their competitors have not, and turned it into a competitive advantage.

If you think about committing a penalty in the same way as committing a crime, the expected utility is essentially the same. The expected utility (EU) for defensive holding is (opponent loss of down due to incomplete pass - probability of being penalized x cost of penalty). In other words, EU is the benefit of an incomplete pass minus the cost of the penalty times the probability of getting caught.
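Plugging illustrative numbers into that expression (these are assumptions for the sketch, not the article's estimates):

```python
# Expected utility of defensive holding, in net yards, under assumed values.
benefit_incompletion = 8     # assumed yardage value of forcing the incompletion
p_flag = 0.10                # assumed chance the hold is actually called
cost_penalty = 5 + 10        # 5 penalty yards plus ~10 for the automatic first down (assumed)

eu_holding = benefit_incompletion - p_flag * cost_penalty
print(eu_holding)            # positive => holding "pays" under these assumptions
```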

NFL Overtime Modeled as a Markov Chain

by Ben Zauzmer. Ben is a junior majoring in Applied Math at Harvard University and is a member of the Harvard Sports Analysis Collective. This article was originally published at harvardsportsanalysis.org.

In 2012, the NFL created new overtime rules designed to make the game fairer. The league switched from a sudden death setup to an arrangement that allows both teams to have a chance at scoring, unless the first team to receive scores a touchdown. Even with this change, it would seem that a coach should still always elect to receive if he wins the coin toss at the start of overtime, since an opening touchdown drive wins the game.

However, earlier this year, for the first time under the new rules, a coach made exactly the opposite decision. Bill Belichick, the three-time Super Bowl-winning coach of the New England Patriots, made the gutsy call to kick at the start of overtime. Many considered the main factor behind this decision to be the heavy winds at Gillette Stadium (if a team defers the choice of kicking or receiving, it may choose which direction to face). However, kicking first may also give a team better field position on offense and may actually benefit teams with strong defenses.

To calculate which strategy coaches should prefer, we will model NFL overtime as a Markov Chain. We will define our states as the set of possible point differentials, from the perspective of the team that receives the opening kickoff, in overtime: -6, -3, -2, 0, 2, 3, 6. This model inherently assumes that state-to-state probabilities are not conditional, and that the probability of the score differential being 5 or 9 – both technically possible under the new rules – is negligible.

We will let R1 be the transition matrix for the receiving team's first offensive possession, D1 be the matrix for the receiving team's first defensive possession, R be the matrix for every subsequent receiving team offensive drive, and D be the matrix for every subsequent receiving team defensive drive. The first row/column of each matrix represents the receiving team at a -6 scoring difference, and so on until the last row/column is the receiving team at a +6 scoring difference.

The matrices have the following forms:
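As a rough sketch of the structure (placeholder probabilities, not the estimated drive outcomes):

```python
import numpy as np

# States are the receiving team's point differential in overtime.
states = [-6, -3, -2, 0, 2, 3, 6]
idx = {s: i for i, s in enumerate(states)}

# R1: the receiving team's first offensive possession. From 0 they can score a
# TD (+6, game over), a FG (+3), or fail to score (stay at 0). The entries
# below are placeholders; D1, R, and D are built the same way.
R1 = np.zeros((7, 7))
R1[idx[0], idx[6]] = 0.25
R1[idx[0], idx[3]] = 0.20
R1[idx[0], idx[0]] = 0.55
# Rows for states the opening possession can't start from are left as identity
# so the matrix stays row-stochastic:
for s in (-6, -3, -2, 2, 3, 6):
    R1[idx[s], idx[s]] = 1.0

# Distribution over score differentials after the opening possession:
start = np.zeros(7)
start[idx[0]] = 1.0
print(start @ R1)
```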

Momentum Part 5 - Series Level Analysis

This is the final part of my series on momentum in a football game. Is momentum a causative property that a team can gain or lose, or is it only something our minds project to explain streaks of outcomes that don't alternate as much as we expect? It's been a couple months since I began this series, so as a refresher, here is what I've looked at so far:

Part 1 examined the possibility that momentum exists by measuring whether teams that obtain the ball in momentum-swinging ways go on to score more frequently than teams that obtained the ball by regular means.

Part 2 looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect.

Part 3 focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.

Part 4 applied a different method of examining momentum by using the runs test to see the degree to which team performance is streakier than random, independent trials.

In this part, I'll apply the runs test at the series level, to see if teams convert first downs (or fail to convert them) more consecutively than random independence would suggest. But first, I'll tie up some loose ends left hanging from part 4. Specifically, I'll redo the play-level runs test to eliminate potential confusion caused by a team with disparate performance from their offensive and defensive squads.

The Value of a Timeout - Part 2

In the first part of this article, I made a rough first approximation of the value of a timeout. Using a selected subsample of 2nd half situations, it appeared that a timeout's value was on the order of magnitude of .05 Win Probability (WP). In other words, if a team with 3 timeouts had a .70 WP, another identical team in the same situation but with only 2 timeouts would have about a .65 WP.

In this part, I'll apply a more rigorous analysis and get a better approximation. We'll also be able to repeat the methodology and build a generalized model of timeout values for any combination of score, time, and field position.

Methodology

For my purposes here, I used a logit regression. (Do not try to build a general WP model using logit regression. It won't work. The sport is too complex to capture the interactions properly.) Logit regression is suitable in this exercise because we're only going to look at regions of the game with fairly linear WP curves. I'm also only interested in the coefficients of the timeout variables--the relative values of the timeout states--and not the full prediction of the model.

I specified the model with winning {0,1} as the outcome variable, and with yard line, score difference, time remaining, and timeouts for the offense and defense as predictors. The sample was restricted to 1st downs in the 3rd quarter near midfield, with the offense ahead by 0 to 7 points.
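A minimal sketch of that specification, run on simulated data with assumed column names (the real analysis uses actual play-by-play):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
plays = pd.DataFrame({
    "yardline":     rng.integers(40, 61, n),      # near midfield
    "score_diff":   rng.integers(0, 8, n),        # offense ahead by 0-7
    "seconds_left": rng.integers(900, 1800, n),   # 3rd quarter
    "off_timeouts": rng.integers(0, 4, n),
    "def_timeouts": rng.integers(0, 4, n),
})
# Simulate outcomes so the regression has something to find (coefficients are arbitrary):
logit_p = 0.3 * plays.score_diff + 0.05 * plays.off_timeouts - 0.05 * plays.def_timeouts
plays["win"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = smf.logit(
    "win ~ yardline + score_diff + seconds_left + off_timeouts + def_timeouts",
    data=plays,
).fit(disp=0)
print(model.params[["off_timeouts", "def_timeouts"]])  # the coefficients of interest
```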

Results

The Value of a Timeout - A First Approximation

During the NFC Championship Game the other day, we saw a familiar situation. Down by 4 with 14 minutes left in the game, the Seahawks were confronted with a decision. It was 4th and 7 on the SF 37. Should they go for it, punt, or even try a long FG to maybe make it a 1-point game? Pete Carroll ended up making what was the right decision according to the numbers, but not before calling a timeout to think it over.

As I noted in my game commentary, if you need to call a timeout to think over your options, the situation is probably not far from the point of indifference where the options are nearly equal in value. And timeouts have significant value, particularly in situations like this example--late in the game and trailing by less than a TD--because you'll very likely need to stop the clock in the end-game, either to get the ball back or during a final offensive drive. Would Carroll have been better off making a quick but sub-optimum choice, rather than making the optimum choice but burning a timeout along the way?

Here's another common situation. A team trails by one score in the third quarter. It's 3rd and 1 near midfield and the play clock is near zero. Instead of taking the delay of game penalty and facing a 3rd and 6, the head coach or QB calls a timeout. Was that the best choice, or would the team be better off facing 3rd and 6 but keeping all of its timeouts?

Both questions hinge on the value of a timeout, which has been something of a white whale of mine for a while. Knowing the value of a timeout would help coaches make better game management decisions, including clock management and replay challenges.

In this article, I'll estimate the value of a timeout by looking at how often teams win based on how many timeouts they have remaining. It's an exceptionally complex problem, so I'll simplify things by looking at a cross section of game situations--3rd quarter, one-score lead, first down near midfield. First, I'll walk through a relatively crude but common-sense analysis, then I'll report the results of a more sophisticated method and see how both approaches compare.

Momentum 4: How Streaky Are NFL Games?

This is the 4th part in my series on examining the concept of momentum in NFL games. The first part looked at whether teams that gained possession of the ball by momentum-swinging means went on to score more frequently than teams that gained possession by regular means. The second part of this series looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect. The third part focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.

This article will examine how 'streaky' NFL games tend to be. If momentum is real and it affects game outcomes, it would result in streaks of success and failure that are longer than we would expect by chance. But if consecutive plays are independent of previous success, the streaks of success and failure will tend to be no longer than expected by chance. This method of analysis does not rely on any particular definition of a precipitating momentum-swing, as it looks at entire games to measure whether success begets further success and whether failure leads to more failure.

For momentum to have a tangible effect on games, it does not require completely unbroken strings of successful or unsuccessful plays. But if success does enhance the chance of subsequent success, then the streaks of outcomes will be longer than if by chance alone.

For this analysis, I applied the Runs Test to the sequence of plays in a game. This produces a statistic indicating how streaky a string of results is compared to what would be expected by chance. For example, consider the following 3 strings of results of flipping a coin 8 times:

HTHTHTHT, HHHHTTTT, HTTTHTHH

The Runs Test works like this:
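Here's a compact sketch of the Wald-Wolfowitz runs test applied to those strings (the z-score flags streakiness relative to chance):

```python
import math

def runs_test(outcomes):
    """Wald-Wolfowitz runs test on a sequence of successes/failures (booleans).
    Negative z = fewer runs than chance (streaky); positive z = more alternation."""
    n1 = sum(outcomes)
    n2 = len(outcomes) - n1
    runs = 1 + sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return (runs - mean) / math.sqrt(var)

# The three coin-flip strings from above:
for s in ("HTHTHTHT", "HHHHTTTT", "HTTTHTHH"):
    print(s, round(runs_test([c == "H" for c in s]), 2))
```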

What Kind of Teams Are Super Bowl Winners?

What's the profile of a Super Bowl winner in the modern era? Does defense win championships? Are they predominantly elite offenses? Do they have to be above average on both sides of the ball? Are champions always dominant in the regular season? Is your team out of the mix for the Lombardi Trophy?

Here's a plot of every team's regular-season Expected Points Added (EPA) from 1999-2013. The horizontal axis represents offensive EPA per game, and the vertical axis represents defensive EPA per game. The best teams are in the upper-right quadrant, while the worst are in the lower-left. (Click to enlarge...it's suitable for framing!)

Dome at Cold Revisited

There were 4 dome teams playing in wintery weather today, and all four lost. DET lost at PHI, MIN lost at BAL, IND lost at CIN, and ATL lost at GB.

A few years ago I looked at how dome teams tend to struggle in cold weather. I wanted to know if dome team underperformance in the cold was indeed true and if so how big was the effect. The answers were: Yes and huge. Last season I redid things using actual game temperatures and found just as big an effect.

With the weather as it is on the east coast today, I looked at this phenomenon once again. We have two more years' worth of data thanks to the addition of 1999. Plus, I was able to reconstruct nearly another half season's worth of data by filling in missing game temperatures from the gamebooks. I also broke out teams that played in retractable-roof stadiums by season.

Here are the results if we count retractable teams as dome teams. There's a case to be made that retractable home environments are closer to dome environments than open air stadiums. The chart below plots road team winning percentage according to game temperature.

Saban's Hyperbola: Analyzing Alabama's Long FG Attempt

Way late to the party here, but let's do this because it's so interesting. As every football fan on the planet knows, Alabama attempted a 57-yd FG with 1 sec to play in regulation against Auburn. The kick fell short and was returned for a stunning game-winning TD. The consensus analysis seems to be that the FG attempt wasn't necessarily a bad decision, but the big mistake was that Alabama was not prepared with appropriate personnel to cover a potential return.

Let's look at the FG decision more closely. I won't use the WP model, but instead apply some math and logic. There were three options for Alabama:

1. Kneel
2. Hail Mary
3. Attempt the FG

Let's make some assumptions. First, OT is a 50/50 proposition. Alabama was favored in this game, but Auburn was playing strong. Plus, OT is a bit of a dice roll to begin with. Second, Hail Marys (Maries, Mary's?) from that range are probably no more successful in college than they are in the pros, which is around 5%. Lastly, for the sake of the argument, let's say there is zero chance of a defensive TD return on the Hail Mary.

We don't really know the probability of a successful FG attempt or the probability of a successful return or block & return from a range like that, especially in college ball from a kicker without many attempts. But let's set that aside for a moment.
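To make the comparison concrete, here's a small sketch with the stated assumptions baked in and the unknown FG and return probabilities left as inputs (the values passed in at the end are purely illustrative):

```python
# Assumptions from above: OT is a coin flip; Hail Mary succeeds ~5% of the time
# and carries no return risk.
p_win_ot = 0.5
wp_kneel = p_win_ot                      # 0.50
wp_hail_mary = 0.05 + 0.95 * p_win_ot    # ~0.525 (a miss just sends it to OT)

def wp_fg_attempt(p_fg, p_return_td):
    """Win outright with the make, lose outright on a return TD, otherwise OT."""
    return p_fg + (1 - p_fg - p_return_td) * p_win_ot

print(wp_kneel, wp_hail_mary)
print(wp_fg_attempt(p_fg=0.30, p_return_td=0.02))   # illustrative inputs only
```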

Win Probability Model/Calculator Upgrades - Team Strength Adjustment & More

I just implemented several new features and significant upgrades to the Win Probability Calculator tool as well as the model behind it.

1. The biggest new feature is the capability to adjust the WP estimates based on relative team strength. This is accomplished by entering either a pregame WP estimate from the efficiency model, another source, or the game's point spread. The model has had this ability for a long time, but I didn't want to implement it until I had a sound way of doing so.

The prior pregame estimate of WP is revised as the game goes on with the baseline in-game WP estimate. The two probabilities are reconciled using the logit method. The trick is to understand how the pregame difference in team strength decays over the course of the game. At a certain point, it doesn't matter how much a team was favored if it's trailing by two or more scores late in the game. Team strength differential decays proportionally to the log of time as the game progresses according to a particular curve.
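A generic sketch of that logit-method blend (the decay weight here is a stand-in, not the model's calibrated curve):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def adjusted_wp(wp_in_game, wp_pregame, fraction_of_game_left):
    """Fold the pregame estimate into the in-game WP; the pregame term's weight
    decays toward zero as the game runs out (placeholder linear decay)."""
    weight = fraction_of_game_left
    return inv_logit(logit(wp_in_game) + weight * logit(wp_pregame))

print(adjusted_wp(wp_in_game=0.50, wp_pregame=0.65, fraction_of_game_left=1.0))  # at kickoff: ~0.65
print(adjusted_wp(wp_in_game=0.50, wp_pregame=0.65, fraction_of_game_left=0.1))  # late: ~0.52
```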

Pregame WP or spreads should be with respect to the current offense. For example, if the game's spread was -3 but the visitor has the ball in the scenario you are investigating, enter 3 for the spread. In what is a cool enough feature all by itself, entering a spread will automatically be converted into a pregame WP estimate.

For the record, the WPA stats for teams and players will continue to use the baseline unadjusted WP numbers. If we used the adjusted WP numbers, every team  and player would have a zero WPA assuming our pregame estimates were accurate. Put simply, using the adjusted WPA stats would defeat their very purpose and only be a measure of how good our pregame forecasts were.

2. The next most significant update is the ability to account for receiving the kickoff in the 2nd half. This can have an effect of up to a 0.04 WP swing in the first half of close games. The input asks whether the team with possession kicked off to start the first half. This consideration doesn't apply following the 2nd half kickoff, so for 2nd half scenarios you can just leave the input at the default Don't Know.

Momentum Part 3: After Failed 4th Down Conversion Attempts

This is the third part of my look at momentum in the NFL. The first part examined whether several momentum-swinging types of events caused any increase in a team's chances of scoring on the subsequent possession. The second part compared the expected and observed Win Probability (WP) following momentum-swinging events to find out whether those events increased a team's chances of winning beyond what we would otherwise expect.

This installment cuts to the chase. From a strategic perspective, we want to understand how momentum may or may not affect the game so that coaches can make better decisions. Often, momentum is cited as a consideration to forgo strategically optimal choices for fear of losing the emotional and psychological edge thought to comprise momentum.

Here's the thinking: If a team tries to convert on 4th down but fails or unsuccessfully tries for a two-point conversion, it gives up the momentum to the other team. The implication is that failing on 4th down means that winning is now less probable than the resulting situation indicates, beyond what the numbers say. Therefore, the WP and Expected Points (EP) models used to estimate the values of the options no longer apply. In a nutshell, the analytic models underestimate the cost of failing.

[By the same token, the reverse argument should be just as valid. Wouldn't succeeding in a momentum-swinging play mean the chances of winning are even higher than the numbers indicate? For now, I'll set the 'upside' argument aside and examine only the 'downside' claim.]

Should You Kick a FG on 3rd Down?

With the game on the line, coaches are fond of attempting a FG on 3rd down in case of a bobbled snap. The rationale is that in case of a bad or bobbled snap, the holder can fall on the ball and the kick can be reattempted 8 yards deeper. Maybe that made sense in 1974, but I'm pretty sure it's a bad idea now.


Admittedly, I wrote that just based on my familiarity with the relevant numbers, so I thought I'd do the legwork. FG% improves with every yard closer a team gets. Every yard matters. In fact, every yard matters to the tune of 1.6% per yard when the line of scrimmage is between the 35-yard line and the 10-yard line.



Yesterday, Keith looked at this kind of situation in the context of the CHI-MIN game, and his results suggest the same conclusion. This post will examine play outcomes on 3rd down when the game is on the line and teams are in deep FG (attempt) range, and compare them to the likelihood of a bad snap or hold.

Fumble Rate by Temperature

The title says it all. With Sunday night's fumble fest in the books, I thought I'd take a quick and dirty look at how cold temperature affects fumbling. It was 19 degrees in Foxboro when I checked the weather there in the 4th quarter.

I looked at all plays from 2000 through the 2012 regular season, excluding kneel downs and spikes. I counted all fumbles, not just fumbles lost. Keep in mind the sample sizes greatly diminish at the temperature extremes.

Here is the breakdown:

Momentum Part 2: The Effect of Momentum-Swinging Events on Game Outcomes

Recently I tried to detect the existence of momentum within an NFL game. I examined drive success based on how 'momentous' the manner in which the offense gained possession was. Admittedly, that analysis only measures one aspect of momentum. In this post, I'll take the analysis a step further and look at how a team's chances of winning are affected following several momentum-swinging types of events. This approach examines the potential effect of momentum on the entire remaining part of a game, not just on the subsequent drive.

Like the previous analysis, I relied on how possession was obtained as an indication of a momentum-swing. For all drives from 1999-2013 (through week 8), I compared a team's expected chances of winning (based on time, score, field position, down and distance) with how often that team actually won. I divided the data among three categories: possession obtained following a momentous play, possession obtained following a turnover on downs, and possession obtained following a non-momentous play.

Momentous obtainment includes fumble recoveries, interceptions, muffed punts, blocked kicks, and blocked field goals. I excluded missed field goals from the analysis because it was unclear to me how momentous they are. They are often thought of as big momentum changing events in close games but are too common (almost 20% of all kicks) to truly be momentous.

Is the Revolution Over? Did We Win?

"The Revolution Was Televised. The fourth down revolution is over. Going for it won."

Is Mike right? Did going for it really win? Mike makes the case, and cites several promising examples of unconventional 4th down decisions from one Sunday afternoon earlier this season:

"-The Lions going for it on 4th-and-goal from the two-yard line, early in their win over the Cowboys.
-The Dolphins going for it on 4th-and-1 from the Patriots' 38-yard line, in the second quarter.
-The Patriots going for it on 4th-and-4 from the Dolphins' 34-yard line, while leading by three points in the fourth quarter.
-The Bengals going for it on 4th-and-inches from the 1-yard line, while leading 14-0 against the Jets.
-The Broncos scoring a 4th-and-goal touchdown to tie the game at 21 against the Redskins, in the third quarter.
-The Packers converting a 4th-and-3 from their own 42-yard line, setting up a touchdown to increase their lead to 31-17."

I think Mike is right to point out some very interesting cases where coaches are making some notable decisions, but the revolution is far from complete. I would suggest that an avalanche is a better analogy than a revolution. One day there may be an avalanche of aggressive 4th down decisions, but right now we're only seeing a few rocks trickle down the mountainside. It's not that there haven't been bold examples of enlightenment. It's just that there are so many opportunities that coaches have spurned.

Momentum 1: Scoring Rates following 'Momentum-Swinging' Events

Momentum might be one of the most over-cited concepts in sports. It's an idea borrowed from physics, and is something we witness every day. We see it in rising tides, building storms, and boulders rolling downhill. But does such a concept apply to sports? Certainly, better teams will likely continue to prevail, and lesser teams will likely continue to lose. But that's not momentum. It's just better teams being better.

In this article, I'll explain why I think we see momentum when it's not really there. And to test the existence of momentum within NFL games, I'll compare the results of drives following 'momentum-swinging' events with those following non-momentum-swinging events.

For momentum to be a real thing in sports, it needs to have some connection to reality beyond the metaphysical and metaphorical. The theory is that good outcomes are emotionally uplifting, which in turn leads to better performance, which then feeds upon itself. It's understandable to believe in game momentum when we see games like this each week:

End of Half Clock Management

I'm watching a game right now where there's over a minute left in the 2nd quarter. The ball is at midfield and it's 4th and long. Both teams have all three of their timeouts, but neither has used any. The punting team is standing around letting the seconds tick away, while the receiving team is patiently waiting for the snap.

I think this is irrational. Football is a zero-sum game. Whatever is good for me is equally bad for you, and vice versa. So if stopping the clock right now is not what you want, then it must be what I want. It can't be possible for both teams to benefit from allowing the clock to run down. One or the other team derives an advantage, however small, from stopping the clock.

The only plausible exception I can think of is when the possibility of either team scoring is so remote that the cost of potential for injury on the remaining plays exceeds the value of whatever advantage could be squeezed from trying to pursue a score. In this sense, the game becomes non-zero-sum.

But I think it's more likely that one or both of the teams are excessively pessimistic. The punting team is worried that the receiving team might have enough time to put together a scoring drive, and the receiving team is worried they might turn the ball over or be forced to punt again from deep in its own territory.

Do Comeback Wins Equal Future Regression?

Last week, Brian wrote a post examining teams that have blown multiple games within a season in which their win probability was 95 percent or higher.  The "blown game factor" statistic he referenced is essentially a reversal of the comeback factor (CF), a measure of how unlikely a given win was.  You can click on the link for a fuller explanation, but for the purposes of this article, just know this: a CF of 20 indicates a team's lowest win probability during the game was five percent.

So using this "five-percent rule," we can do the opposite exercise and take a look at the teams with multiple "big comeback" wins within a season, including the playoffs.  Examining the blown-game article, I found it interesting that most of the teams were mediocre, with a few especially bad teams and a few very good ones.  But excluding the 2013 Bucs and Texans, the average number of losses was 8.5, the definition of average.

That's not really the case with the comeback teams.  As you might expect, these teams are a little better, as 14 of the 24 comeback squads made the playoffs (I've excluded 2012 and 2013 data from the table, since we obviously don't know how this year will finish yet).  However, unless a team had one of the two signature quarterbacks of this era, they were almost certain to regress the following season:

Playoff Clarity

Sam Waters is the Managing Editor of the Harvard Sports Analysis Collective. He is a senior economics major with a minor in psychology. Sam has spent the past eight months as an analytics intern for an NFL team. He used to be a Jets fan, but everyone has their limits. 

The NFL season might only be nine weeks old, but the league’s playoff picture is already starting to gain clarity. Having a playoff picture that is too clear, too early, might make games a little less exciting, so I thought it would be interesting to see if this year’s playoff qualifiers really are more certain at this point than in past years.

We can start to attack this question using the projections at nfl-forecast.com, which use the team efficiency ratings here at Advanced NFL Stats to estimate each team’s chances of making the playoffs. According to these projections, eight teams currently have a probability of playoff qualification higher than ninety percent. While these teams (like Denver, Kansas City, and Seattle) are virtual locks, we have thirteen teams with a ten percent chance or less of qualifying, leaving us with a pretty polarized distribution of playoff odds after nine weeks of play:

Great Offenses > Great Defenses Visualized

A few weeks back I wrote about how the distribution of team offensive production is measurably wider than team defensive production. Although I've written about the phenomenon a few times over the years, it never hurts to apply newer and better analytic tools to the question.

I had produced this histogram to illustrate the comparison between offense and defense, but the format doesn't mesh very well at NYT. For those not familiar, a histogram plots the frequency of occurrence of various levels of a variable. In this case it's a plot of team total Expected Points Added (EPA) for the 2000-2012 regular seasons. For example, there were 62 defenses (the lighter plot) that totaled between 10 and 35 EPA for a season. And there were 43 offenses that totaled the same amount.
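For anyone who wants to recreate that kind of plot, here's a minimal sketch in Python. The file name and column names are assumptions, not the actual data source; the 25-point bin width simply matches the 10-to-35 EPA bucket described above.

    import pandas as pd
    import matplotlib.pyplot as plt

    # One row per team-season with total offensive and defensive EPA (hypothetical file).
    seasons = pd.read_csv("team_epa_2000_2012.csv")

    # Bin edges 25 EPA wide, aligned so one bucket runs from 10 to 35.
    bins = list(range(-215, 236, 25))
    plt.hist(seasons["off_epa"], bins=bins, alpha=0.5, label="Offense")
    plt.hist(seasons["def_epa"], bins=bins, alpha=0.5, label="Defense")
    plt.xlabel("Team total EPA for the season")
    plt.ylabel("Number of team-seasons")
    plt.legend()
    plt.show()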

Separating Receiver from Quarterback: A Start

Ty Aderhold and David Freed are second-year members of the Harvard Sports Analysis Collective. Ty is a sophomore majoring in History and Science with a minor in Global Health and Health Policy, and is a big fan of all Atlanta sports teams (proving Atlanta sports fans do actually exist). David is majoring in applied math (focusing on economics) and minoring in statistics. He is currently looking for a vintage Vince Carter Raptors jersey.

One of the biggest stories from Sunday was Calvin Johnson’s monstrous 329-yard receiving day, which prompted teammate Reggie Bush to call him “the greatest of all time” after the game. By contrast, because it came in a win, Tom Brady’s 116-yard performance went under the radar. Johnson’s big day and Brady’s less-than-stellar one prompt questions about the relationship between a quarterback and his top receiver.

One of the central figures in this debate is Matthew Stafford. For almost his entire NFL career, many have considered him a quarterback who relies on Johnson for his success. Stafford’s recent struggles in the Lions’ Week 5 game against the Packers, in which Johnson didn’t play, only added credence to this theory. At the same time, Tom Brady has been regarded for years as a superstar quarterback who can generate above-average stats for otherwise pedestrian receivers. Many considered it to be Brady who made Wes Welker great, not the other way around. However, it has been apparent throughout the season, as it was against the Dolphins this past weekend, that Brady is suffering from a lack of talented receivers (and, potentially, an undisclosed injury).

This post takes a step towards separating the value of a quarterback from his top receiver so we can better compare quarterback play across the league. It will also take an in-depth look at Matthew Stafford and Tom Brady with the goal of better understanding these quarterbacks and their successes with the likes of Calvin Johnson and Wes Welker.

To begin separating out the value of a quarterback from that of his top receiver, we looked at the best quarterback from each team in 2012 and his top receiver (defined as the receiver who gained the most yards). We also limited our data to games in which the quarterback and receiver played together. After computing the raw quarterback rating for each quarterback, we subtracted the plays on which he targeted his best receiver and recalculated his statistics.
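A rough sketch of that recalculation, using the standard NFL passer rating formula, might look like this. The play-level field names are assumptions about how the data could be stored, not the actual dataset.

    def passer_rating(comp, att, yards, tds, ints):
        # Standard NFL passer rating: four components, each capped between 0 and 2.375.
        clamp = lambda x: max(0.0, min(x, 2.375))
        a = clamp((comp / att - 0.3) * 5)
        b = clamp((yards / att - 3) * 0.25)
        c = clamp(tds / att * 20)
        d = clamp(2.375 - ints / att * 25)
        return (a + b + c + d) / 6 * 100

    def rating_without_top_target(plays, top_receiver):
        # Drop throws to the top receiver, then recompute the rating on what's left.
        kept = [p for p in plays if p["target"] != top_receiver]
        return passer_rating(
            sum(p["complete"] for p in kept),
            len(kept),
            sum(p["yards"] for p in kept),
            sum(p["td"] for p in kept),
            sum(p["interception"] for p in kept),
        )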


When the Defense Should Decline a Penalty After a Loss Part 2 (2nd Downs)

I recently looked at when it made sense for the defense to decline a 10-yard holding penalty following a 1st down play for no gain or a loss. It turned out that defenses should generally prefer to decline after a loss of 3 or more yards.

First downs are easier to analyze because they almost always begin with 10 yards to go. Unfortunately, 2nd downs aren't so cooperative. It's amazing how thinly the data gets sliced. Most downs aren't losses, even fewer have holding penalties, and rarely are they declined. Still, there are enough cases for a solid analysis using 1st-down conversion probability as the bottom line.

Put simply, a defense would prefer to decline a penalty on a 2nd down play whenever the resulting 3rd down situation converts less often than a replayed 2nd down with 10 yards added to the distance.
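As a sketch of that rule in code: the helper name and the example conversion rates below are mine, chosen to be roughly consistent with the chart that follows.

    def defense_should_decline(p_convert_3rd_down, p_convert_2nd_and_long):
        # Decline whenever the resulting 3rd down converts less often than
        # the replayed 2nd down with 10 penalty yards tacked on.
        return p_convert_3rd_down < p_convert_2nd_and_long

    # Example: 2nd-and-3, no gain on the play, holding on the offense.
    # Decline -> 3rd-and-3 (roughly 55% conversion); accept -> 2nd-and-13 (roughly 45%).
    print(defense_should_decline(0.55, 0.45))  # False: the defense should accept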

The chart below plots conversion probability for 2nd and 3rd down situations. The red line illustrates the conversion probability of 3rd down and X to go situations. For example, 3rd down and 7 situations are converted about 40% of the time.

The green line illustrates 2nd down situations, but slightly differently. It plots conversion probabilities for 2nd down and X plus 10 yards. For example, 2nd and 13 (i.e. 3 + 10 yds) situations are converted 45% of the time. The black line is the smoothed line fitted to the 3rd down conversion rates. I plotted things this way because it's the actual comparison we're interested in, given a gain of zero yards.

Jets 'Push' Their Luck

After forcing overtime, the Jets stopped the Patriots on their first drive, reverting the game to the old OT format--sudden death. Geno Smith and the Jets moved downfield before stalling, facing a 4th-and-7 from the New England 38. Rex Ryan had three viable options here, keeping in mind that the next score wins: kick a low-probability (40% league-wide) 55-yard field goal, attempt to convert a low-probability (42% league-wide) 4th-and-7, or punt the ball deep and risk Tom Brady leading a game-winning drive.

The Jets elected to attempt the field goal. Nick Folk missed wide left, but in a crazy turn of events, New England was penalized 15 yards for an unsportsmanlike conduct "pushing" penalty. Before we get to the penalty, let's talk about the decision. While I almost always advocate going for it in no-man's land, in this situation I was leaning toward the punt.

For this analysis, I used a combination of my Markov probabilities and Brian's overtime win probabilities.
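The structure of the comparison itself is simple expected-value arithmetic. Here's a minimal sketch; the state win probabilities would come from the two models above and aren't reproduced here, and the function name is just for illustration.

    def expected_wp(success_prob, wp_if_success, wp_if_failure):
        # Expected win probability of a risky option: weight the two outcomes.
        return success_prob * wp_if_success + (1 - success_prob) * wp_if_failure

    # Shape of the three-way comparison (40% and 42% are the league-wide rates above;
    # a made field goal ends the game, so its WP is 1.0):
    # wp_kick = expected_wp(0.40, 1.0, wp_jets_after_missed_fg)
    # wp_go   = expected_wp(0.42, wp_jets_after_conversion, wp_jets_after_failed_try)
    # wp_punt = wp_jets_after_punt
    # The best choice is simply whichever of the three is largest.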

When Should the Defense Decline a Penalty After a Loss? Part 1

Let's say there's a sack or other tackle that results in a several-yard loss. And to compound the offense's woes, a flag for holding is thrown, potentially setting up a 1st and 20 situation. Should the defense accept the penalty, or decline it and force a 2nd and X? We can evaluate this question in a few ways. We'll use a simple method and a more complex method to find out when a defense should normally decline a penalty on first down.

Before you read on, what do you think the break-even yardage is? What do you think most coaches think it is?

Which Teams Should Abandon the Run?

Yeah, yeah, yeah. It's a passing league. We got it. And still, according to the numbers, teams aren't passing enough. For some teams, it's painfully obvious that they should be passing more and running less. As a Ravens fan, I watched another game where nearly every run was simply a wasted down. Most of their paltry positive rushing yardage seems to come from trash draw plays in long-yardage situations, intended to mitigate very poor field position prior to a punt. It's like they're playing with two or three downs when everyone else gets four.

I wonder whether, at some point, when an offense is so much better at passing than running, it should abandon the run almost altogether. On top of the general imbalance in the league, some teams are just throwing away downs when calling conventional run plays. Of course, running and passing generally play off of each other in a game-theory sense. To be successful, passing needs the threat of running, and vice versa. But sometimes the cost of running is so high for an offense that it would be worth the trade-off to forfeit the unpredictability and just pass nearly every down.

It sounds crazy, but take a look at the Expected Points Added per play so far this season (through the 1pm games on Sunday 10/13). The right-most column is the pass-run split. The bigger that number, the greater the imbalance. Pay particular attention to the teams highlighted in red:

Examining the Value of Coaches' Challenges

Kevin Meers is the Co-President of the Harvard Sports Analysis Collective. He is a senior majoring in economics with a statistics minor, and has spent the past two years or so as an analytics intern in the NFL. He is currently writing his thesis on game theory in the NFL, and probably puts too much thought into how the perfect fantasy football league would be structured.

The coach’s challenge is an important yet poorly understood part of the NFL. We know challenges are an asset, but past that, we do not have a good understanding of what makes a good challenge or if coaches are actually skilled at challenging plays. This post takes a step towards better understanding those questions by examining the value of the possible game states that stem from challenged plays.

To value challenges, we must understand how challenges change the game’s current state. When a play is challenged, the current game state must transition into one of two new game states: one where the challenged play is reversed, the other where it is upheld. These potential game states are the key to valuing challenges.

Let’s look at a concrete example from last season. With two minutes and two seconds left in the fourth quarter in their week ten matchup, Atlanta had first and goal on New Orleans’ ten-yard line. Matt Ryan completed a pass to Harry Douglas, who was ruled down at the Saints’ one-yard line… only Douglas appeared to fumble as he went to the ground, with the Saints recovering the ball for a potential touchback. When New Orleans challenged the ruling on the field, the game could have transitioned into two possible game states: Atlanta’s ball with second and goal on the one, or New Orleans’ ball with first and ten on their own 20 yard line. If the Saints lost the challenge, they would have a Win Probability (WP) of 0.28, but if they won, their WP would jump to 0.88. This potential WP added, which I refer to as “leverage,” is key to valuing challenges. Mathematically, I define leverage as:
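leverage = WP(if the challenge is won) - WP(if the challenge is lost)

In code, with the numbers from the Saints example (the function name is just for illustration):

    def challenge_leverage(wp_if_reversed, wp_if_upheld):
        # The potential win probability gained if the challenged ruling is overturned.
        return wp_if_reversed - wp_if_upheld

    # The Saints' week ten challenge from the example above:
    print(challenge_leverage(0.88, 0.28))  # 0.60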