There has been quite a lot of conversation on the forums lately about stats: their use, and whether they are useful for assessing the level of play on the field. To be clear from the start, stats are not the sole answer to analysing football, and they certainly need triangulating with other data sources (e.g. film). In fact, what they are best used for is guiding analysis of other sources; you could say they give you the right questions to ask, rather than the answers.

Part of the issue with stats is, well, there are so many. It’s hard to judge which are the important ones and which are, at best, vanity. But what if there was a way to compare metrics relative to each other? It’s often said that the game is all about winning, so what if we took individual metrics and compared them to a team’s winning percentage to measure the correlation? That is to say, how much a metric potentially affects a team’s chances of winning. It’s important to note that correlation does not always equal causation, as explained brilliantly here https://tinyurl.com/y5x3hh9m. We can, however, use logic and our more holistic understanding of the game to sanity-check the correlations.

If we’re going to compare metrics based on correlation, we need some quantitative way of measuring it. That’s where the coefficient of determination comes in. More commonly known as R squared (R²), it is the proportion of the variance in the dependent variable (in our case, winning %) that is predictable from the independent variable (the metric we’re assessing). Put more simply, it measures how well two sets of data “fit” together, suggesting a relationship between the two. For my purposes this will be a value between 0 and 1, where 1 would be perfect correlation (unlikely) and 0 is no correlation. R² itself has no sign, so whether a relationship is positive or negative comes from the slope of the line of best fit. As football is a complicated game with many moving parts, I wouldn’t expect to see many metrics with strong correlation, but we can use R² to compare them relative to each other.
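As a rough illustration of how R² is worked out for one metric against winning %, here is a minimal Python sketch using only the standard library. It fits an ordinary least-squares line and compares the variance the line leaves unexplained with the total variance. The post doesn’t say which tool produced its fits, and the numbers below are invented for illustration, not Pro Football Reference data. With a single metric, R² equals the square of the Pearson correlation r, which is why the sketch also returns the slope: R² alone can’t tell you whether a relationship is positive or negative.

```python
from statistics import mean

def fit_r_squared(xs, ys):
    """Return (R², slope) for the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: the variance in win % the line fails to explain
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: all of the variance in win %
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, slope

# Hypothetical team-seasons, invented for illustration (NOT real data)
ppg = [15.0, 18.5, 21.0, 24.5, 27.0, 30.5]
win_pct = [0.25, 0.3125, 0.5, 0.5625, 0.6875, 0.8125]

r2, slope = fit_r_squared(ppg, win_pct)
direction = "positive" if slope > 0 else "negative"
print(f"R² = {r2:.3f} ({direction} relationship)")
```

A perfectly straight downward line and a perfectly straight upward line both give R² = 1; only the slope tells them apart.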
Before we dive in, some things to make clear:
1) This is a long post, and it will be very stats-heavy. If that’s not your thing, please don’t make a judgement without reading it fully.
2) This is not a way to predict future outcomes. This is examining historical data to identify trends to inform further discussion. If you wanted to predict games, you’d take this analysis and build an analytical model, weighting the metrics accordingly.
3) While it can’t predict, it can be used to give context to stats/metrics, an idea of whether a team is performing well or not, and which are the more important areas to be doing well in. If the correlation is good, we can use it to give values for “tiering” the metrics.
4) This isn’t intended as gospel or as all the answers; it’s meant as a prompt for discussion.
5) All data used is from Pro Football Reference (https://www.pro-football-reference.com/). It’s a great site; use it.

The methodology used for all of the below was to take 10 seasons’ worth of data and look at each team’s final standings, giving us 352 data points for each metric, with each point in turn made up of 16 games’ worth of data. Basically, the sample is 2,816 NFL regular season games.

Now, without further ado, let’s, as Missy Elliott said, get our geek on and look at some metrics.

Points

When we’re talking about winning, it’s hard to get away from the obvious: you win by scoring more points than the opposition. So if you take the average points per game for each team:

You can see there is fairly decent correlation, which would imply the more points you score on average per game, the more likely you are to win said game. As the correlation is fairly good, we can take the line of best fit and use its equation to give us some rough values for tiering the metric.
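The tiering step can be sketched as follows, assuming winning % sits on the y-axis so the line reads win % = slope × metric + intercept; plugging in a target win % and solving for the metric gives each tier. The author’s actual slope and intercept aren’t given, so this hypothetical example back-solves a line from the 25% and 75% PPG tiers quoted below. It is consistent with those tiers but is not the original fitted equation.

```python
# Sketch of the tiering step, assuming win % is the y-axis of the fit:
#   win_pct = slope * metric + intercept  =>  metric = (win_pct - intercept) / slope

def metric_for_win_pct(target_win_pct, slope, intercept):
    """Invert the line of best fit to find the metric value at a target win %."""
    return (target_win_pct - intercept) / slope

def line_through(p1, p2):
    """Slope and intercept of the straight line through two (metric, win %) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

# Back-solved from the 25% and 75% PPG tiers (14.73 and 30.11).
# A line consistent with those tiers, NOT the author's fitted equation.
slope, intercept = line_through((14.73, 0.25), (30.11, 0.75))

for target in (0.25, 0.50, 0.75):
    ppg = metric_for_win_pct(target, slope, intercept)
    print(f"{target:.0%} - {ppg:.2f} PPG")
```

Because the fit is a straight line, every 25-point step in win % maps to the same change in the metric, which is why the tiers come out evenly spaced. A metric with a negative slope, such as points allowed, works the same way; the tiers simply run in descending order.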
If we plug in win percentages of 25%, 50% and 75%, we get:
25% - 14.73 PPG
50% - 22.42 PPG
75% - 30.11 PPG

When you look at the spread between the tiers, it’s perhaps not surprising that it’s 7.69 points, or just over a touchdown.

But what about the defensive side of the game? There is, after all, a team trying to score on you too:

As you’d expect, there is some correlation, in that the more points you allow on the whole, the less likely you are to win the game. Interestingly though, the correlation is slightly weaker than for points scored. If we apply the same tiering:
25% - 29.10 PPG Allowed
50% - 22.46 PPG Allowed
75% - 15.83 PPG Allowed

However, what wins you a game is scoring more than your opponent on any given Sunday, so let’s look at average point differential over the course of a season:

A very strong correlation, as is perhaps to be expected, and I’d go out on a limb and say it’s the most important metric. It seems like stating the obvious: score more points than your opposition and you’ll win. But it’s also about by how many you beat the opposition; dominant teams tend to win more games. Again, not exactly surprising. Looking at the tiering:
25% - (-8.94) Point Differential
50% - (-0.01) Point Differential
75% - 8.92 Point Differential

The above would suggest that bad teams are losing by more than a TD on average, and good teams are winning by more than a TD on average. That means games decided by less than a TD are going to tend towards that 50% mark as the scores approach parity. This supports the idea that NFL games where the winning margin is a TD or less are “coin flip” games.

Yards

In order to get points, we need to be able to move the ball on the field. I’ve often said I’m not enamoured with using season volume stats, or even per-game stats (unless we’re talking points), as they can be skewed by the number of plays a team runs. Instead, if we look per play, we know we’re getting a true apples-to-apples comparison between teams.
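To make the per-play point concrete, here is a small sketch with two hypothetical offenses (numbers invented for illustration) that gain the same yardage over a season but run a different number of plays. Per game they look identical; per play, the slower-paced offense is clearly the more efficient one.

```python
def per_game(season_total, games=16):
    """Season total spread over a 16-game regular season."""
    return season_total / games

def per_play(season_total, plays):
    """Season total divided by the number of offensive plays run."""
    return season_total / plays

# Two hypothetical offenses (invented numbers): same yardage, different tempo
teams = {
    "Team A (up-tempo, 70 plays/game)": (5600, 1120),
    "Team B (slower, 56 plays/game)": (5600, 896),
}
for name, (yards, plays) in teams.items():
    print(f"{name}: {per_game(yards):.1f} yds/game, {per_play(yards, plays):.2f} yds/play")
```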
Starting, then, by looking at average yards gained per offensive play:

A surprisingly low correlation, but perhaps there is some logic here. It’s no good getting yards if you’re not getting points, and there will be plenty of “junk” yards due to game situations. Because the correlation is low, I won’t tier the metric, as it wouldn’t really be meaningful.

Looking at the defensive side of the ball:

Even more surprising. This lends weight to the argument that you can “bend but don’t break” on D, as long as you’re not giving up points.

Conversions

Due to the nature of the game, it’s not just about yards; it’s about the context of those yards too. That is to say, if you’re not converting your downs, then your offense isn’t going to stay on the field. Let’s start with just the % of plays on any down that resulted in a 1st down:

A really surprising lack of correlation here. But OK, on a lot of downs the offense might not be drawing up chunk plays, so there will be a fair few plays that, unless they break, will go for less than 10 yards.

The key down is often held to be 3rd down, so let’s look at 3rd down conversion percentage:

This is even more of a surprise. The correlation is so small that it has to be expressed in scientific notation (as a power of 10). I’m not really sure what to make of this, other than it would suggest being able to consistently convert on 3rd down might not be as important as you’d think. I’d caveat that there will be a huge amount of game-situation nuance that might not come through in the numbers.

For the sake of completeness, what about teams going for it on 4th down?

No major correlation here either.

Time of Possession

It’s often held that it’s important to win the Time of Possession battle, and while we know it’s certainly important to milk the clock or hurry up as the game situation dictates, does it overall make you more likely to win?

As we can see, there is some correlation, but not as strong as you might expect.
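On reading a result as small as the 3rd down one: spreadsheet trendline labels typically show tiny values in E-notation, where “2E-05” means 2 × 10⁻⁵. The post doesn’t quote the exact figure, so the label below is a hypothetical example of the format only. The sketch converts it into the share of variance explained and the size of the equivalent correlation coefficient |r|.

```python
import math

def variance_explained(r2):
    """Return (% of variance in win % explained, size of correlation |r|)."""
    return 100 * r2, math.sqrt(r2)

label = "2E-05"      # hypothetical trendline label, not the post's actual value
r2 = float(label)    # E-notation is a power of 10: 2 x 10^-5
pct, abs_r = variance_explained(r2)
print(f"R² = {r2} -> {pct:.4f}% of variance explained, |r| ≈ {abs_r:.4f}")
```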
Turnovers

Another area that is put forward as being key to winning is turnovers, so let’s look at how they affect win %:

The above graph shows offensive turnover percentage (that is, the percentage of offensive plays that ended in a turnover) against winning percentage. As expected, there is a fairly strong negative correlation.

But what about defensive takeaways?

While there is some correlation between the percentage of defensive plays that end in a turnover and winning %, it’s not quite as strong as for offensive turnovers.

However, when we talk about turnovers we often talk about the battle, so, similarly to the points discussion above, what about turnover differential?

Unsurprisingly, a stronger correlation: if you give up the ball less than your opponent on the day, it’s probably going to help you win.

Penalties

An area that often frustrates me, as we see such inconsistent officiating week to week, but does getting penalised more affect your chances of winning? Looking first at the average number of penalties per game:

I was expecting there to be a stronger negative correlation here, but given the variance in yards depending on the type of penalty, maybe the distance given up has more effect:

An even smaller correlation, which could suggest that there isn’t a huge benefit to being a “disciplined” team. However, I would also suspect that this is another thing that might not show up in the stats as such, but will have an impact situationally.

Passing

We previously looked at combined offensive yards, but what if we start splitting between the passing and running games? Looking first at passing yards per attempt:

We can see there’s a stronger correlation than for combined offensive yards per play, and it’s strong enough in my opinion to give some loose tiering:
25% - 5.03 Yards per Attempt
50% - 6.68 Yards per Attempt
75% - 8.32 Yards per Attempt

What about some other metrics that we look at when assessing QBs?
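The turnover metrics discussed in this section are simple rates and differences, which can be sketched as below. The season totals are hypothetical, and the post doesn’t say whether its turnover differential was measured per season, per game or per play, so this sketch uses raw season counts and shows a per-game figure alongside.

```python
def per_play_pct(events, plays):
    """Percentage of plays that ended in a given event (e.g. a turnover)."""
    return 100 * events / plays

def turnover_differential(takeaways, giveaways):
    """Positive when a team takes the ball away more often than it gives it up."""
    return takeaways - giveaways

# Hypothetical season totals, invented for illustration
giveaways, offensive_plays = 20, 1000
takeaways, defensive_plays = 25, 1000

print(f"Offensive turnover %: {per_play_pct(giveaways, offensive_plays):.1f}")
print(f"Defensive takeaway %: {per_play_pct(takeaways, defensive_plays):.1f}")
diff = turnover_differential(takeaways, giveaways)
print(f"Turnover differential: {diff:+d} ({diff / 16:+.2f} per game)")
```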
I was somewhat surprised that the correlation wasn’t stronger for completion percentage, given the weight given to accuracy when QBs are assessed, especially during the draft process. One thing to consider is how much completion percentage is really a measure of a QB’s accuracy.

Referring back to points (TDs) and turnovers (INTs), what effect do they have?

We can see there is decent correlation between the percentage of offensive plays that end in a passing touchdown and winning. There is less negative correlation with the percentage of plays that end in an interception. This could suggest it’s better to accept a somewhat worse TD:INT ratio in exchange for a higher volume of passing touchdowns.

Another area that is considered to be a drive killer is sacks. So does it follow that the more you’re sacked, the less likely you are to win?

Perhaps not as much as you might have thought. What about the amount of yards given up on average to sacks?

Again, less negative correlation than perhaps expected.

As there is fairly decent correlation for the passing metrics, what if we had some way of combining them into one catch-all metric? I’ve never been a huge fan of passer rating, as I felt it was slightly outdated, but the correlation would suggest it’s still viable as a way of grading QB play. Looking at the usual tiering:
25% - 63.50 Passer Rating
50% - 86.87 Passer Rating
75% - 110.23 Passer Rating

Another metric I’m a fan of is adjusted net yards per attempt (ANY/A), which is calculated as:
(PASSING YARDS – SACK YARDAGE + (20 × TOUCHDOWNS) – (45 × INTERCEPTIONS)) / (PASS ATTEMPTS + SACKS)

I like it as it rewards efficient QBs and gives less weighting to sheer volume of yards.

A decent correlation, suggesting it could be used as a high-level, broad metric for comparing QB play. Breaking down the tiers:
25% - 4.50 ANY/A
50% - 6.61 ANY/A
75% - 8.70 ANY/A

Overall, then, it would seem having a decent passing attack is conducive to winning more games, but what about defending the pass?
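Both composite passing metrics discussed above can be sketched in Python. `any_a` follows the ANY/A formula exactly as quoted in the post. `passer_rating` is written from the standard NFL passer rating definition (four components for completion %, yards per attempt, TD % and INT %, each capped between 0 and 2.375), which the post doesn’t spell out, so treat it as a sketch to check against an official source. The season line used is hypothetical.

```python
def any_a(pass_yards, sack_yards, touchdowns, interceptions, attempts, sacks):
    """Adjusted net yards per attempt, using the formula quoted in the post."""
    return (pass_yards - sack_yards + 20 * touchdowns - 45 * interceptions) / (attempts + sacks)

def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating: four components, each clamped to [0, 2.375]."""
    def clamp(value):
        return max(0.0, min(value, 2.375))
    a = clamp((completions / attempts - 0.3) * 5)     # completion %
    b = clamp((yards / attempts - 3) * 0.25)          # yards per attempt
    c = clamp(touchdowns / attempts * 20)             # TD %
    d = clamp(2.375 - interceptions / attempts * 25)  # INT %
    return (a + b + c + d) / 6 * 100

# Hypothetical season line, invented for illustration
print(f"ANY/A: {any_a(3700, 200, 25, 10, 470, 30):.2f}")
print(f"Passer rating: {passer_rating(300, 470, 3700, 25, 10):.1f}")
```

Because each passer rating component is capped, the maximum possible rating is 158.3, while ANY/A has no ceiling; that cap is one reason the two can rank the same QBs differently.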
Looking just at our two combined metrics:

Again there is less correlation on the defensive side of the ball, which could suggest a good passing offense is more of a factor in your chance of winning than a good passing defense. But as we’ve established that it’s points that are important, let’s look at passing touchdowns allowed:

Again, less correlation when compared to the offensive side of the ball.

Rushing

The old mantra that often gets brought up is “run the ball and stop the run”, but how true is this in the modern league of explosive offenses and gaudy passing numbers? Starting with rushing yards per attempt:

It’s pretty shocking how low the correlation is here. It would suggest that running yardage really doesn’t have much effect on your chances of winning. Again, though, I’d suspect the situational value of being able to run, and other effects such as making the defense account for it, aren’t shown here.

As we’ve already postulated, though, it’s points that matter, so what about the percentage of offensive plays that end in a rushing TD?

More correlation here. I’d love to split this down further by distance to goal, to see the value of being able to punch it in during goal-line situations.

On the defensive side of the ball:

Slightly more correlation for being able to defend rushing yardage, but less for being able to stop rushing TDs.

So what does this all mean?

As I said, this wasn’t intended as the answer, but as an aid to frame questions and discussion. But some things that may be implied from this:
1) Yards, especially when considered as a total volume, are somewhat vanity. It’s points that win you games.
2) The margin by which you win games on average is important; the closer that margin gets to under a TD, the more your results depend on the variance of “coin flip” games.
3) The above could be seen as an indicator that having a good offense gives you marginally more chance of winning than having a good defense.
4) Having a good passing game gives you a better chance of winning than having a good rushing game.
5) It’s probably better to have an aggressive, explosive passing offense (high TDs, not getting hung up on sustained drives) than a conservative one (low INTs, sustained drives, but maybe only getting FGs).

Things I want to look at further:
1) This is all based on regular season games; I want to see if it changes for playoff games.
2) Look at some other metrics (QB hurries, for example) and maybe stuff like DVOA.