Shameless Plug Section

If you like Fantasy Football and have an interest in learning how to code, check out our Ultimate Guide on Learning Python with Fantasy Football Online Course. Here is a link to purchase for 15% off. The course includes 15 chapters of material, 14 hours of video, hundreds of data sets, lifetime updates, and a Slack channel invite to join the Fantasy Football with Python community.

nflfastR's EPA model

In this post, we're going to do something that leans more toward general NFL analytics than straight fantasy football analysis.

We're going to be using nflfastR's EPA (Expected Points Added) model, exposed through the Python package nflfastpy, to visualize the best offenses and defenses in the league.

nflfastpy's play-by-play data includes an EPA value for each play. EPA is the change in expected points (EP) from the start of a play to the end of it, where EP comes from a model of the game situation: down, yards to go, field position, time remaining, and so on.
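To make that definition concrete, here is a minimal sketch of the arithmetic, assuming you already have EP values for the situation before and after a non-scoring play. The EP numbers below are made up for illustration (they are not outputs of nflfastR's model), and the function name `play_epa` is hypothetical. The one subtlety is a change of possession: the EP after the play belongs to the other team, so it is negated before taking the difference.

```python
def play_epa(ep_before: float, ep_after: float, possession_changed: bool = False) -> float:
    """Expected points added on a single non-scoring play.

    ep_before: expected points for the offense at the snap.
    ep_after: expected points for whichever team has the ball after the play.
    possession_changed: True on turnovers, punts, etc.; ep_after is then from
        the opponent's point of view, so it is negated.
    """
    if possession_changed:
        ep_after = -ep_after
    return ep_after - ep_before


# Hypothetical EP values, chosen only to illustrate the sign conventions
gain = play_epa(ep_before=1.0, ep_after=2.2)  # a big gain, roughly +1.2
turnover = play_epa(ep_before=1.0, ep_after=0.5, possession_changed=True)  # -1.5
print(round(gain, 2), round(turnover, 2))
```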

Each play has an EPA, and we're going to find each team's EPA per play on offense and on defense. For offense, it's straightforward: if a play has an EPA of 1.2, the offense moved the ball in a way that added an expected 1.2 points to its score. For defense, we flip the interpretation rather than the sign: if a team is on defense and the play's EPA is 1.2, we'll say the defense allowed an expected 1.2 points on that play. Team defenses with more negative EPAs are better defenses, while team defenses with more positive EPAs are worse defenses.

This analysis will be helpful for fantasy purposes since having players on good offensive teams facing poor defensive teams is a recipe for success. This can also be useful for streaming defenses. In the next post we will take this even further to look at strength of schedule. This will be even more helpful for your fantasy team since we can focus in on players that will have an easier defensive schedule in the second half of the season.

First things first, load up your Google Colab or Jupyter notebook and import the libraries we'll need for this post.
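The original import cell isn't reproduced here, so treat this as an assumption rather than the author's exact cell: pandas for the DataFrame work, matplotlib for plotting, and scipy.stats for the R-squared values. You'll also need the `nflfastpy` package itself (installable with `pip install nflfastpy`) to load the play-by-play data.

```python
# Assumed core libraries for this walkthrough (nflfastpy is installed separately)
import pandas as pd               # DataFrames and groupby aggregation
import matplotlib.pyplot as plt   # scatter plots
from scipy import stats           # linregress, used for R-squared
```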

Next, we'll load in 2021 play-by-play data via nflfastpy. We've used this data quite a bit; as a reminder, it is an extensive dataset detailing every snap that has taken place so far this season.

play_id game_id old_game_id home_team away_team season_type week posteam posteam_type defteam ... out_of_bounds home_opening_kickoff qb_epa xyac_epa xyac_mean_yardage xyac_median_yardage xyac_success xyac_fd xpass pass_oe
0 1 2021_01_ARI_TEN 2021091207 TEN ARI REG 1 NaN NaN NaN ... 0 1 NaN NaN NaN NaN NaN NaN NaN NaN
1 40 2021_01_ARI_TEN 2021091207 TEN ARI REG 1 TEN home ARI ... 0 1 0.000000 NaN NaN NaN NaN NaN NaN NaN
2 55 2021_01_ARI_TEN 2021091207 TEN ARI REG 1 TEN home ARI ... 0 1 -1.399805 NaN NaN NaN NaN NaN 0.491433 -49.143299
3 76 2021_01_ARI_TEN 2021091207 TEN ARI REG 1 TEN home ARI ... 0 1 0.032412 1.165133 5.803177 4.0 0.896654 0.125098 0.697346 30.265415
4 100 2021_01_ARI_TEN 2021091207 TEN ARI REG 1 TEN home ARI ... 0 1 -1.532898 0.256036 4.147637 2.0 0.965009 0.965009 0.978253 2.174652

5 rows × 372 columns
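Only a handful of those 372 columns matter for this post. As a minimal sketch, assuming `pbp` is the DataFrame loaded from nflfastpy, you can keep just the team, play type, EPA, and yardage columns, and drop rows with no offense, like the first row of the output above (its `posteam` is NaN). Restricting to a `play_type` of `pass` or `run` is our assumption about which plays to count; the post doesn't say how plays were filtered. The helper name `slim_pbp` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Columns used in the rest of the post (names from nflfastR's play-by-play schema)
KEEP = ["game_id", "posteam", "defteam", "play_type", "epa", "yards_gained"]


def slim_pbp(pbp: pd.DataFrame) -> pd.DataFrame:
    """Keep the columns we need and only scrimmage plays with a valid EPA."""
    out = pbp.loc[:, KEEP]
    # Assumption: count only pass and run plays (drops kickoffs, punts, game-start rows)
    out = out[out["play_type"].isin(["pass", "run"])]
    return out.dropna(subset=["posteam", "defteam", "epa"]).reset_index(drop=True)


# Toy rows in the same shape as the real data (values are made up)
toy = pd.DataFrame({
    "game_id": ["2021_01_ARI_TEN"] * 4,
    "posteam": [np.nan, "TEN", "TEN", "ARI"],
    "defteam": [np.nan, "ARI", "ARI", "TEN"],
    "play_type": [np.nan, "kickoff", "pass", "run"],
    "epa": [np.nan, 0.0, -1.4, 0.3],
    "yards_gained": [np.nan, 0.0, 0.0, 4.0],
})
print(slim_pbp(toy))
```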

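Here, we're making a DataFrame called epa_df that sums each team's offensive EPA across all of its plays, counts those plays, and totals the offensive yards gained. Dividing total EPA by the play count gives each team's EPA per play, which is what the table below is sorted by. In a moment, we'll also visualize the relationship between team offensive yardage and team EPA/play.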
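Here is a minimal sketch of that aggregation using pandas named aggregation, assuming `pbp` holds one row per play with the nflfastR columns `posteam`, `epa`, and `yards_gained`, already filtered to the plays you want to count. The function name `offense_epa_table` and the toy plays are hypothetical; the output column names match the table below.

```python
import pandas as pd


def offense_epa_table(pbp: pd.DataFrame) -> pd.DataFrame:
    """Total EPA, play count, and yards per offense, plus EPA per play."""
    epa_df = pbp.groupby("posteam").agg(
        offense_epa=("epa", "sum"),
        offense_plays=("epa", "count"),
        offense_yards=("yards_gained", "sum"),
    )
    epa_df["offense_epa/play"] = epa_df["offense_epa"] / epa_df["offense_plays"]
    epa_df.index.name = None  # index is just the team abbreviation
    return epa_df.sort_values("offense_epa/play", ascending=False)


# Made-up plays for two teams
toy = pd.DataFrame({
    "posteam": ["ARI", "ARI", "ARI", "DET", "DET"],
    "epa": [0.5, -0.2, 0.6, -0.3, 0.1],
    "yards_gained": [8, 1, 12, 2, 4],
})
epa_df = offense_epa_table(toy)
print(epa_df)
```

On the real data, `epa_df.head(10)` gives a top-10 view like the one shown here.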

offense_epa offense_plays offense_yards offense_epa/play
ARI 71.112288 752 3586.0 0.094564
TB 56.053889 698 3385.0 0.080306
LA 57.654229 728 3596.0 0.079195
IND 57.924929 747 3341.0 0.077543
KC 56.133836 780 3540.0 0.071966
GB 41.429580 701 3001.0 0.059101
DAL 37.671326 715 3478.0 0.052687
BUF 33.788242 694 3123.0 0.048686
TEN 35.558411 786 3215.0 0.045240
CLE 30.852390 720 3427.0 0.042851

Not many surprises on this list. Arizona and Tampa Bay have clearly been the best offenses this year, so it checks out that they have the highest offensive EPA/play.

Let's move on to visualizing the relationship between yardage and EPA per play. We'll also use the scipy.stats package to find the R-squared and place it in the plot title.
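A minimal sketch of that plot, assuming an `epa_df` with the `offense_yards` and `offense_epa/play` columns from the table above. `scipy.stats.linregress` returns the correlation coefficient as `rvalue`, so squaring it gives the R-squared of a simple linear fit. The function name `plot_with_r_squared` is hypothetical, and the three-team sample uses numbers copied from the tables in this post rather than the full league.

```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats


def plot_with_r_squared(df: pd.DataFrame, x: str, y: str):
    """Scatter one team-level column against another, label teams, put R^2 in the title."""
    fit = stats.linregress(df[x], df[y])
    r_squared = fit.rvalue ** 2

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(df[x], df[y])
    for team, row in df.iterrows():
        ax.annotate(team, (row[x], row[y]), fontsize=8)
    # Least-squares line across the observed x range
    xs = pd.Series([df[x].min(), df[x].max()])
    ax.plot(xs, fit.intercept + fit.slope * xs, linestyle="--")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs. {x} (R-squared = {r_squared:.2f})")
    return ax, r_squared


# Three teams taken from the tables in this post (not enough for a real conclusion)
toy = pd.DataFrame(
    {"offense_yards": [3586.0, 3385.0, 2577.0],
     "offense_epa/play": [0.094564, 0.080306, -0.088795]},
    index=["ARI", "TB", "DET"],
)
ax, r2 = plot_with_r_squared(toy, "offense_yards", "offense_epa/play")
```

Outside a notebook, call `plt.show()` to display the figure. Because the function takes column names as arguments, the same helper can draw the defensive version by passing `defense_yards_given_up` and `defense_epa/play`.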

We can see there is a decent correlation between yardage and offensive EPA per play. The correlation is actually stronger when you look at team touchdowns. We'll move on, though, to finding defensive EPA/play. Since epa_df already exists, let's just add the defense columns via assignment.
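A minimal sketch of that assignment, assuming `pbp` is the filtered play-by-play frame and `epa_df` is indexed by team abbreviation. Grouping by `defteam` instead of `posteam` gives what each defense allowed, and because pandas aligns a Series on its index when you assign it as a column, each team's defensive numbers land on that team's row. The function name `add_defense_columns` and the toy data are hypothetical. Sorting descending by `defense_epa/play` puts the worst defenses first, as in the table below.

```python
import pandas as pd


def add_defense_columns(epa_df: pd.DataFrame, pbp: pd.DataFrame) -> pd.DataFrame:
    """Attach EPA and yards allowed by each defense to a team-indexed table."""
    by_defense = pbp.groupby("defteam")
    epa_df = epa_df.copy()
    # Each assignment aligns on the team index, not on row position
    epa_df["defense_epa"] = by_defense["epa"].sum()
    epa_df["defense_plays"] = by_defense["epa"].count()
    epa_df["defense_epa/play"] = epa_df["defense_epa"] / epa_df["defense_plays"]
    epa_df["defense_yards_given_up"] = by_defense["yards_gained"].sum()
    return epa_df


# Made-up plays: two snaps for each offense
toy_pbp = pd.DataFrame({
    "posteam": ["ARI", "ARI", "DET", "DET"],
    "defteam": ["DET", "DET", "ARI", "ARI"],
    "epa": [0.5, 0.3, -0.4, 0.2],
    "yards_gained": [10, 6, 0, 5],
})
toy_epa = pd.DataFrame({"offense_epa/play": [0.4, -0.1]}, index=["ARI", "DET"])
epa_df = add_defense_columns(toy_epa, toy_pbp)
print(epa_df.sort_values("defense_epa/play", ascending=False).head(5))
```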

offense_epa offense_plays offense_yards offense_epa/play defense_epa defense_plays defense_epa/play defense_yards_given_up
DET -59.581578 671 2577.0 -0.088795 73.452331 622 0.118091 3033.0
NYJ -50.516508 659 2637.0 -0.076656 75.569352 697 0.108421 3267.0
WAS -22.375791 652 2789.0 -0.034319 58.822950 680 0.086504 3115.0
KC 56.133836 780 3540.0 0.071966 61.340686 719 0.085314 3437.0
JAX -52.687387 668 2663.0 -0.078873 46.280072 657 0.070442 3003.0

These are the 5 worst defenses in the league by EPA per play. Remember, a more positive EPA per play on the defensive side is bad: it means the defense is allowing more expected points per play.

Let's now visualize the relationship between defensive yards given up by a team and defensive EPA.

We can see here that the relationship between defensive EPA and defensive yards given up does not have the same strength as the one between offensive EPA and offensive yardage; compare the R-squared values in the two plot titles. This may change slightly from year to year.
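To put a number on that comparison, here is a minimal sketch that computes both R-squared values side by side with `scipy.stats.linregress`, assuming `epa_df` carries the offensive and defensive columns shown above. The function name `yardage_r_squared` is hypothetical. The sample rows are the five teams from the defensive table; run it on the full 32-team `epa_df` for a meaningful comparison.

```python
import pandas as pd
from scipy import stats


def yardage_r_squared(epa_df: pd.DataFrame) -> dict:
    """R^2 of EPA/play against yardage, separately for offense and defense."""
    pairs = {
        "offense": ("offense_yards", "offense_epa/play"),
        "defense": ("defense_yards_given_up", "defense_epa/play"),
    }
    return {
        side: stats.linregress(epa_df[x], epa_df[y]).rvalue ** 2
        for side, (x, y) in pairs.items()
    }


# The five teams from the defensive table in this post (too few for a real conclusion)
sample = pd.DataFrame(
    {
        "offense_yards": [2577.0, 2637.0, 2789.0, 3540.0, 2663.0],
        "offense_epa/play": [-0.088795, -0.076656, -0.034319, 0.071966, -0.078873],
        "defense_yards_given_up": [3033.0, 3267.0, 3115.0, 3437.0, 3003.0],
        "defense_epa/play": [0.118091, 0.108421, 0.086504, 0.085314, 0.070442],
    },
    index=["DET", "NYJ", "WAS", "KC", "JAX"],
)
print(yardage_r_squared(sample))
```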

Let's tie everything together and scatter plot defensive EPA on the y-axis and offensive EPA on the x-axis. This will more clearly demonstrate which teams have a good defense and good offense, bad defense and good offense, bad defense and bad offense, and good defense and bad offense.
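A minimal sketch of the quadrant chart, assuming the combined `epa_df` with `offense_epa/play` and `defense_epa/play` columns. Dashed lines at the league averages split the chart into the four groups just described. Inverting the y-axis is our own design choice, not necessarily what the original chart did: it puts stingier defenses (more negative EPA/play) toward the top, so good-offense/good-defense teams land in the upper right and bad/bad teams in the lower left. The function name `plot_epa_quadrants` is hypothetical, and the sample rows are the five teams from the defensive table.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_epa_quadrants(epa_df: pd.DataFrame):
    """Offense EPA/play on x, defense EPA/play on y, with league-average reference lines."""
    x = epa_df["offense_epa/play"]
    y = epa_df["defense_epa/play"]

    fig, ax = plt.subplots(figsize=(10, 8))
    ax.scatter(x, y)
    for team in epa_df.index:
        ax.annotate(team, (x[team], y[team]), xytext=(4, 4), textcoords="offset points")

    ax.axvline(x.mean(), color="gray", linestyle="--")  # average offense
    ax.axhline(y.mean(), color="gray", linestyle="--")  # average defense
    ax.invert_yaxis()  # better (more negative) defensive EPA/play plotted toward the top
    ax.set_xlabel("Offense EPA/play (higher is better)")
    ax.set_ylabel("Defense EPA/play allowed (lower is better)")
    ax.set_title("Team offense vs. defense EPA per play")
    return ax


# Five teams from the defensive table in this post
sample = pd.DataFrame(
    {
        "offense_epa/play": [-0.088795, -0.076656, -0.034319, 0.071966, -0.078873],
        "defense_epa/play": [0.118091, 0.108421, 0.086504, 0.085314, 0.070442],
    },
    index=["DET", "NYJ", "WAS", "KC", "JAX"],
)
ax = plot_epa_quadrants(sample)
```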

And that's it! The visualization is pretty self-explanatory, and some of the results make a lot of sense if you've been following the NFL this season. The Arizona Cardinals, Buffalo Bills, LA Rams, and Bucs, all among the best teams in the league, sit in the good offense/good defense corner, while the Texans, Jags, Jets, Dolphins, and Bears, all among the worst teams in the league, sit in the bad offense/bad defense corner.

Like I mentioned, next week we will dive a little deeper and start applying this to strength of schedule for fantasy purposes.

Thanks for reading!