Coder Social home page Coder Social logo

nflscrapr's Introduction

Introducing the nflscrapR Package

This package was built to allow R users to utilize and analyze data from the National Football League (NFL) API. The functions in this package allow users to perform analysis at the play and game levels on single games and entire seasons. By parsing the play-by-play data recorded by the NFL, this package allows NFL data enthusiasts to examine each facet of the game at a more insightful level. The creation of this package puts granular data into the hands of any R user with an interest in performing analysis and digging up insights about the game of American football. With open-source data, the development of reproducible advanced NFL metrics can occur at a more rapid pace and lead to growing the football analytics community.

Note: Data is only available after 2009… for now

Installation

# Must install the devtools package using the below commented out code
# install.packages("devtools")

# Then can install using the devtools package from either of the following:
devtools::install_github(repo = "maksimhorowitz/nflscrapR")
# or the following (these are the exact same packages):
devtools::install_github(repo = "ryurko/nflscrapR")

Gather game ids

Using the scrape_game_ids function, one can easily access all pre-, post-, and regular season games for a specified season as well as options for the week and teams. The code below returns a dataframe containing the games for week 2 of the 2018 NFL season:

# First load the package:
library(nflscrapR)
#> Loading required package: nnet
#> Loading required package: magrittr

week_2_games <- scrape_game_ids(2018, weeks = 2)
#> Loading required package: XML
#> Loading required package: RCurl
#> Loading required package: bitops
# Display using the pander package:
# install.packages("pander")
week_2_games %>%
  pander::pander()
type game_id home_team away_team week season state_of_game
reg 2018091300 CIN BAL 2 2018 POST
reg 2018091600 ATL CAR 2 2018 POST
reg 2018091608 WAS IND 2 2018 POST
reg 2018091607 TEN HOU 2 2018 POST
reg 2018091606 TB PHI 2 2018 POST
reg 2018091605 PIT KC 2 2018 POST
reg 2018091604 NYJ MIA 2 2018 POST
reg 2018091601 BUF LAC 2 2018 POST
reg 2018091602 GB MIN 2 2018 POST
reg 2018091603 NO CLE 2 2018 POST
reg 2018091610 SF DET 2 2018 POST
reg 2018091609 LA ARI 2 2018 POST
reg 2018091612 JAX NE 2 2018 POST
reg 2018091611 DEN OAK 2 2018 POST
reg 2018091613 DAL NYG 2 2018 POST
reg 2018091700 CHI SEA 2 2018 POST

Example play-by-play analysis

Here is an example of scraping the week 2 matchup of the 2018 NFL season between the Kansas City Chiefs and the Pittsburgh Steelers. First, access the tidyverse library to select the game id and then use the scrape_json_play_by_play function to return the play-by-play data for the game:

# install.packages("tidyverse")
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.6
#> ✔ tidyr   0.8.1     ✔ stringr 1.3.1
#> ✔ readr   1.1.1     ✔ forcats 0.3.0
#> Warning: package 'dplyr' was built under R version 3.5.1
#> ── Conflicts ─────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ tidyr::complete()  masks RCurl::complete()
#> ✖ tidyr::extract()   masks magrittr::extract()
#> ✖ dplyr::filter()    masks stats::filter()
#> ✖ dplyr::lag()       masks stats::lag()
#> ✖ purrr::set_names() masks magrittr::set_names()

# Now generate the play-by-play dataset for the game:
kc_vs_pit_pbp <- week_2_games %>%
  filter(home_team == "PIT") %>%
  pull(game_id) %>%
  scrape_json_play_by_play()

Now using the estimates from the nflscrapR expected points and win probability models we can generate visuals summarizing the game. For example the win probability chart below shows how the Chiefs early lead faded in the second quarter, before they took sealed the game in the second half:

# Install the awesome teamcolors package by Ben Baumer and Gregory Matthews:
# install.packages("teamcolors")
library(teamcolors)

# Pull out the Steelers and Chief colors:
nfl_teamcolors <- teamcolors %>% filter(league == "nfl")
pit_color <- nfl_teamcolors %>%
  filter(name == "Pittsburgh Steelers") %>%
  pull(primary)
kc_color <- nfl_teamcolors %>%
  filter(name == "Kansas City Chiefs") %>%
  pull(primary)

# Now generate the win probability chart:
kc_vs_pit_pbp %>%
  filter(!is.na(home_wp),
         !is.na(away_wp)) %>%
  dplyr::select(game_seconds_remaining,
                home_wp,
                away_wp) %>%
  gather(team, wpa, -game_seconds_remaining) %>%
  ggplot(aes(x = game_seconds_remaining, y = wpa, color = team)) +
  geom_line(size = 2) +
  geom_hline(yintercept = 0.5, color = "gray", linetype = "dashed") +
  scale_color_manual(labels = c("KC", "PIT"),
                     values = c(kc_color, pit_color),
                     guide = FALSE) +
  scale_x_reverse(breaks = seq(0, 3600, 300)) + 
  annotate("text", x = 3000, y = .75, label = "KC", color = kc_color, size = 8) + 
  annotate("text", x = 3000, y = .25, label = "PIT", color = pit_color, size = 8) +
  geom_vline(xintercept = 900, linetype = "dashed", black) + 
  geom_vline(xintercept = 1800, linetype = "dashed", black) + 
  geom_vline(xintercept = 2700, linetype = "dashed", black) + 
  geom_vline(xintercept = 0, linetype = "dashed", black) + 
  labs(
    x = "Time Remaining (seconds)",
    y = "Win Probability",
    title = "Week 2 Win Probability Chart",
    subtitle = "Kansas City Chiefs vs. Pittsburgh Steelers",
    caption = "Data from nflscrapR"
  ) + theme_bw()

Example of gathering season data

You can also use the scrape_season_play_by_play function to scrape all the play-by-play data meeting your desired criteria for particular season. Note that this function can take a long time to run due to pulling potentially an entire season’s worth of data. The code below demonstrates how to access all play-by-play data from the 2018 pre-season:

preseason_pbp_2018 <- scrape_season_play_by_play(2018, "pre")

nflscrapr's People

Contributors

maksimhorowitz avatar michaelm91 avatar ryurko avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.