A wrapper for extracting world football (soccer) data from FBref, Transfermarkt, Understat and fotmob

Home Page: https://jaseziv.github.io/worldfootballR/

R 100.00%
r sports-data soccer-data football-data fbref transfermarkt understat fotmob rstats football

worldfootballr's Introduction

worldfootballR

Overview

This package is designed to allow users to extract various world football results and player statistics from the following popular football (soccer) data sites: FBref, Transfermarkt and Understat.

Installation

You can install the CRAN version of worldfootballR with:

install.packages("worldfootballR")

You can install the released version of worldfootballR from GitHub with:

# install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")
library(worldfootballR)

Usage

Package vignettes have been built to help you get started with the package.

  • For functions to extract data from FBref, see here
  • For functions to extract data from Transfermarkt, see here
  • For functions to extract data from Understat, see here
  • For functions to extract data for international matches from FBref, see here
  • For functions to load pre-scraped data, see here

Loading Data

As of v0.5.3, the package supports rapid loading of pre-collected data through the load_ functions.

The data available for loading is stored in the worldfootballR_data repository. The repo can be found here.

Head to the vignette here to see examples of which data is available for rapid loading.


News

To stay up-to-date with the latest changes, see the package change log

Note that fotmob data is no longer provided since the release of v0.6.4 due to a change in their terms of service.


Leagues and Seasons

FBref

For FBref.com data (match and season data), a list of leagues and seasons included in the package can be found in the worldfootballR_data repository here

Transfermarkt

For transfermarkt.com data (valuations and transfers), a list of leagues and seasons included in the package can be found in the worldfootballR_data repository here

Understat

The following leagues are currently supported by Understat (these values can be passed in to the league arguments of most understat_ functions):

  • “EPL”
  • “La liga”
  • “Bundesliga”
  • “Serie A”
  • “Ligue 1”
  • “RFPL”
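
Because the league values above are case-sensitive strings, it can help to check them before calling an understat_ function. The following is a minimal sketch only: check_understat_league() is a hypothetical helper written for illustration and is not part of worldfootballR.

```r
# Supported league values, copied from the list above.
understat_leagues <- c("EPL", "La liga", "Bundesliga", "Serie A", "Ligue 1", "RFPL")

# Hypothetical helper (not part of worldfootballR): fail early on an
# unsupported value instead of waiting for a scrape to fail.
check_understat_league <- function(league) {
  if (!league %in% understat_leagues) {
    stop("Unsupported league '", league, "'. Use one of: ",
         paste(understat_leagues, collapse = ", "), call. = FALSE)
  }
  league
}

check_understat_league("La liga")
```

Exact matching is used here rather than match.arg(), since match.arg() allows partial matches.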

Attribute the Source

When using the functions in the package, please ensure you attribute the source of the data based on the function you use.

Data providers are listed below:

  • FBref
  • Transfermarkt
  • Understat

Acknowledgements

Special mention goes out to Ewan Henderson’s awesome understatr library for the inspiration and internal code for the understat_ functions contained in this package.


Contributing

Issues and Improvements

When creating an issue, please include:

  • Reproducible examples
  • A brief description of what the expected results are
  • If applicable, the fbref.com, transfermarkt.com or understat.com page the observed behaviour is occurring on
  • For improvement suggestions, what features are being requested and their purpose

Feel free to get in touch via email or Twitter (https://twitter.com/jaseziv) if you aren’t able to create an issue.

Show your support

Follow me on Twitter (jaseziv) for updates

If this package helps you, all I ask is that you star this repo. If you did want to show your support and contribute to server time and data storage costs, feel free to send a small donation through the link below.

Coffee (Server Time)

worldfootballr's People

Contributors

drasbaek, francescozonaro, jaseziv, nepito, rvdmaazen, shufinskiy, szfh, tanho63, tonyelhabr

worldfootballr's Issues

get_season_team_stats() not working for PL, LaLiga and Bundesliga 2020/21 anymore

get_season_team_stats(c("ENG", "ESP", "ITA", "GER", "FRA"), "M", c(2018:2021), "1st", "standard") used to work for all categories. Now it says:

Scraping season standard stats
NOTE: Stat Type 'standard' is not found for this league season. Check https://fbref.com/en/comps/9/Premier-League-Stats to see if it exists.
NOTE: Stat Type 'standard' is not found for this league season. Check https://fbref.com/en/comps/12/La-Liga-Stats to see if it exists.
NOTE: Stat Type 'standard' is not found for this league season. Check https://fbref.com/en/comps/20/Bundesliga-Stats to see if it exists.

The result includes all four seasons for Ligue 1 and Serie A, but only 2018-2020 for PL, LaLiga and BL.

fb_player_season_stats not returning results for some players

This error appears to occur mainly for players who have only participated in one league or cup type:

This works:
fb_player_season_stats("https://fbref.com/en/players/3bb7b8b4/Ederson", stat_type = "standard")

While this doesn't:
fb_player_season_stats("https://fbref.com/en/players/1d0a0f3e/Stefan-Frei", stat_type = "standard")

get_player_market_values failing for seasons before 2020/21

Since the last update on get_player_market_values it is failing for seasons before 2020/21.
Error message: Column "joined_from" not found in ".data"
get_player_market_values(country_name = "England", start_year = 2020) works,
get_player_market_values(country_name = "England", start_year = 2019) doesn't. Also tested for some other countries.
Tested with R version 4.0.5.

`get_match_results()` before 2014-15 season

The following returns the message "Data not available for the season(s) selected":

res <- worldfootballR::get_match_results(country = "ENG", gender = "M", season_end_year = 2013)

I believe the same error also occurs for the other Big 5 leagues for Men's 1st Tier and for seasons before 2014-15. I believe the issue is due to empty "Time" columns on the fixtures page.

I think this can be resolved fairly easily with the following change here

dplyr::filter(is.na(.data$Time) | .data$Time != "Time")

I'll leave it up to you to decide if you think this is the right change. If so, let me know if you'd like me to submit a pull request.
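
To illustrate why the is.na() check matters, here is a minimal sketch using base R's subset() in place of dplyr::filter(); both keep only rows where the condition is TRUE, so a comparison that evaluates to NA drops the row. The fixtures table is made up for illustration and only mimics the situation described above (missing kick-off times plus a repeated header row).

```r
# Made-up fixtures table: older seasons have no kick-off time (NA), and
# the scraped table repeats its header row part-way through.
fixtures <- data.frame(
  Wk   = c("1", "Wk", "2", "3"),
  Time = c(NA, "Time", "15:00", NA),
  stringsAsFactors = FALSE
)

# NA != "Time" evaluates to NA, so rows without a time are dropped as well.
naive <- subset(fixtures, Time != "Time")

# Keeping NA rows explicitly removes only the repeated header row.
fixed <- subset(fixtures, is.na(Time) | Time != "Time")

print(nrow(naive))
print(nrow(fixed))
```

With every Time value missing, as appears to be the case for the older seasons, the naive filter would return no rows at all.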

duplicate urls

league_urls <- fb_league_urls("ENG","M",2020) %>% print
  
match_urls <- get_match_urls("ENG","M",2020) %>% print

These functions return each expected URL twice instead of once, maybe because of duplicated lines in the data in the raw-data folder.
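
Until the underlying data is fixed, one possible workaround is to drop the repeats on the caller's side. This is a sketch only; the URLs below are placeholders standing in for the output of get_match_urls().

```r
# Placeholder vector standing in for the output of get_match_urls().
match_urls <- c(
  "https://example.com/m1", "https://example.com/m2",
  "https://example.com/m1", "https://example.com/m2"
)

# Count the repeated entries, then keep the first occurrence of each URL.
n_duplicates <- sum(duplicated(match_urls))
match_urls <- unique(match_urls)

print(n_duplicates)
print(match_urls)
```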

get_season_team_stats advanced stats error

get_season_team_stats("ENG","M","2020","standard")

Error in gsub("\\+", "plus", names(stat_df)) :
object 'stat_df' not found

Changes in fbref back end cause get_season_team_stats to fail for all stat types except league_table and league_table_home_away.

get_match_summary --- Arguments imply differing number of rows

I've been able to retrieve EPL match summaries up to 2013/2014 perfectly, however when I was trying to retrieve 2012/2013, I ran into this:

> match_urls <- get_match_urls(country = "ENG", gender = "M", season_end_year = 2013)
[1] "Scraping match URLs"
[1] "Match URLs scrape completed"
> match_summaries <- get_match_summary(match_url = match_urls)
[==============================================================>-------------------------------------------------] 56%
Error in data.frame(Team = Home_Team, Home_Away = "Home", events_string = events_home) :
arguments imply differing number of rows: 1, 0

Similarly for 2011/2012:

> match_urls <- get_match_urls(country = "ENG", gender = "M", season_end_year = 2012)
[1] "Scraping match URLs"
[1] "Match URLs scrape completed"
> match_summaries <- get_match_summary(match_url = match_urls)
[==========================================================================================>---------------------] 81%
Error in data.frame(Team = Home_Team, Home_Away = "Home", events_string = events_home) :
arguments imply differing number of rows: 1, 0

I took a brief look at the EPL fixtures for 2011/2012, around where the 81% mark would roughly be, and didn't find any matches where FBref is missing match summary data or presents it differently. Any explanation of why this is happening would be appreciated.

event_half column for get_match_summary()

Events in first half stoppage time can cause unusual results, for example:

url <- "https://fbref.com/en/matches/9f0275a4/North-West-Derby-Manchester-United-Liverpool-May-13-2021-Premier-League"

match_goals <-
  url %>%
  get_match_summary() %>%
  filter(event_type=="Goal") %>%
  select(event_time:score_progression)

This match has one goal scored in the 48th minute of the first half (stoppage time) and another in the 47th minute of the second half, which can cause errors.

A simple solution is something like below (but probably won't deal with extra-time matches well).

mutate(event_half=ifelse(as.numeric(str_sub(time,1,2)) <= 45,1,2))
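
A slightly more general version of the same idea, covering extra time, might look like the sketch below. It assumes the raw minute is available as a string such as "45+3" for stoppage time; that format and the helper name derive_event_half() are assumptions made for illustration, not something the package is known to expose.

```r
# Hypothetical helper: derive the period from a raw minute string such as
# "12", "45+3" or "90+2". The "+" stoppage-time format is an assumption.
derive_event_half <- function(minute_string) {
  # The base minute is everything before an optional "+" (stoppage time).
  base_minute <- as.numeric(sub("\\+.*$", "", minute_string))
  # Periods: 1 = first half, 2 = second half, 3 and 4 = extra-time halves.
  # left.open = TRUE makes the intervals (0, 45], (45, 90], (90, 105], (105, Inf).
  findInterval(base_minute, c(0, 45, 90, 105), left.open = TRUE)
}

print(derive_event_half(c("45+3", "47", "90+2", "105+1")))
```

Stoppage-time events stay in the half they belong to because only the base minute is compared against the period boundaries.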

get_match_urls() gets stuck on COVID postponed games

For example:

get_match_urls(country = "FRA", gender = "M", season_end_year = 2020, tier="1st") %>% get_match_lineups()

gets to 74% and then returns

Error in `[.data.frame`(lineup, , 1) : undefined columns selected

I've tried troubleshooting by looping through each url and it seems like it's for games that got postponed due to COVID - they have an assigned URL but no lineups listed. I stuck a tryCatch in the manual loop to get by these but that's probably a bad blanket solution.
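
The tryCatch workaround described above could be sketched roughly as follows. scrape_safely() and fake_scraper() are hypothetical names used only for illustration; in practice the scraper argument would be a real function such as get_match_lineups(), and the failed URLs are recorded rather than silently discarded.

```r
# Apply a scraper to each URL, returning NULL (with a warning) on failure.
scrape_safely <- function(urls, scrape_fun) {
  results <- lapply(urls, function(u) {
    tryCatch(
      scrape_fun(u),
      error = function(e) {
        warning("Skipping ", u, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- urls
  # Separate the successful results from the URLs that failed.
  failed <- vapply(results, is.null, logical(1))
  list(data = results[!failed], failed = urls[failed])
}

# Stand-in for a real scraper; it fails on a "postponed" match on purpose.
fake_scraper <- function(u) {
  if (grepl("postponed", u)) stop("undefined columns selected")
  data.frame(url = u, players = 22)
}

out <- suppressWarnings(
  scrape_safely(c("match-a", "match-postponed", "match-b"), fake_scraper)
)
print(out$failed)
```

Keeping the list of failed URLs makes it possible to check afterwards whether they really were postponed matches or something else went wrong.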

Ability to load and choose position based scouting report, for players with multiple positions

Hi, firstly let me say this is a very useful and fun package to use, so thank you @JaseZiv.

An issue I've found is that attempting to scrape data on players with multiple positions currently results in an error.
It would seem the package isn't sure which position report should be used, although I'm just guessing here.
This unfortunately means that scraping data for players with multiple positions on FBref currently does not appear to be possible.
Additionally, functionality for choosing which position-based scouting report to scrape would open up some interesting data analysis possibilities.

Anyway, I hope you keep up this project, I'm sure it helps a good number of statistical minded footy fans like me.

add remaining contract duration to transfer history

Hi Jason, first of all, thank you very much for your package. It's really useful and I have a lot of fun using it and experimenting with all the data. I am currently trying to use machine learning to predict transfer fees. It would therefore be very useful to include the remaining contract duration of players who are transferred to another club. transfermarkt.com provides this information for most transfers, but you have to look into the transfer details for each individual transfer. Do you think it would be possible to extract this information?
As an example of why it would be interesting to look at contract durations:
https://www.transfermarkt.com/leroy-sane/transfers/spieler/192565/transfer_id/1556750
https://www.transfermarkt.com/leroy-sane/transfers/spieler/192565/transfer_id/2954050

Actual transfer date (or at least window) in transfermarkt data

Would be useful for some analysis I'm doing. It's pretty annoying, though, because it doesn't look like transfermarkt actually displays that info on any of the transfer pages, just on the individual player pages. If anyone has suggestions for how to do this, I'm happy to work on the implementation myself.

player_transfer_history failing for players who have been or will be without club

player_transfer_history is failing for players who have been without a club, or will be in the future, because 'country_to' can't be found in these cases.

player_transfer_history("https://www.transfermarkt.com/sergio-aguero/profil/spieler/26399")
player_transfer_history("https://www.transfermarkt.com/christian-gentner/profil/spieler/19112")

Encoding issues on Windows for player_dictionary_mapping()

Player names get returned like "Ömer Toprak". Specifying UTF-8 encoding explicitly within read.csv() in the function didn't work for me. Changing to readr::read_csv() with no additional arguments seems to solve the problem, like this:

function() {
  players_mapped <- readr::read_csv("https://github.com/JaseZiv/worldfootballR_data/raw/master/raw-data/fbref-tm-player-mapping/output/fbref_to_tm_mapping.csv")
  return(players_mapped)
}

tm_league_team_urls() giving an error

When I run

team_urls <- tm_league_team_urls(country_name = "England", start_year = 2020)

I get the following error:

Error in nrow(meta_df_seasons) : object 'meta_df_seasons' not found

get_match_shooting error when only one table available

It's possible for only one team to have shots in a match, which causes an error with get_match_shooting.

library(worldfootballR)
# library(tidyverse)

urltest <- "https://fbref.com/en/matches/bf52349b/Fulham-Arsenal-September-12-2020-Premier-League"
url1 <- "https://fbref.com/en/matches/f35f4268/Huddersfield-Town-Swansea-City-March-10-2018-Premier-League"
url2 <- "https://fbref.com/en/matches/5e35e444/Bournemouth-Manchester-City-March-2-2019-Premier-League"

# this one works
shotstest <- get_match_shooting(urltest)

# these ones don't
shots1 <- get_match_shooting(url1)
shots2 <- get_match_shooting(url2)

Error obtaining lineups pre 2015

I've been getting "Error in `[.data.frame`(lineup, , 1) : undefined columns selected" when I've tried to get PL lineups from around the 2015 season and before. As far as I can tell the data does still exist in fbref for these seasons.

> match_urls <- get_match_urls(country = "ENG", gender = "M", season_end_year = 2003)
[1] "Scraping match URLs"
[1] "Match URLs scrape completed"
> match_lineups <- get_match_lineups(match_url = match_urls)
[1] "Scraping lineups"
Error in `[.data.frame`(lineup, , 1) : undefined columns selected

Build dictionary mappings

Create a function (or set of functions) to map league, team and player names between fbref.com and transfermarkt.com data

`tm_player_bio()` throws error

Hi Jason,

great package! Would have saved me a lot of time and work if I had learned about it earlier! :)

I just went through the vignette and tried a few of the examples. It seems to me that there is a (dplyr related) bug in tm_player_bio():

Running

hazard_bio <- tm_player_bio(player_url = "https://www.transfermarkt.com/eden-hazard/profil/spieler/50202")

gives me

Error: Problem with `filter()` input `..1`.
Input `..1` is `!stringr::str_detect(.data$X1, "Social-Media")`.
Column `X1` not found in `.data`
Run `rlang::last_error()` to see where the error occurred.

Any ideas?

Best,
Martin
