sportsdataverse / sportsdataverse-py

sportsdataverse python package

Home Page: https://py.sportsdataverse.org

License: MIT License

Makefile 0.08% Python 99.69% Batchfile 0.09% Shell 0.14%
nhl nhl-api nba nba-stats nfl nflfastr college-football cfb-data wnba womens-basketball

sportsdataverse-py's Introduction

sportsdataverse-py

Lifecycle: experimental

See CHANGELOG.md for details.

The goal of sportsdataverse-py is to provide the community with a Python package for working with sports data as a companion to the cfbfastR, hoopR, and wehoop R packages. Beyond easing data aggregation and tidying, sportsdataverse-py also supports benchmarking open-source expected points and win probability metrics for American football.

Installation

sportsdataverse-py can be installed via pip:

pip install sportsdataverse

# with full dependencies
pip install sportsdataverse[all]

or from the repo (which may at times be more up to date):

git clone https://github.com/sportsdataverse/sportsdataverse-py
cd sportsdataverse-py
pip install -e .[all]

Our Authors

Citations

To cite the sportsdataverse-py Python package in publications, use:

BibTex Citation

@misc{gilani_sdvpy_2021,
  author = {Gilani, Saiem},
  title = {sportsdataverse-py: The SportsDataverse's Python Package for Sports Data.},
  url = {https://py.sportsdataverse.org},
  year = {2021}
}

sportsdataverse-py's People

Contributors

akeaswaran, armstjc, kazink36, saiemgilani

sportsdataverse-py's Issues

Assists recorded in boxscores missing in play-by-play

For the December 22, 2021, game between Wyoming and Stanford (game Id: 401372551), the sportsdataverse.mbb.load_mbb_player_boxscore(seasons=[2022]) data for this game records Hunter Maldonado (player Id: 4280267) of Wyoming as getting 8 assists. However, when going through the rows for this game from sportsdataverse.mbb.load_mbb_pbp(seasons=[2022]), Maldonado would only be credited with 6 assists, based upon the number of occurrences of "Assisted by Hunter Maldonado" in the text column, as well as the number of times his Id appears in the participants_1_athlete_id column. In particular, he should have been credited with assists on jumpers from Graham Ike with 17:16 remaining in the 1st half and 1:15 remaining in the 2nd half. Overall, Maldonado only gets credited with 190 assists in the pbp file, but he should have 197 assists for the season so far (following Wyoming's win over UNLV in the MWC conference tournament, their last game in the dataset).
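The comparison described above can be sketched in plain Python. The rows below are invented stand-ins for play-by-play records (only the column names `text` and `participants_1_athlete_id`, the player name, and the player ID come from the report), and the box score total is hypothetical, so this only illustrates the counting method, not the real data.

```python
# Hypothetical rows standing in for play-by-play records; the column names
# follow the issue, the row contents are invented for illustration.
PLAYER_ID = 4280267

pbp_rows = [
    {"text": "Player A made Jumper. Assisted by Hunter Maldonado.",
     "participants_1_athlete_id": PLAYER_ID},
    # A made basket whose assist was not recorded in the play-by-play.
    {"text": "Graham Ike made Jumper.",
     "participants_1_athlete_id": None},
    {"text": "Player B made Layup. Assisted by Hunter Maldonado.",
     "participants_1_athlete_id": PLAYER_ID},
]


def assists_by_text(rows, player_name):
    """Count rows whose text credits player_name with the assist."""
    needle = f"Assisted by {player_name}"
    return sum(needle in (row.get("text") or "") for row in rows)


def assists_by_id(rows, athlete_id):
    """Count rows where the second participant is the given athlete."""
    return sum(row.get("participants_1_athlete_id") == athlete_id for row in rows)


boxscore_assists = 3  # hypothetical box score total for the same rows
text_count = assists_by_text(pbp_rows, "Hunter Maldonado")
id_count = assists_by_id(pbp_rows, PLAYER_ID)
print(f"text: {text_count}, id: {id_count}, "
      f"missing vs. box score: {boxscore_assists - text_count}")
```

Counting both ways is useful because agreement between the two play-by-play counts points at assists missing from the feed rather than at a parsing problem.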

Add cfbd Drives data

cfbd has drive-level data through cfbd.DrivesApi that would be a nice addition

Men's BBall Realtime Data

Does data from the mbb_loaders.py file get pulled in realtime (such as load_mbb_player_boxscore and load_mbb_team_boxscore functions)? I see it's pulling from CSVs, but I don't see any data from the 2024 season yet. Is there another function I should use?

add cfbd coaches data

CFBD has coach data through cfbd.CoachesApi, which would make a nice addition to this package

error with load_nfl_depth_charts

When running the following code:

sdv.nfl.load_nfl_depth_charts(list(range(2001, 2024)))

gives the following error when trying to load 2020 data:

ShapeError: unable to vstack, dtypes for column "season" don't match: f64 and i32
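A dtype mismatch like this one can be narrowed down by comparing per-season schemas before stacking. The sketch below is a hypothetical helper that works on plain dictionaries mapping column names to dtype names; how those dictionaries are obtained from the loaded frames is left as an assumption, and the `club_code` column is invented for the example.

```python
def mismatched_dtypes(schemas):
    """Find columns whose dtype differs between seasons.

    schemas: {season: {column_name: dtype_name}}
    Returns {column_name: {dtype_name: [seasons]}} for columns that
    appear with more than one dtype.
    """
    seen = {}
    for season, schema in sorted(schemas.items()):
        for column, dtype in schema.items():
            seen.setdefault(column, {}).setdefault(dtype, []).append(season)
    return {col: by_dtype for col, by_dtype in seen.items() if len(by_dtype) > 1}


# Hypothetical schemas; only the "season" f64/i32 clash mirrors the error above.
schemas = {
    2019: {"season": "i32", "club_code": "str"},
    2020: {"season": "f64", "club_code": "str"},
}
print(mismatched_dtypes(schemas))
```

Once the offending columns are known, casting them to a common dtype in each season's frame before concatenating is one possible workaround.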

Add a progress bar to load_cfb_pbp

This function takes quite a while to load data, long enough that the user can't tell whether it is still working or has frozen. A progress bar would show that it is still running and give a rough estimate of how much longer the wait will be.
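As a rough illustration of the request, the sketch below wraps a per-season loader and reports progress on standard error. Both `load_with_progress` and the `load_one` callable are hypothetical; this is not how the package loads data internally, only one way a caller could get feedback by loading one season at a time.

```python
import sys


def load_with_progress(seasons, load_one, stream=sys.stderr):
    """Call load_one(season) for each season and report progress on stream."""
    seasons = list(seasons)
    results = []
    for done, season in enumerate(seasons, start=1):
        results.append(load_one(season))
        # "\r" rewrites the same terminal line instead of scrolling.
        stream.write(f"\rLoaded {done}/{len(seasons)} seasons (latest: {season})")
        stream.flush()
    stream.write("\n")
    return results


def fake_load(season):
    """Stand-in loader for the demo; a real caller would pass its own function."""
    return {"season": season}


frames = load_with_progress(range(2019, 2022), fake_load)
```

The per-season results still need to be combined by the caller afterwards.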

issues with load_nfl_schedule

Multiple issues when trying to run:

sdv.nfl.load_nfl_schedule(list(range(1999, 2024)))

From 1999 to 2000 we have the first error:

ShapeError: unable to vstack, dtypes for column "gametime" don't match: f64 and str

Between 2020 and 2021 we have:

ShapeError: unable to vstack, dtypes for column "away_score" don't match: i32 and i64

2023 and beyond give us:

HTTPError: HTTP Error 404: Not Found
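Season boundaries like the ones listed above can be located by loading consecutive pairs of seasons and recording which pairs fail. The sketch below assumes a `load_many` callable that takes a list of seasons (for example, a loader such as the one in this issue); the helper name and the fake loader used in the demo are hypothetical.

```python
def failing_season_pairs(seasons, load_many):
    """Try each pair of consecutive seasons and collect the failures.

    Returns a list of (first, second, message) tuples.
    """
    seasons = list(seasons)
    failures = []
    for first, second in zip(seasons, seasons[1:]):
        try:
            load_many([first, second])
        except Exception as err:  # broad on purpose: this only gathers diagnostics
            failures.append((first, second, f"{type(err).__name__}: {err}"))
    return failures


def fake_load_many(pair):
    """Stand-in loader that fails for one invented season boundary."""
    if pair == [1999, 2000]:
        raise ValueError("dtypes for column 'gametime' don't match")
    return pair


for first, second, message in failing_season_pairs(range(1999, 2003), fake_load_many):
    print(f"{first}-{second}: {message}")
```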

404 error in games

sportsdataverse.cfb.load_cfb_schedule gives a 404 when trying to get games from 2022, should there be a check and then this just returns nothing?
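One shape the suggested check could take is sketched below: load each season separately, skip seasons that return a 404, and report which ones were skipped. It assumes the loader surfaces `urllib.error.HTTPError`, which the message format of the errors reported for this package suggests but which is not confirmed here; `load_available_seasons` and `load_one` are hypothetical names.

```python
from urllib.error import HTTPError


def load_available_seasons(seasons, load_one):
    """Load each season, skipping those whose file is missing (HTTP 404).

    Returns (loaded_results, missing_seasons). Other HTTP errors are re-raised.
    """
    loaded, missing = [], []
    for season in seasons:
        try:
            loaded.append(load_one(season))
        except HTTPError as err:
            if err.code != 404:
                raise
            missing.append(season)
    return loaded, missing
```

Returning the skipped seasons alongside the data keeps the gap visible to the caller instead of silently returning nothing.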

error with load_nfl_injuries

When running:

sdv.nfl.load_nfl_injuries(list(range(2009, 2024)))

the following error is received trying to include the 2011 data

ShapeError: unable to vstack, dtypes for column "season" don't match: f64 and i32

issues with load_nfl_pbp

Errors when running:

sdv.nfl.load_nfl_pbp(list(range(1999, 2024)))

between 2005 and 2006 we get:

ShapeError: unable to vstack, dtypes for column "xyac_median_yardage" don't match: f64 and i32

Cannot pull pbp data for Washington Football Team post 2018

Since the Washington Football Team has no mascot post 2018, pulling its play-by-play data produces the following KeyError:

KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17088/3575107685.py in
----> 1 nfl = sportsdataverse.nfl.NFLPlayProcess("401326425").espn_nfl_pbp()

~\Anaconda3\lib\site-packages\sportsdataverse\nfl\nfl_pbp.py in espn_nfl_pbp(self)
112 )
113 awayTeamMascot = str(
--> 114 pbp_txt["header"]["competitions"][0]["competitors"][1]["team"]["name"]
115 )
116 homeTeamName = str(

KeyError: 'name'
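The traceback shows a direct `["name"]` lookup on the team dictionary. A defensive lookup could look like the sketch below; the helper name and the fallback keys (`shortDisplayName`, `displayName`, `location`) are assumptions for illustration, not confirmed fields of the payload or the package's actual fix.

```python
def team_mascot(team, fallback_keys=("shortDisplayName", "displayName", "location")):
    """Return team["name"] when present, else the first available fallback field.

    Returns an empty string when none of the keys exist, instead of raising
    KeyError.
    """
    for key in ("name", *fallback_keys):
        value = team.get(key)
        if value:
            return str(value)
    return ""


# Hypothetical team dictionaries: one with a mascot, one without.
print(team_mascot({"name": "Cowboys", "displayName": "Dallas Cowboys"}))
print(team_mascot({"displayName": "Washington"}))
```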

Missing cfb teams

There are teams who appear in cfb games who aren't in teams, i.e. no matching team id.

111 Northeastern
2275 Hofstra
2676 Virginia Union
1000899 Edward Waters

The affected games are between 2002 and 2009, and the teams exist in the cfbd teams list.

error with load_cfb_schedule

Running the code:

cfb_schedule = sdv.cfb.load_cfb_schedule(range(2003, 2024))

gives dtype errors such as

ShapeError: unable to vstack, dtypes for column "home_post_win_prob" don't match: bool and str for 2002-03, 2003-04, 2009-10, 2010-11, etc

MLB Related Functions Not Working & Possible Change in MLB API

Using an endpoint constructed from this function:

searchURL = "http://lookup-service-prod.mlb.com/json/named.org_game_type_date_info.bam?current_sw='Y'&sport_code='mlb'&"

https://lookup-service-prod.mlb.com/json/named.org_game_type_date_info.bam?current_sw=%27Y%27&sport_code=%27mlb%27&game_type=%27R%27&season=%272022%27

It currently returns the following:

{"org_game_type_date_info":{"copyRight":" NOTICE: This file is no longer actively supported. Please use the MLB Stats API (http://statsapi.mlb.com/docs/) as an alternative. Copyright 2023 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt ","queryResults":{"totalSize":"0","created":"2023-07-11T20:12:30"}}}
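For reference, the encoded query URL shown above can be rebuilt with the standard library, which makes it easier to see which parameters were sent. The function name is hypothetical and the parameter set is taken only from the URL in this report. Since the response says the endpoint is no longer supported, this only reproduces the request and is not a working data source.

```python
from urllib.parse import urlencode

BASE_URL = "https://lookup-service-prod.mlb.com/json/named.org_game_type_date_info.bam"


def schedule_lookup_url(season, game_type="R"):
    """Build the lookup URL; values are wrapped in single quotes as in the report."""
    params = {
        "current_sw": "'Y'",
        "sport_code": "'mlb'",
        "game_type": f"'{game_type}'",
        "season": f"'{season}'",
    }
    # urlencode percent-encodes each single quote as %27.
    return f"{BASE_URL}?{urlencode(params)}"


print(schedule_lookup_url(2022))
```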

Separately, I did have some import errors on the MLB module, but I think these are superseded by the issue above:

import sportsdataverse as sdv

sdv.mlb.mlbam_games.mlbam_schedule(2022, gameType='R')

>> AttributeError: module 'sportsdataverse' has no attribute 'mlb'

ncaa march madness tournament scores not pulling

First of all, this is an amazing package, and thank you for all of your hard work. It seems that the NCAA tournament games are missing from the schedule, play by play, and team boxscores.

At some point, I can try my hand at a pull request, but I thought I'd open an issue first.

Cannot pull 2021 data from load_cfb_pbp

Trying to access pbp data for the 2021 season leads to an HTTPError. Is there any way to make sure the data is up-to-date? I would love to be able to utilize this package during the season as well. Thank you!

404 Error in teams

When calling cfb_teams_frame = sportsdataverse.cfb.load_cfb_team_info(seasons=range(2002, 2022)) I get a 404. It works with any other years. Same error comes if I set "seasons=[2021]".

error with load_nfl_weekly_rosters

sdv.nfl.load_nfl_weekly_rosters(list(range(2002, 2024))) gives an error when it gets to 2015:

ShapeError: unable to vstack, dtypes for column "jersey_number" don't match: str and i32

A look at the values of jersey_number in 2015 doesn't show any weird values.

Error for load_cfb_rosters

Running:

sdv.cfb.load_cfb_rosters(range(2014, 2024))

get the error

ShapeError: unable to vstack, dtypes for column "season" don't match: i32 and f64

with 2023 data

Non-matching names in CFB player data

When you pull cfb_rosters, you can't link those players back to their teams in some cases because the school ID isn't used, just the name, and the names sometimes don't match any of the variations in the teams data, even if you join both the cfbd and espn names.

The non-matching teams are:

{'Louisiana Monroe', 'St Francis (PA)', 'Sam Houston State', 'Southeastern Louisiana', 'Connecticut', 'UT San Antonio', 'Prairie View', 'Southern Mississippi', 'Presbyterian College'}
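A set of non-matching names like the one above can be produced with a simple set difference. The sketch below is a hypothetical helper working on plain lists of names; the example team names are invented inputs, not actual values from the cfbd or espn data.

```python
def unmatched_names(roster_names, *team_name_lists):
    """Return roster team names that appear in none of the team name lists.

    Comparison ignores case and surrounding whitespace.
    """
    known = {
        name.strip().casefold()
        for names in team_name_lists
        for name in names
    }
    return sorted({n for n in roster_names if n.strip().casefold() not in known})


# Invented example inputs for illustration only.
roster_teams = ["Connecticut", "Ohio State", "ohio state "]
cfbd_names = ["UConn", "Ohio State"]
espn_names = ["UConn Huskies", "Ohio State Buckeyes"]
print(unmatched_names(roster_teams, cfbd_names, espn_names))
```

Matching on a shared team ID would avoid the problem entirely, which is what the issue is asking for. Name normalization only reduces the number of misses.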

error with load_cfb_pbp

when running:

sdv.cfb.load_cfb_pbp(list(range(2003, 2024)))

get error:

ShapeError: unable to append to a dataframe of width 366 with a dataframe of width 364

ShapeError: unable to vstack, column names don't match: "start.team.id" and "start.downDistanceText"

One of these errors occurs (with minor variations in the DataFrame widths or column names) for every pair of consecutive seasons through at least 2017, with the exception of 2011-2012.

2022 CFB PBP data source?

The load_cfb_pbp function in this package and cfbfastR pull from different sources.
cfbfastR pulls PBP data from https://raw.githubusercontent.com/sportsdataverse/cfbfastR-data/main/data/rds/pbp_players_pos_",seasons,".rds". This package pulls PBP data from https://raw.githubusercontent.com/sportsdataverse/cfbfastR-data/main/pbp/parquet/play_by_play_{season}.parquet. There is a parquet file in sportsdataverse/cfbfastR-data/main/data/parquet, but it seems like the data pipeline might have stopped during the 2022 season. Is the PBP folder the expected data source? If so, how does it get updated?

Trying to explore this package and understand what it can and can't do versus cfbfastR.

CFB games not pulling bowls?

I don't think sportsdataverse.cfb.load_cfb_schedule is properly pulling bowl games. When I try to match game_id in plays to the parent games I get missing ids

Ex.
400852668 2015 New Mexico Bowl
400876038 2016 New Mexico Bowl

Bring in all games from cfbd

sportsdataverse.cfb.load_cfb_schedule only brings in games as far back as 2002, but cfbd actually has games back to ~1860, I don't see a reason not to bring them all in, even if there's no attendant pxp?

Bring in all teams from cfbd list

CFBD has ~1700 teams covering every game in their games db, currently sportsdataverse.cfb.load_cfb_team_info only brings in a couple hundred. I don't see any reason not to bring them all in?

load_nfl_rosters pre- and post-2002 columns not matching

Getting an error from:

sdv.nfl.load_nfl_rosters(list(range(1999, 2005)))

Leading to the error:

ShapeError: unable to append to a dataframe of width 30 with a dataframe of width 36

The columns that don't exist in the earlier seasons are:

ngs_position week game_type status_description_abbr football_name draft_number
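A width mismatch of this kind can be diagnosed by comparing the column lists of each season. The sketch below is a hypothetical helper over plain lists; the six extra column names come from the issue, while `season` and `team` are placeholders for the columns the seasons share.

```python
def missing_columns(columns_by_season):
    """Report, per season, the columns present in other seasons but absent here.

    columns_by_season: {season: iterable of column names}
    Returns {season: sorted list of missing columns}, omitting complete seasons.
    """
    all_columns = set().union(*columns_by_season.values())
    report = {}
    for season, columns in columns_by_season.items():
        missing = all_columns - set(columns)
        if missing:
            report[season] = sorted(missing)
    return report


shared = ["season", "team"]  # placeholder names for the common columns
extra = ["ngs_position", "week", "game_type",
         "status_description_abbr", "football_name", "draft_number"]
print(missing_columns({2001: shared, 2002: shared + extra}))
```

With the missing columns identified, one possible workaround is to add them as null columns to the narrower frames before appending.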
