benjamincrom / baseball

Library to download, analyze, and visualize events in Major League Baseball games

Home Page: http://livebaseballscorecards.com

License: MIT License

Python 20.07% HTML 79.73% Shell 0.20%
baseball baseball-statistics baseball-analysis-packages baseball-stats

baseball's Introduction

Baseball

This package fetches and parses event data for Major League Baseball games. Game objects generated via the _from_url methods pull data from MLB endpoints, where events are published within about 30 seconds of occurring. The XML/JSON source data zip file contains event data for MLB games from 1974 through 2020.

Installing from PyPI

pip3 install baseball

Installing from source

git clone git@github.com:benjamincrom/baseball.git
cd baseball/
python3 setup.py install

Fetch individual MLB game

  • get_game_from_url(date_str, away_code, home_code, game_number)

Fetch an object which contains metadata and events for a single MLB game.

import baseball
game_id, game = baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)
game_dict = game._asdict()
game_json_str = game.json()

Write scorecard as SVG image:

with open(game_id + '.svg', 'w') as fh:
    fh.write(game.get_svg_str())

2017-11-01-HOU-LAD-1.svg
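The write step can be wrapped in a small helper. The following is a minimal sketch, not part of the package: `write_svg` is a hypothetical name, and the game id and SVG string are placeholders standing in for the values returned by `get_game_from_url()` and `game.get_svg_str()`.

```python
import tempfile
from pathlib import Path


def write_svg(game_id, svg_str, out_dir='.'):
    """Write an SVG string to <out_dir>/<game_id>.svg and return the path."""
    out_path = Path(out_dir) / (game_id + '.svg')
    # SVG is XML text, so write it with an explicit encoding.
    out_path.write_text(svg_str, encoding='utf-8')
    return out_path


# Placeholder values (assumptions), used here so the sketch runs offline.
demo_game_id = '2017-11-01-HOU-LAD-1'
demo_svg_str = '<svg xmlns="http://www.w3.org/2000/svg"></svg>'

with tempfile.TemporaryDirectory() as tmp_dir:
    written_path = write_svg(demo_game_id, demo_svg_str, tmp_dir)
    written_name = written_path.name
    round_trip = written_path.read_text(encoding='utf-8')
```

With a real game, the call would look like `write_svg(game_id, game.get_svg_str())`.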

Game Class Structure

Game

  • away_batter_box_score_dict
  • away_pitcher_box_score_dict
  • away_team (Team)
  • away_team_stats
  • start_datetime
  • expected_start_datetime
  • game_date_str
  • home_batter_box_score_dict
  • home_pitcher_box_score_dict
  • home_team (Team)
  • home_team_stats
  • inning_list (Inning list)
  • end_datetime
  • location
  • attendance
  • weather
  • temp
  • timezone_str
  • is_postponed
  • is_suspended
  • is_doubleheader
  • is_today
  • get_svg_str()
  • json()
  • _asdict()

Team

  • abbreviation
  • batting_order_list_list (list of nine PlayerAppearance lists)
  • name
  • pitcher_list (PlayerAppearance list)
  • player_id_dict
  • player_last_name_dict
  • player_name_dict
  • _asdict()

Inning

  • bottom_half_appearance_list (PlateAppearance list)
  • bottom_half_inning_stats
  • top_half_appearance_list (PlateAppearance list)
  • top_half_inning_stats
  • _asdict()

PlateAppearance

  • start_datetime
  • end_datetime
  • batter (Player)
  • batting_team (Team)
  • error_str
  • event_list (list of Pitch, Pickoff, RunnerAdvance, Substitution, Switch objects)
  • got_on_base
  • hit_location
  • inning_outs
  • out_runners_list (Player list)
  • pitcher (Player)
  • plate_appearance_description
  • plate_appearance_summary
  • runners_batted_in_list (Player list)
  • scorecard_summary
  • scoring_runners_list (Player list)
  • _asdict()

Player

  • era
  • first_name
  • last_name
  • mlb_id
  • number
  • obp
  • slg
  • _asdict()

PlayerAppearance

  • start_inning_batter_num
  • start_inning_half
  • start_inning_num
  • end_inning_batter_num
  • end_inning_half
  • end_inning_num
  • pitcher_credit_code
  • player_obj (Player)
  • position
  • _asdict()

Pitch

  • pitch_datetime
  • pitch_description
  • pitch_position
  • pitch_speed
  • pitch_type
  • _asdict()

Pickoff

  • pickoff_description
  • pickoff_base
  • pickoff_was_successful
  • _asdict()

RunnerAdvance

  • runner_advance_datetime
  • run_description
  • runner (Player)
  • start_base
  • end_base
  • runner_scored
  • run_earned
  • is_rbi
  • _asdict()

Substitution

  • substitution_datetime
  • incoming_player (Player)
  • outgoing_player (Player)
  • batting_order
  • position
  • _asdict()

Switch

  • switch_datetime
  • player (Player)
  • old_position_num
  • new_position_num
  • new_batting_order
  • _asdict()
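
The nesting above (Game, then Inning, then PlateAppearance, then events) can be walked with plain loops. The following is a minimal sketch that uses hypothetical namedtuple stand-ins instead of the real baseball classes, so it runs without the package or network access. Only the attribute names (inning_list, top_half_appearance_list, bottom_half_appearance_list, event_list, pitcher) come from the lists above; everything else is an assumption for illustration.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins that mirror only the attribute names listed above;
# they are not the real baseball classes.
Pitch = namedtuple('Pitch', 'pitch_description pitch_speed pitch_type')
RunnerAdvance = namedtuple('RunnerAdvance', 'run_description')
PlateAppearance = namedtuple('PlateAppearance', 'pitcher event_list')
Inning = namedtuple('Inning',
                    'top_half_appearance_list bottom_half_appearance_list')
Game = namedtuple('Game', 'inning_list')


def count_pitches_by_pitcher(game):
    """Count Pitch events per pitcher across both halves of every inning."""
    counts = Counter()
    for inning in game.inning_list:
        # Treat a missing (falsy) half as an empty list.
        halves = ((inning.top_half_appearance_list or []) +
                  (inning.bottom_half_appearance_list or []))
        for appearance in halves:
            for event in appearance.event_list:
                # event_list mixes event types; keep only pitches.
                if isinstance(event, Pitch):
                    counts[str(appearance.pitcher)] += 1
    return counts


# Made-up demo data, not taken from a real game.
demo_game = Game(inning_list=[
    Inning(
        top_half_appearance_list=[
            PlateAppearance('Pitcher A', [Pitch('Ball', 96.0, 'FF'),
                                          Pitch('Called Strike', 83.9, 'FC'),
                                          RunnerAdvance('Single')]),
        ],
        bottom_half_appearance_list=None,
    ),
    Inning(
        top_half_appearance_list=[
            PlateAppearance('Pitcher A', [Pitch('Ball', 84.6, 'SL')]),
        ],
        bottom_half_appearance_list=[
            PlateAppearance('Pitcher B', [Pitch('Foul', 90.1, 'FF')]),
        ],
    ),
])

pitch_counts = count_pitches_by_pitcher(demo_game)
```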

Analyze a game: 2017 World Series - Game 7

import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

import baseball

%matplotlib inline

game_id, game = baseball.get_game_from_url('11-1-2017', 'HOU', 'LAD', 1)

pitch_tuple_list = []
for inning in game.inning_list:
    for appearance in inning.top_half_appearance_list:
        for event in appearance.event_list:
            if isinstance(event, baseball.Pitch):
                pitch_tuple_list.append(
                    (str(appearance.pitcher), 
                     event.pitch_description,
                     event.pitch_position,
                     event.pitch_speed,
                     event.pitch_type)
                )

data = pd.DataFrame(data=pitch_tuple_list, columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'])
data.head()
         Pitcher  Pitch Description  Pitch Coordinate  Pitch Speed  Pitch Type
0  21 Yu Darvish               Ball  (155.47, 160.83)         96.0          FF
1  21 Yu Darvish      Called Strike   (107.0, 171.09)         83.9          FC
2  21 Yu Darvish    In play, no out   (115.36, 183.1)         83.9          SL
3  21 Yu Darvish    In play, run(s)   (80.06, 168.03)         96.6          FF
4  21 Yu Darvish               Ball    (54.1, 216.52)         84.6          SL
data['Pitcher'].value_counts().plot.bar()

for pitcher in data['Pitcher'].unique():
    plt.ylim(0, 125)
    plt.xlim(0, 250)
    bx = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if 'Ball' in x[1]]
    by = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if 'Ball' in x[1]]
    cx = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if 'Called Strike' in x[1]]
    cy = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if 'Called Strike' in x[1]]
    ox = [250 - x[2][0] for x in pitch_tuple_list if x[0] == pitcher if ('Ball' not in x[1] and 'Called Strike' not in x[1])]
    oy = [250 - x[2][1] for x in pitch_tuple_list if x[0] == pitcher if ('Ball' not in x[1] and 'Called Strike' not in x[1])]
    b = plt.scatter(bx, by, c='b')
    c = plt.scatter(cx, cy, c='r')
    o = plt.scatter(ox, oy, c='g')

    plt.legend((b, c, o),
               ('Ball', 'Called Strike', 'Other'),
               scatterpoints=1,
               loc='upper right',
               ncol=1,
               fontsize=8)

    plt.title(pitcher)
    plt.show()
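
The six list comprehensions in the loop above all apply the same three-way split. The following is a minimal sketch of that split as two helpers; `classify_pitch` and `group_points` are hypothetical names, and skipping pitches with a missing position is an assumption rather than something the original loop does. Because the test is a substring match, any description containing 'Ball' lands in the 'Ball' group, as in the original code.

```python
def classify_pitch(description):
    """Return 'Ball', 'Called Strike', or 'Other' for a pitch description."""
    if 'Ball' in description:
        return 'Ball'
    if 'Called Strike' in description:
        return 'Called Strike'
    return 'Other'


def group_points(pitch_tuples, pitcher):
    """Group flipped (x, y) coordinates for one pitcher by pitch category."""
    groups = {'Ball': [], 'Called Strike': [], 'Other': []}
    for name, description, position, _speed, _type in pitch_tuples:
        # Assumption: ignore rows without a recorded position.
        if name != pitcher or position is None:
            continue
        x, y = position
        # Same 250 - coordinate flip as the plotting loop above.
        groups[classify_pitch(description)].append((250 - x, 250 - y))
    return groups


# The first three rows echo the data.head() output; the last is made up.
sample = [
    ('21 Yu Darvish', 'Ball', (155.47, 160.83), 96.0, 'FF'),
    ('21 Yu Darvish', 'Called Strike', (107.0, 171.09), 83.9, 'FC'),
    ('21 Yu Darvish', 'In play, no out', (115.36, 183.1), 83.9, 'SL'),
    ('Other Pitcher', 'Ball', (54.1, 216.52), 84.6, 'SL'),
]
grouped = group_points(sample, '21 Yu Darvish')
```

Each group can then be passed to plt.scatter by unzipping it, for example `plt.scatter(*zip(*grouped['Ball']), c='b')` when the group is non-empty.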

plt.axis('equal')
data['Pitch Description'].value_counts().plot(kind='pie', radius=1.5, autopct='%1.0f%%', pctdistance=1.1, labeldistance=1.2)

data.plot.kde()

fig, ax = plt.subplots()
ax.set_xlim(50, 120)
for pitcher in data['Pitcher'].unique():
    s = data[data['Pitcher'] == pitcher]['Pitch Speed']
    s.plot.kde(ax=ax, label=pitcher)

ax.legend()

fig, ax = plt.subplots()
ax.set_xlim(50, 120)
for desc in data['Pitch Type'].unique():
    s = data[data['Pitch Type'] == desc]['Pitch Speed']
    s.plot.kde(ax=ax, label=desc)

ax.legend()

fig, ax = plt.subplots(figsize=(15,7))
data.groupby(['Pitcher', 'Pitch Description']).size().unstack().plot.bar(ax=ax)

Analyze a player's season: R.A. Dickey - 2017

game_list_2017 = baseball.get_game_list_from_file_range('1-1-2017', '12-31-2017', '/Users/benjamincrom/repos/livebaseballscorecards-artifacts/baseball_files')

pitch_tuple_list_2 = []
for game_id, game in game_list_2017:
    if game.home_team.name == 'Atlanta Braves' or game.away_team.name == 'Atlanta Braves':
        for inning in game.inning_list:
            for appearance in (inning.top_half_appearance_list +
                               (inning.bottom_half_appearance_list or [])):
                if 'Dickey' in str(appearance.pitcher):
                    for event in appearance.event_list:
                        if isinstance(event, baseball.Pitch):
                            pitch_tuple_list_2.append(
                                (str(appearance.pitcher), 
                                 event.pitch_description,
                                 event.pitch_position,
                                 event.pitch_speed,
                                 event.pitch_type)
                            )

df = pd.DataFrame(data=pitch_tuple_list_2, columns=['Pitcher', 'Pitch Description', 'Pitch Coordinate', 'Pitch Speed', 'Pitch Type'])
df['Pitch Type'].value_counts().plot.bar()

plt.axis('equal')
df['Pitch Description'].value_counts().plot(kind='pie', radius=2, autopct='%1.0f%%', pctdistance=1.1, labeldistance=1.2)
plt.ylabel('')
plt.show()

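df.dropna(inplace=True)
# Create a fresh figure; otherwise ax still refers to the previous plot
fig, ax = plt.subplots()
ax.set_xlim(50, 100)
df.plot.kde(ax=ax)
ax.legend()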

fig, ax = plt.subplots()
ax.set_xlim(50, 100)
for desc in df['Pitch Type'].unique():
    if desc != 'PO':
        s = df[df['Pitch Type'] == desc]['Pitch Speed']
        s.plot.kde(ax=ax, label=desc)

ax.legend()
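
The same per-type grouping can be checked numerically without plotting. The following is a minimal sketch using only the standard library; `speeds_by_type` is a hypothetical helper and the speeds are made-up values, not real pitch data. Like the loop above, it leaves out the 'PO' type, and it drops rows with no recorded speed, which mirrors the earlier dropna() call.

```python
from collections import defaultdict
from statistics import mean


def speeds_by_type(rows, skip_types=('PO',)):
    """Group pitch speeds by pitch type, ignoring missing speeds."""
    grouped = defaultdict(list)
    for pitch_type, speed in rows:
        if speed is None or pitch_type in skip_types:
            continue
        grouped[pitch_type].append(speed)
    return dict(grouped)


# Made-up (pitch_type, pitch_speed) pairs for illustration only.
rows = [('KN', 76.0), ('KN', 78.0), ('FF', 83.0), ('PO', None), ('KN', None)]

grouped_speeds = speeds_by_type(rows)
mean_speeds = {pitch_type: mean(speeds)
               for pitch_type, speeds in grouped_speeds.items()}
```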

Analyze a lineup of pitchers: Atlanta Braves - 2017 Regular Season

import datetime
import dateutil.parser
import pytz
pitch_tuple_list_3 = []
for game_id, game in game_list_2017:
    if game.home_team.name == 'Atlanta Braves' and dateutil.parser.parse(game.game_date_str) > datetime.datetime(2017, 3, 31):
        for inning in game.inning_list:
            for appearance in inning.top_half_appearance_list:
                pitch_tuple_list_3.append(
                    (str(appearance.pitcher),
                     str(appearance.batter),
                     len(appearance.out_runners_list),
                     len(appearance.scoring_runners_list),
                     len(appearance.runners_batted_in_list),
                     appearance.scorecard_summary,
                     appearance.got_on_base,
                     appearance.plate_appearance_summary,
                     appearance.plate_appearance_description,
                     appearance.error_str,
                     appearance.inning_outs)
                )
    if game.away_team.name == 'Atlanta Braves' and dateutil.parser.parse(game.game_date_str) > datetime.datetime(2017, 3, 31):
        for inning in game.inning_list:
            if inning.bottom_half_appearance_list:
                for appearance in inning.bottom_half_appearance_list:
                    pitch_tuple_list_3.append(
                        (str(appearance.pitcher),
                         str(appearance.batter),
                         len(appearance.out_runners_list),
                         len(appearance.scoring_runners_list),
                         len(appearance.runners_batted_in_list),
                         appearance.scorecard_summary,
                         appearance.got_on_base,
                         appearance.plate_appearance_summary,
                         appearance.plate_appearance_description,
                         appearance.error_str,
                         appearance.inning_outs)
                    )

df3 = pd.DataFrame(data=pitch_tuple_list_3, columns=['Pitcher',
                                                     'Batter',
                                                     'Out Runners',
                                                     'Scoring Runners',
                                                     'RBIs',
                                                     'Scorecard',
                                                     'On-base?',
                                                     'Plate Summary',
                                                     'Plate Description',
                                                     'Error',
                                                     'Inning Outs'])
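
The home and away branches above differ only in which half-inning they read. The following is a minimal sketch of a helper that picks the right half, using hypothetical namedtuple stand-ins and placeholder strings in place of real PlateAppearance objects; `appearances_pitched_by` is not part of the package.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring only the attribute names used above.
Team = namedtuple('Team', 'name')
Inning = namedtuple('Inning',
                    'top_half_appearance_list bottom_half_appearance_list')
Game = namedtuple('Game', 'home_team away_team inning_list')


def appearances_pitched_by(game, team_name):
    """Yield the plate appearances in which team_name was pitching."""
    if game.home_team.name == team_name:
        attr = 'top_half_appearance_list'     # home team pitches the top half
    elif game.away_team.name == team_name:
        attr = 'bottom_half_appearance_list'  # away team pitches the bottom
    else:
        return  # team did not play in this game
    for inning in game.inning_list:
        # Treat a missing (falsy) half as an empty list.
        for appearance in (getattr(inning, attr) or []):
            yield appearance


# Made-up demo game; strings stand in for PlateAppearance objects.
demo_game = Game(
    home_team=Team('Atlanta Braves'),
    away_team=Team('Visiting Club'),
    inning_list=[
        Inning(['top-1a', 'top-1b'], ['bottom-1a']),
        Inning(['top-2a'], None),
    ],
)
home_pitched = list(appearances_pitched_by(demo_game, 'Atlanta Braves'))
away_pitched = list(appearances_pitched_by(demo_game, 'Visiting Club'))
neither = list(appearances_pitched_by(demo_game, 'Some Other Team'))
```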

for pitcher in df3['Pitcher'].unique():
    summary = df3[df3['Pitcher'] == pitcher]['Plate Summary']
    s = summary.value_counts(sort=False)
    if len(summary) > 400:
        fig, ax = plt.subplots()
        ax.set_ylim(0, 250)
        s.plot.bar()
        plt.title(pitcher)
        plt.show()

x = []
for pitcher in df3['Pitcher'].unique():
    s = df3[df3['Pitcher'] == pitcher]['On-base?'].value_counts()
    if len(s) == 2:
        # value_counts() sorts by frequency, so look the counts up by label
        # (assumes got_on_base is a bool)
        f = s.loc[False]
        t = s.loc[True]
        x.append((str(pitcher), f, t))

df4 = pd.DataFrame(data=x, columns=['Pitcher',
                                    'Did not get on base',
                                    'Got on base'])

df4.index = df4['Pitcher']
df4.sort_values(by=['Got on base']).nlargest(10, 'Did not get on base').plot.bar()
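
The on-base tally can also be built without pandas. Because value_counts() orders its result by frequency, keying the tallies by label keeps the two columns from swapping for pitchers who allow more baserunners than outs. The following is a minimal sketch with made-up rows; `on_base_counts` is a hypothetical helper and assumes got_on_base is truthy or falsy.

```python
from collections import Counter, defaultdict


def on_base_counts(rows):
    """Map pitcher -> (did_not_get_on_base, got_on_base) counts."""
    tallies = defaultdict(Counter)
    for pitcher, got_on_base in rows:
        tallies[pitcher][bool(got_on_base)] += 1
    # A Counter returns 0 for a missing label, so no length check is needed.
    return {pitcher: (counts[False], counts[True])
            for pitcher, counts in tallies.items()}


# Made-up (pitcher, got_on_base) pairs for illustration only.
rows = [
    ('Pitcher A', False), ('Pitcher A', False), ('Pitcher A', True),
    ('Pitcher B', True), ('Pitcher B', True), ('Pitcher B', False),
]
counts_by_pitcher = on_base_counts(rows)
```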

baseball's People

Contributors

benjamincrom, slegge

baseball's Issues

Error trying to write scorecard to SVG

In the first repo example, the following code works fine and I can print the JSON string:

import baseball
game_id, game = baseball.get_game_from_url('2017-11-1', 'HOU', 'LAD', 1)
game_dict = game._asdict()
game_json_str = game.json()

However, when I run the next block of code to write the scorecard to an SVG file, I get the following error:

Traceback (most recent call last):
  File "example.py", line 8, in <module>
    fh.write(game.get_svg_str())
  File "~/baseball/baseball/example.py", line 427, in get_svg_str
    return get_game_svg_str(self)
  File "~/baseball/baseball/generate_svg.py", line 2072, in get_game_svg_str
    assemble_box_content_dict(game),
  File "~/baseball/baseball/generate_svg.py", line 1982, in assemble_box_content_dict
    svg_content_list = get_svg_content_list(game)
  File "~/baseball/baseball/generate_svg.py", line 1290, in get_svg_content_list
    get_runners_svg(plate_appearance),
  File "~/baseball/baseball/generate_svg.py", line 902, in get_runners_svg
    is_forceout_desc = ('Forceout' in event.run_description or
AttributeError: 'Pitch' object has no attribute 'run_description' 
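
The traceback shows run_description being read from a Pitch object. According to the class lists in the README, event_list mixes several event types and only RunnerAdvance carries run_description, so a type check before the attribute access avoids this error. The following is a minimal sketch of such a guard using hypothetical stand-in classes; `has_forceout` is an illustrative name, and this is not the library's actual code or fix.

```python
from collections import namedtuple

# Hypothetical stand-ins for two of the event types in event_list.
Pitch = namedtuple('Pitch', 'pitch_description')
RunnerAdvance = namedtuple('RunnerAdvance', 'run_description')


def has_forceout(event_list):
    """Return True if any RunnerAdvance event describes a forceout."""
    for event in event_list:
        # Skip pitches, pickoffs, and other events without run_description.
        if not isinstance(event, RunnerAdvance):
            continue
        if 'Forceout' in event.run_description:
            return True
    return False


mixed_events = [Pitch('Ball'), RunnerAdvance('Forceout at 2nd')]
pitches_only = [Pitch('Ball'), Pitch('Called Strike')]

found_in_mixed = has_forceout(mixed_events)
found_in_pitches = has_forceout(pitches_only)
```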
