baseball.computer's People

Contributors

droher

Forkers

sushiinyourface

baseball.computer's Issues

Create separate table/subquery to properly handle pickoffs

Nuances to handle:

  • Pickoffs should be credited to one and only one of the pitcher/catcher (as opposed to SB/CS), which requires correlating assist/putout data on the play
  • Rare/erroneous/missing details: pitcher unassisted, no fielding credit, or a fielder other than the pitcher or catcher getting the first assist (likely erroneous, or the P/C is implied in the pitch sequence). These can be found with rg -e "PO\d\([3-8]"
  • Notify Retrosheet of box score generation error: https://www.baseball-reference.com/boxes/SEA/SEA201107170.shtml ("Pickoffs: Blake Beavan (0; Adrian Beltre, 2nd base)")
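As a rough illustration of the first two bullets, here is a minimal Python sketch that reuses the rg pattern above and assigns credit from the first fielder listed on the play. The helper name pickoff_credit and the PICKOFF pattern are hypothetical, and the sketch assumes the simple PO<base>(<fielders>) form only; error notations and POCS plays are not handled.

```python
import re
from typing import Optional

# Same pattern as the rg search above: a pickoff whose first listed
# fielder plays position 3 through 8 (neither pitcher nor catcher).
SUSPECT_PICKOFF = re.compile(r"PO\d\([3-8]")

# Hypothetical helper pattern: base digit, then the fielder digits in parentheses.
PICKOFF = re.compile(r"PO(\d)\((\d+)\)")


def pickoff_credit(event: str) -> Optional[str]:
    """Return who gets pickoff credit based on the first fielder listed.

    Returns None when the event has no simple pickoff, and "unclear"
    when the first fielder is neither the pitcher (1) nor the catcher (2).
    """
    match = PICKOFF.search(event)
    if match is None:
        return None
    first_fielder = match.group(2)[0]
    return {"1": "pitcher", "2": "catcher"}.get(first_fielder, "unclear")


# Example event strings (made up for illustration).
examples = ["PO1(13)", "PO2(24)", "PO2(46)", "S7"]
credits = [pickoff_credit(e) for e in examples]
suspects = [e for e in examples if SUSPECT_PICKOFF.search(e)]
```

The "unclear" bucket is where the rare or erroneous cases would land for manual review.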

Pitching IP rate stats divided by (outs_recorded * 3) instead of (outs_recorded / 3)

I haven't been able to find the root cause of the problem, but I've stumbled across some columns of incorrect data. So far, the most notable example of this I can find is ERA in the metrics_player_career_pitching table. I've provided a quick example below with Terry Adams, but from what I can tell, many (if not all) pitchers have incorrect stats for multiple columns.

Querying some basic data in metrics_player_career_pitching for Terry, this is what gets returned:
[image: query result for Terry Adams from metrics_player_career_pitching]

Checking against Baseball Reference, Terry's career ERA was 4.17, not 0.64. His strikeouts per 9 innings are also wrong (the actual value is 7.2), as are his RA (4.64) and his WHIP (1.46). All the counting stats seem to be accurate as far as I can tell, but many of the rate stats (or, more generally, numbers stored as doubles) are off.
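To make the arithmetic in the issue title concrete, here is a small Python sketch of how multiplying outs by 3, rather than dividing, distorts an innings-based rate stat. The function names and the sample line (3 earned runs over 27 outs) are made up for illustration and are not taken from the project's models or from Terry Adams's record.

```python
def innings_pitched(outs_recorded: int) -> float:
    # Three outs make one inning.
    return outs_recorded / 3


def era(earned_runs: int, outs_recorded: int) -> float:
    # ERA = 9 * earned runs / innings pitched.
    return 9 * earned_runs / innings_pitched(outs_recorded)


def era_with_bug(earned_runs: int, outs_recorded: int) -> float:
    # Reproduces the mistake named in the issue title:
    # the denominator uses outs_recorded * 3 instead of outs_recorded / 3.
    return 9 * earned_runs / (outs_recorded * 3)


# Made-up pitching line: 3 earned runs over 27 outs (9 innings).
correct = era(3, 27)
buggy = era_with_bug(3, 27)
ratio = correct / buggy
```

Under this assumption the buggy denominator is 9 times too large, so every stat divided by innings pitched comes out 9 times too small, while counting stats are unaffected.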

Running list of tables/calcs

  • fielder game
  • game results
  • batter season (all seasons should have totals and advanced averages)
  • pitcher season
  • fielder season
  • daily standings
  • fielding components:
    • heuristic batted ball type inference
    • fractional fielder hit/out responsibility
    • park factors
    • outfield arms
    • catcher arms (sb, pickoff, pb/wp)
    • fielding runs algorithm
  • WAR Calculations
  • count table (lower prio)
  • LOB

Carry enums over from Rust once duckdb has ALTER TYPE for custom types

  enum_lookups:
    side: game.game_team
    doubleheader_status: game.game
    time_of_day: game.game
    game_type: game.game
    sky: game.game
    field_condition: game.game
    precipitation: game.game
    wind_direction: game.game
    entered_game_as: game.game_lineup_appearance
    frame: event.event
    baserunner: event.event_starting_base_state
    baserunning_play_type: event.event_baserunning_play
    contact: event.event_batted_ball_info
    general_location: event.event_batted_ball_info
    depth: event.event_batted_ball_info
    angle: event.event_batted_ball_info
    strength: event.event_batted_ball_info
    fielding_play: event.event_fielding_play
    flag: event.event_flag
    sequence_item: event.event_pitch
  enum_vals:
    base: [First, Second, Third, Home]

{% macro add_enums() %}
  {% set sql %}
    {# Create one enum type per enum_lookups entry from the distinct values in its table #}
    {% for enum, table in var("enum_lookups").items() %}
      CREATE TYPE {{ enum }} AS ENUM (SELECT DISTINCT {{ enum }} FROM {{ table }});
    {%- endfor %}
    {# Retype every source column that declares a data_type #}
    {% for node in graph.sources.values() %}
      {% for col_name, col_data in node.columns.items() if col_data.get("data_type") %}
        ALTER TABLE {{ node.schema }}.{{ node.name }} ALTER COLUMN {{ col_name }} TYPE {{ col_data.data_type }};
      {%- endfor %}
    {% endfor %}
  {% endset %}
  {% do log(sql, info=True) %}
  {% do run_query(sql) %}
{% endmacro %}
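
For readers unfamiliar with Jinja, the first loop of the macro can be approximated in plain Python. This is only a sketch of the SQL text the loop would render for two of the enum_lookups entries above; the function name create_type_statements is hypothetical, and nothing here runs against DuckDB or dbt.

```python
# Two entries copied from the enum_lookups mapping above.
enum_lookups = {
    "side": "game.game_team",
    "frame": "event.event",
}


def create_type_statements(lookups: dict) -> list:
    # One CREATE TYPE statement per enum, populated from the
    # distinct values of the same-named column in its table.
    return [
        f"CREATE TYPE {enum} AS ENUM (SELECT DISTINCT {enum} FROM {table});"
        for enum, table in lookups.items()
    ]


statements = create_type_statements(enum_lookups)
for statement in statements:
    print(statement)
```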
