baseball.computer.rs's Introduction

baseball.computer.rs's Issues

Data completeness tables

Data points

  • Pitch sequence
    • Pitch total
    • Balls/strikes thrown
    • Balls/strikes count at event
    • Called vs swinging strikes
  • Fielding
    • Putouts
    • Assists
    • Errors
    • Fielding opportunities (using hit-to-fielder) (covered in hit location)
  • Hit location
    • GB/FB
    • Hit to fielder
    • Hit location spec
  • PBP Account
    • PA covered
  • Box score account
    • Basically all box score stats

Granularities (all split into batting, pitching, fielding)

  • Season
  • Game
  • Team Season
  • Player
  • Player Season
  • Team Game
  • Player Game
  • Event
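
As a rough illustration of what one such table could compute, the sketch below takes per-event presence flags for two of the data points above and rolls them up to the season granularity. The `EventCoverage` and `SeasonCompleteness` types and their field names are hypothetical and are not taken from the project.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-event record: which data points are present for one event.
pub struct EventCoverage {
    pub season: u16,
    pub has_pitch_sequence: bool,
    pub has_hit_location: bool,
}

/// Hypothetical season-level row of a completeness table.
#[derive(Debug, PartialEq)]
pub struct SeasonCompleteness {
    pub events: u32,
    pub pitch_sequence_rate: f64,
    pub hit_location_rate: f64,
}

/// Share of events per season for which each data point is present.
pub fn completeness_by_season(events: &[EventCoverage]) -> BTreeMap<u16, SeasonCompleteness> {
    // (total events, events with pitch sequence, events with hit location)
    let mut counts: BTreeMap<u16, (u32, u32, u32)> = BTreeMap::new();
    for e in events {
        let entry = counts.entry(e.season).or_insert((0, 0, 0));
        entry.0 += 1;
        entry.1 += e.has_pitch_sequence as u32;
        entry.2 += e.has_hit_location as u32;
    }
    counts
        .into_iter()
        .map(|(season, (n, pitch, hit))| {
            let row = SeasonCompleteness {
                events: n,
                pitch_sequence_rate: f64::from(pitch) / f64::from(n),
                hit_location_rate: f64::from(hit) / f64::from(n),
            };
            (season, row)
        })
        .collect()
}
```

The same roll-up could be keyed by game, team season, or player instead of season by swapping the grouping key.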

Create numeric keys for PKs and enums

DuckDB is running out of memory when performing operations on tables that have multiple rows per event (particularly fielding). A game-event-baserunner key might take 20 bytes as a string, but only 4 bytes are needed to represent it uniquely.

  • Assign numeric IDs to enums that make it to outputs
    • An alternative is to make them enums inside DuckDB, but that seems harder and not worth it
  • Add global 4-byte identifiers for games, events, baserunners
    • This one might actually make more sense to do in dbt as long as there's a consistent translation
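
A minimal sketch of both ideas, under assumptions: an enum with explicit numeric discriminants that could be written to outputs instead of a string, and an interner that hands out dense `u32` (4-byte) IDs for string keys in first-seen order. The `BaseRunner` enum, the `KeyInterner` type, and the example key format are all hypothetical and not taken from the project.

```rust
use std::collections::HashMap;

/// Hypothetical output enum; the explicit discriminants are the numeric IDs
/// that would be written in place of the string name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum BaseRunner {
    Batter = 0,
    First = 1,
    Second = 2,
    Third = 3,
}

impl BaseRunner {
    /// Numeric ID for this variant.
    pub fn id(self) -> u8 {
        self as u8
    }
}

/// Assigns dense 4-byte IDs to string keys, in first-seen order.
#[derive(Default)]
pub struct KeyInterner {
    ids: HashMap<String, u32>,
}

impl KeyInterner {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the existing ID for `key`, or assigns the next free one.
    pub fn intern(&mut self, key: &str) -> u32 {
        if let Some(&id) = self.ids.get(key) {
            return id;
        }
        // Guard the cast: a u32 cannot distinguish more than u32::MAX keys.
        assert!(self.ids.len() < u32::MAX as usize, "too many distinct keys");
        let id = self.ids.len() as u32;
        self.ids.insert(key.to_owned(), id);
        id
    }
}
```

IDs assigned this way depend on insertion order, so they are only stable if the input order is. That is one reason a deterministic translation in dbt, as suggested above, may be the better home for the global identifiers.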

Update hit location codes

Some undocumented hit location codes are not being parsed correctly: "L", "M", and "R", which add further specificity within a numerical range.

Might also want to build a mapping of the codes to polar coordinates (maybe worth saving for the inference step).
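
One way the parsing fix could look, as a sketch only: split a code into its leading digits and an optional trailing "L", "M", or "R". The grammar assumed here (digits followed by at most one such letter), the reading of the letters as left, middle, and right, and the `HitLocation` and `Side` types are assumptions for illustration; the real codes include other modifiers that this sketch deliberately rejects.

```rust
/// Assumed meaning of the undocumented modifier letters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Left,
    Middle,
    Right,
}

/// Hypothetical parsed form of a hit location code.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HitLocation {
    /// Leading digits of the code, kept as written (for example "78").
    pub fielders: String,
    /// Optional L/M/R modifier following the digits.
    pub side: Option<Side>,
}

/// Parses digits followed by an optional L, M, or R.
/// Returns `None` for anything outside that assumed grammar.
pub fn parse_hit_location(code: &str) -> Option<HitLocation> {
    // Index of the first non-digit character, or the end of the string.
    let digits_end = code
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(code.len());
    let (digits, rest) = code.split_at(digits_end);
    if digits.is_empty() {
        return None;
    }
    let side = match rest {
        "" => None,
        "L" => Some(Side::Left),
        "M" => Some(Side::Middle),
        "R" => Some(Side::Right),
        _ => return None,
    };
    Some(HitLocation {
        fielders: digits.to_owned(),
        side,
    })
}
```

Keeping the modifier as a separate field would leave room for a later lookup from (fielders, side) to polar coordinates without changing the parser.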
