burntsushi / nfldb Goto Github PK

A library to manage and update NFL data in a relational database.

License: The Unlicense

Makefile 0.54% Python 99.28% Shell 0.16% Vim Script 0.02%

nfldb's Introduction

THIS PROJECT IS UNMAINTAINED.

nfldb is a relational database bundled with a Python module to quickly and conveniently query and update the database with data from active games. Data is imported from nflgame, which in turn gets its data from a JSON feed on NFL.com's live GameCenter pages. This data includes, but is not limited to, game schedules, scores, rosters and play-by-play data for every preseason, regular season and postseason game dating back to 2009.

It can also be used with nflvid to search for and watch video of plays. Please see the nfldb wiki for more details on how to get that working.

Here is a small teaser that shows how to use nfldb to find the top five passers in the 2012 regular season:

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year=2012, season_type='Regular')
for pp in q.sort('passing_yds').limit(5).as_aggregate():
    print pp.player, pp.passing_yds

And the output is:

[andrew@Liger ~] python2 top-five.py
Drew Brees (NO, QB) 5177
Matthew Stafford (DET, QB) 4965
Tony Romo (DAL, QB) 4903
Tom Brady (NE, QB) 4799
Matt Ryan (ATL, QB) 4719

Documentation and getting help

nfldb has comprehensive API documentation. Tutorials, more examples and a description of the data model can be found on the nfldb wiki.

If you need any help or have found a bug, please open a new issue on nfldb's issue tracker or join us at our IRC channel #nflgame on FreeNode.

Installation and dependencies

nfldb depends on the following Python packages available in PyPI: nflgame, psycopg2, pytz and enum34. nfldb also needs PostgreSQL installed with an available empty database.

I've only tested nfldb with Python 2.7 on a Linux system. In theory, nfldb should be able to work on Windows and Mac systems as long as you can get PostgreSQL running. It is not Python 3 compatible (yet, mostly because of the nflgame dependency).

Please see the installation guide on the nfldb wiki for instructions on how to setup nfldb.

Entity-relationship diagram

Here's a condensed version that excludes play and player statistics:

There is also a full PDF version that includes every column in the database.

The nfldb wiki has a description of the data model.

The most recent version of the nfldb PostgreSQL database is available here: http://burntsushi.net/stuff/nfldb/nfldb.sql.zip.

nfldb's People

Contributors

Stargazers

Watchers

Forkers

teamfball daybreaker grlee webflint mercuryrising brianwu02 minkovich oooshiny rogerschlachter atesty leonardb alphamanimal ersherr andrewgjohnson-forks tven7 dmitry13 morefromalan micahstone20 rfellows jcapron slucey88 ragnardanneskjold atestu catufunwa jdm5056 ndelaneybusch twobitint tnclark8012 douglas-larocca billybebop57 mwlwm nkhuyu electromonkey ecastillomon potentiametrics maddockz ojwise polynesian surye akatie rainstar82 anvargo benjagaiden jonpiffle bparafina nivertech codeaudit mcd0well jxprime verdimrc joshadhd switchflip33 tkpanther joenophoto chrisrammy christewart ochawkeye patec alexisgo tulimaki jcpatte2 tysmith17 rmehta1987 maracuja sandy4321 mnsteinbe bhaktaroshan owenb132 dan-drews priestd09 jtdean87 nehz cafeconvega jtroybaker wbclark xuli1 asknudo mattbeth trickorita maxssage adsznzhang jimhagan fenkod johnsonjr modulexcite nateyh1769 hpadmanabhan lipho donhescht brutalsimplicity jkronk phillynch7 patrick-praher noonan woshilaozhang snake-plissken onebitbrain jasonlttl andr3w321 parix

nfldb's Issues

Errors while updating for week 12 (2013)

Here's the full output of nfldb-update:

STARTING NFLDB UPDATE AT 2013-11-26 00:09:01.645936
Connecting to nfldb... done.
Setting timezone to UTC... done.
Updating player JSON database... (last update was 2013-11-17 23:03:18.200011+00:00)
Loading games for REG 2013 week 12
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/site-packages/nflgame/update_players.py", line 480, in <module>
    run()
  File "/usr/local/lib/python2.7/site-packages/nflgame/update_players.py", line 367, in run
    players = dict(players_from_games(metas, games))
  File "/usr/local/lib/python2.7/site-packages/nflgame/update_players.py", line 245, in players_from_games
    for d in g.drives:
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 407, in __getattr__
    self.__drives = _json_drives(self, self.home, self.data['drives'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 675, in _json_drives
    d = Drive(game, i, home_team, data[str(drive_num)])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 520, in __init__
    self.__plays = _json_plays(self, data['plays'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 698, in _json_plays
    plays.append(Play(drive, playid, data[playid]))
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 598, in __init__
    self.events = _json_play_events(data['players'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 741, in _json_play_events
    statvals = nflgame.statmap.values(info['statId'], info['yards'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/statmap.py", line 54, in values
    yards = int(yards)
TypeError: int() argument must be a string or a number, not 'NoneType'
`/usr/local/opt/python/bin/python2.7 -m nflgame.update_players --no-block` failed (exit status 1)
Locking write access to tables... done.
Updating season phase, year and week... done.
Bulk inserting data for 17 games...
    Sending batch of data to database.
Traceback (most recent call last):
  File "/usr/local/bin/nfldb-update", line 413, in <module>
    doit()
  File "/usr/local/bin/nfldb-update", line 403, in doit
    update(db, player_interval=args.player_interval)
  File "/usr/local/bin/nfldb-update", line 339, in update
    bulk_insert_game_data(cursor, scheduled)
  File "/usr/local/bin/nfldb-update", line 162, in bulk_insert_game_data
    g = game_from_id(cursor, gsis_id)
  File "/usr/local/bin/nfldb-update", line 55, in game_from_id
    return nfldb.Game._from_nflgame(cursor.connection, g)
  File "/usr/local/lib/python2.7/site-packages/nfldb/types.py", line 2048, in _from_nflgame
    for drive in g.drives:
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 407, in __getattr__
    self.__drives = _json_drives(self, self.home, self.data['drives'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 675, in _json_drives
    d = Drive(game, i, home_team, data[str(drive_num)])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 520, in __init__
    self.__plays = _json_plays(self, data['plays'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 698, in _json_plays
    plays.append(Play(drive, playid, data[playid]))
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 598, in __init__
    self.events = _json_play_events(data['players'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/game.py", line 741, in _json_play_events
    statvals = nflgame.statmap.values(info['statId'], info['yards'])
  File "/usr/local/lib/python2.7/site-packages/nflgame/statmap.py", line 54, in values
    yards = int(yards)
TypeError: int() argument must be a string or a number, not 'NoneType'

I'm running version 0.0.14 (just did a pip install -U nfldb. I've updated in the past without any issue but it didn't work today. Am I doing something wrong?

Manually updating the DB

Hi,

First off, great work with this, I am amazed on how great it works :)

I have a a little question. What will happen when I use the auto update script if I have manually updated the DB. Will it overwrite the changes I do?

I am asking since I would like to somehow save data from pre 2009 in my DB, so looking at options to make this happen.

Tobias

Problem with city field in team table for New York teams

There is a problem in the team table for the Jets and the Giants. Their city field is set to Giants or Jets and the name field contains their full name:

team_id |     city      |      name
---------+---------------+-----------------
 ARI     | Arizona       | Cardinals
 ...
 NYG     | Giants        | New York Giants
 NYJ     | Jets          | New York Jets
 OAK     | Oakland       | Raiders

AssertionError: The key 'team' does not exist for entity 'Game'.

Hi, it's me again.

I wasn't scared off. I am trying to work my way through the wiki page.

Following the examples from "an introduction to the query interface" I type in this:

import nfldb
db = nfldb.connect()

q = nfldb.Query(db)
q.game(season_year=2013, season_type='Regular', team='NE', week=1)
for d in q.as_drives():
print d

And this error returns:

Traceback (most recent call last):
File "C:\Users\Rugh\Desktop\nfldb\test.py", line 5, in
q.game(season_year=2013, season_type='Regular', team='NE', week=1)
File "C:\Python27\lib\site-packages\nfldb\query.py", line 586, in game
_append_conds(self._default_cond, types.Game, kw)
File "C:\Python27\lib\site-packages\nfldb\query.py", line 202, in _append_conds
% (kbare, entity.name)
AssertionError: The key 'team' does not exist for entity 'Game'.

Each time I try to use 'team' in q.game I get this message.

What am I missing?

nfldb & nflgame data mismatch

I'm finding some instances where nfldb data does not match what I would expect based on last years nflgame data. For example:

from __future__ import print_function
y, w = 2013, 7

import nflgame
games = nflgame.games(year=y, week=w, kind='REG')
players = nflgame.combine_game_stats(games)
for x in players:
    if x.name == 'R.Gould' or x.name == 'G.Gano':
        print(x.name, getattr(x, 'kicking_xpmade'))

print('-'*79)

import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=y, season_type='Regular', week=w)
q.player(full_name='Graham Gano')
for pp in q.as_aggregate():
    print(pp.player.gsis_name, getattr(pp, 'kicking_xpmade'))

q = nfldb.Query(db)
q.game(season_year=y, season_type='Regular', week=w)
q.player(full_name='Robbie Gould')
for pp in q.as_aggregate():
    print(pp.player.gsis_name, getattr(pp, 'kicking_xpmade'))

Produces the following:

G.Gano 3
R.Gould 5
-------------------------------------------------------------------------------
G.Gano 2
R.Gould 4

Where are the missing extra points? Am I simply using incorrect criteria in nfldb queries?

Link to images

I'm not sure if this is data available in the database (or if there is even a 'profile' table). But I notice that all the players have a link to their webpage (profile_url)(http://www.nfl.com/player/playername/playerid/profile.
I wanted to pull in the image that's connected for all players (http://static.nfl.com/static/content/public/static/img/getty/headshot/K/A/E/KAE371576.jpg) (For Colin Kaepernick, for example), I'm curious if there is something that isn't currently brought in that would have that information. I was hoping to use this in my dashboard

psycopg2 closing postgres connection

pretty weird issue. i have no problem running nfldb-update on my local machine but when trying to run on new virtual machine(if it means anything, this is VPS) i get this:

here's the pip freeze:
argparse==1.2.1
beautifulsoup4==4.3.2
enum34==0.9.19
httplib2==0.8
nfldb==0.0.11
nflgame==1.1.21
psycopg2==2.5.1
pytz==2013b
wsgiref==0.1.2

and here's the error:

STARTING NFLDB UPDATE AT 2013-10-28 10:42:57.612357
Connecting to nfldb... done.
Setting timezone to UTC... done.
Locking write access to tables... done.
Updating season phase, year and week... done.
Bulk inserting data for 14 games...
Sending batch of data to database.
Traceback (most recent call last):
File "/home/ocean/fantasyfootball/.env/bin/nfldb-update", line 408, in
doit()
File "/home/ocean/fantasyfootball/.env/bin/nfldb-update", line 398, in doit
update(db, player_interval=args.player_interval)
File "/home/ocean/fantasyfootball/.env/bin/nfldb-update", line 344, in update
log('done.')
File "/home/ocean/fantasyfootball/.env/local/lib/python2.7/site-packages/nfldb/db.py", line 295, in exit
self.__conn.rollback()
psycopg2.InterfaceError: connection already closed

pg_hba.conf and config.ini are the same

Old question, new scheme.

""How do I programmatically resolve where a player will be before the game takes place? Additionally, a player’s injury status pre game is another hurdle.""

We discussed this topic at length five or six months ago. Any chance the DB will afford a solution?

penalty_yds is always positive (in nflgame)

I found a data issue that you may/may not be able to fix, depending upon the pull from the feed. The play.penalty_yds data is unsigned, therefore, it does not accurately reflect the yardage gained/lost. All data is a positive integer value.

For example, if Seattle was in possession and had an offensive holding penalty, one would expect the play.penalty_yds to contain -10, yet the actual data contained currently would be 10. If Denver was in possession and there was a defensive encroachment penalty, one would expect the play.penalty_yds to contain 5.

Without properly signed data, it forces a string match within play.description for 'PENALTY on {team}', which may return inaccurate results in the event there were multiple penalties on the play (For example, gsis_id=2009092011, drive_id=4, play_id=836).

The column drive.penalty_yards is accurately signed, but is a composite of all plays within that drive, which does not break down the penalty yards on each particular team, let alone individual plays.

Obviously, this issue is dependent upon what data can be extracted from the data feed, so a fix may not be possible... but, it would be really nice to have! Thanks in advance for taking a look at the issue!

Trying to install nfldb postgresql database on windows

I have been using the python version of nfldb for some time now. I am trying to install nfldb postgresql database on windows and have gotten to step 10 of the windows installation instructions. I had previously done the instructions under a new nfldb user but when i created that user in my command prompt all roles were set to No so I was not able to create databases or anything and was getting an access is denied message when running step 10 of the installation for windows. I went through the entire installation steps again but built the database under super user postgres. When I get to step 10 to import the nfldb.sql file into postgresql database I get the same access is denied message. I am under the superuser role so unsure why I would be getting this message again? Any help would be greatly appreciated.

Game data for 2013112800 is mostly missing

Most of the data for the DET-GB game is missing from the nfldb download zip file. I fixed my local copy by using the procedure described in issue #24

Database is not updating with the 2014 preseason

I installed the nfldb scripts a couple of months ago. Works great and thank you! It's been a real help for my FF analysis.

Recently I attempted to run them during the preseason games, and I'm still getting last years data. I attempted to manually update the meta table with the current preseason 2014 info and the scripts seem to overwrite that. I verified that the current ss.xml is returning current data, but the scripts seem to keep retrieving data from postseason 2013.

What should I do?

limit is broken

This code

import nfldb

db = nfldb.connect()

q = nfldb.Query(db)
q.game(season_year=2012, season_type='Regular')
q.player(full_name='Tom Brady')
q.play(passing_yds__ge=40)

for p in q.sort('passing_yds').limit(10).as_plays():
    print p

produces this output:

(NE, OWN 17, Q2, 1 and 10) (9:56) (Shotgun) T.Brady pass short left to S.Vereen for 83 yards, TOUCHDOWN.

But it should be:

(NE, OWN 17, Q2, 1 and 10) (9:56) (Shotgun) T.Brady pass short left to S.Vereen for 83 yards, TOUCHDOWN.
(NE, OWN 37, Q3, 3 and 10) (10:00) (Shotgun) T.Brady pass deep middle to D.Stallworth for 63 yards, TOUCHDOWN. NE 12-Brady 18th career game with 4+ TD passes, passing Johnny Unitas for 4th most all-time.
(NE, OWN 21, Q1, 1 and 10) (9:32) (No Huddle) T.Brady pass deep left to W.Welker to BAL 20 for 59 yards (B.Pollard).
(NE, OWN 44, Q2, 3 and 5) (3:17) (Shotgun) T.Brady pass deep left to J.Edelman for 56 yards, TOUCHDOWN.
(NE, OWN 18, Q4, 1 and 10) (9:18) (No Huddle, Shotgun) T.Brady pass deep right to B.Lloyd to SF 29 for 53 yards (C.Culliver).
(NE, OPP 46, Q1, 1 and 10) (5:59) (No Huddle, Shotgun) T.Brady pass deep left to W.Welker for 46 yards, TOUCHDOWN.
(NE, OWN 33, Q1, 2 and 5) (8:57) (Shotgun) T.Brady pass deep right to R.Gronkowski to BUF 26 for 41 yards (J.Rogers). Caught at BUF 32, slanting to sideline.
(NE, OPP 44, Q3, 2 and 1) (2:09) (No Huddle, Shotgun) T.Brady pass deep right to M.Hoomanawanui to SF 3 for 41 yards (D.Whitner).

How do I return a simple count of instances where the query is true?

For example how to return a count of the total number of 'plays' by a 'team' in a 'game' or 'drive'. In other words, count the number of items returned in the list.

I'm sure there is a simple solution but I have not been able to find one. Even though the nfldb objects are lists, the Python builtin functions I've tried like ['count'] have not worked.

Any help would be massively appreciated. Thank you.

Cannot cast drive.start_field and drive.end_field to integer or smallint

I'm looking to make some schema and data type tweaks to nfldb so that I can more easily serve nfldb data to Rails' Active Record models. I'm doing just fine with automatically casting user-defined data types to varchars, like player.position, player.status, game.day_of_week, and so on. But it seems that I'm not having much luck with the numeric user-defined data types for fields like drive.start_field, drive.end_field, and drive.pos_time.

Is there any way to do this? Here's what I've tried so far:

nfldb=> ALTER TABLE drive ALTER COLUMN start_field TYPE int;
ERROR:  column "start_field" cannot be cast automatically to type integer
HINT:  Specify a USING expression to perform the conversion.

So I tried specifying a USING expression, but here's the error:

nfldb=> ALTER TABLE drive ALTER COLUMN start_field TYPE int USING start_field::int;
ERROR:  cannot cast type field_pos to integer

It seems that I need to go one level deeper, not just setting the data type for the user-defined type field_pos, but also for the domain constraint field_offset.

The business reason for altering data types is that Rails Active Record Migrations can't recognize user-defined types in PostgreSQL, so won't know to interpret start_field and end_field as integers. I'm making a bunch of other table and column alters too, and have taken care to write changes to revert to the original schema so that nfldb-update still works.

Thanks in advance!

is it possible to connect without a password?

i setup nfldb with a postgres user that doesn't have a password but I get this when I try to connect:

psycopg2.OperationalError: fe_sendauth: no password supplied

is it possible to connect without a password?

Connection issue with nfldb

I’ve been around 18 months or so exclusively using nflgame, ignoring nfldb until yesterday. Using Windows XP, I successfully installed the DB using ochawkeys’s instructions. https://github.com/BurntSushi/nfldb/wiki/Windows-Install , imported the data, and installed nfldb using pip. However, ochawkeys’s final two steps 14 and 15 are failing due to a connection problem….I think?

C:\Python27\Scripts>python top-ten-qbs.py
Traceback (most recent call last):
  File "top-ten-qbs.py", line 3, in <module>
    db = nfldb.connect()
  File "c:\Python27\lib\site-packages\nfldb\db.py", line 104, in connect
    raise IOError("Could not find valid configuration file.")
IOError: Could not find valid configuration file.

C:\Python27\Scripts>python nfldb-update
STARTING NFLDB UPDATE AT 2014-07-19 07:26:46.316000
Connecting to nfldb... Traceback (most recent call last):
  File "nfldb-update", line 441, in <module>
    doit()
  File "nfldb-update", line 420, in doit
    db = nfldb.connect()
  File "c:\Python27\lib\site-packages\nfldb\db.py", line 104, in connect
    raise IOError("Could not find valid configuration file.")
IOError: Could not find valid configuration file.

Average Points Per Game

How would I be able to figure out the average points per game? I can't figure out how to add up all scores from a team's games for the current season. I tried to contact you on IRC for a couple days and still hadn't heard back.

a function to find score of game at a point in time

After filing issue #5, I think a somewhat appropriate response is to provide a function that takes a play in a game and returns the score of the game before the play given was snapped.

The score must be computed by looking at the scoring plays of all plays previous to the one given. This means that the score may be inaccurate in some circumstances. It could be inaccurate if the statistics are wrong/missing, or if the plays are out of their correct sequence.

Can't locate "config.ini" in order to Copy it

I cannot locate the "config.ini" file, which I am to copy in step number 23 of the Detailed Windows PostgreSQL installation. ( https://github.com/BurntSushi/nfldb/wiki/Detailed-Windows-PostgreSQL-installation )

When I look in C:\Python27\share I only see a \doc folder, there is no \nfldb directory. (This \doc folder contains \nflgame)

I'm a new user.
I've already installed Python27, pip, and then nflgame via pip.

I'm under the impression that I have not yet reached the stage where I install nfldb. Am I wrong?

I installed Postgres with no difficulties, and my command window output mimicked your screen-shots perfectly (with obvious exception of path strings).

I'm hoping (aka assuming) the nflgame install didn't create an unforeseen issue here, especially since it would appear that \doc would be a sibling of \nfldb, and I am under the impression that nfldb will install nflgame.

I'm perfectly willing to uninstall everything I've mentioned and start anew. What should I do?

Thanks in advance!!!

Beginner question

Hey,

I am something of a beginner at programming, but I think I managed to follow every step in terms of the installation of the postgres and SQL stuff. I just cant seem to get the nfldb to run in python, I keep getting this error: ImportError: No module named nfldb.

I managed to get the nflgame data to work, but I cant seem to get this done! Like I said, I'm not super experienced so I am very likely missing something very obvious. Anything that you think I might be missing, please let me know!

Thanks

Query Questions

Without dropping to the SQL layer, is there a way to obtain aggregate team statistics? For instance, I'd like to know the home team's rushing yards in a given game.

Additionally, I'd be curious to know the stadium a given game is played in. Ultimately, I'd like to derive a field which determines whether a given game is truly a home game for the home team.

Thanks in advance.

basic returns.

Thanks again for y'alls help with all of this the last couple of days. I am new to most of this and I appreciate the guidance.

I've read through the wiki page and attempted to cipher what I can from the API doc. Yet I am having trouble determining how to query the data I want.

I will be using this for fantasy football. Mostly I will use it after the weeks are completed but if I can pull it off I could find ways to use it in during games as well.

Regardless can someone detail out for me in layman's terms how to do query NFLDB to give me results like this:

Player: Tom Brady
Completions: 15
Yards: 160
TDs: 1
INT: 2

I don't care if I have query a specific player over and over again to get each stat. But how can I tell it "Hey, tell me how many TD's Tom Brady threw this week" or "How many times was he sacked" etc.

I feel like I am close to figuring this out, but I just can't break through. Maybe because it is 4:45am..

Could not find gsis_id message during nfldb-update

After nfldb-update, I got a bunch of these messages:

Could not find gsis_id for {'status': u'ACT', 'last_name': u'Thomas', 'weight':
325, 'profile_id': 2550532, 'number': 0, 'height': 75, 'college': u'Arkansas', '
full_name': u'Robert Thomas', 'first_name': u'Robert', 'years_pro': 0, 'birthdat
e': u'2/18/1991', 'team': 'WAS', 'position': u'NT', 'profile_url': u'http://www.
nfl.com/player/robertthomas/2550532/profile'}

Is this a real problem to worry about? I seem to remember reading that it is not.

NFLDB-Update not updating rosters

>>> C:\Python27\Scripts>python nfldb-update
-------------------------------------------------------------------------------
STARTING NFLDB UPDATE AT 2014-08-16 09:25:50.151000
Connecting to nfldb... done.
Setting timezone to UTC... done.
Updating player JSON database... (last update was 2014-08-13 00:07:18.078000+00:
00)
Loading games for PRE 2014 week 2
Downloading team rosters...
1/32 complete. (3.12%)Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\lib\site-packages\nflgame\update_players.py", line 479, in <
module>
    run()
  File "C:\Python27\lib\site-packages\nflgame\update_players.py", line 403, in r
un
    roster.append(meta_from_soup_row(team, row))
  File "C:\Python27\lib\site-packages\nflgame\update_players.py", line 161, in m
eta_from_soup_row
    profile_url = 'http://www.nfl.com%s' % tds[1].a['href']
IndexError: list index out of range
`C:\Python27\python.exe -m nflgame.update_players --no-block` failed (exit statu
s 1)
done.
Locking player table...
Updating 5946 players... done.
Locking write access to tables... done.
Updating season phase, year and week... done.
Bulk inserting data for 5 games...
        Sending batch of data to database.

For some reason, the rosters aren't updating (it says 1/32 complete) and there are some other errors. I'm not sure what is causing this.

nfldb-update error

I have nfldb installed and can run top-ten-qbs.py successfully. When I try to run the nfldb-update I get,

ImportError: No module named httplib2

Any help to fix this would be greatly appreciated

python 3 support

Python 3 support is intricately linked with Python 3 support in nflgame. Whatever decision is made for nflgame will carry to nfldb.

See BurntSushi/nflgame#78

VLC playback of nflvid-watch script generated playlists is blank

As shown in the screenshots on crunchbang, a debain flavor of linux, VLC doesn;t seem to like playing the video playlists generated with nflvid-watch.

All files can be viewed normally when selected via GUI and commandline in vlc.

Note, a potentially related issue occurs on Mac OS as well, but a 127 error is thrown.

All tests show that nfldb is working well, nflgame and the ability to download and split footage works as expected.

Great stuff!

Using nfldb / nflgame with Google AppEngine (et al)

After playing around with nfldb and nflgame in the Command Window, I tried to import nfldb into a HelloWorld.py project created in Google AppEngine for my local machine and was unable to use it because I couldn't get the app to find the third party psycopg2 module needed for Postgresql.

"No module named psycopg2._psycopg"

As I mentioned above, both modules work fine when I run in IDLE or Windows Command.

With AppEngine, I used the following successfully with nflgame, but it fails with nfldb due to the psycopg2 dependency:

import sys
sys.path.insert(0, 'C:\Python27\Lib\site-packages')
import nflgame

I'm a total rookie with nfldb and nflgame - Postgres, psycopg2, Python and AppEngine for that matter, but because the psycopg2 directory is a sibling of both nfldb and nflgame I can't think of a reason why it isn't found the way nflgame, and evidently nfldb are found.

Is there a different/better way to approach and accomplish this?

Thank you.

total_seconds and Python 2.6 incompatibility

If the code is written to implement total_seconds natively, (td.microseconds + (td.seconds + td.days * 24 * 3600) * 106) / 106, nfldb can support Python 2.6

What happens if there are two ore more overtime periods (which could happen in the playoffs)?

From curien on reddit:

I noticed that your schema the game table uses separate explicit rows for scoring each quarter. What happens if there are two ore more overtime periods (which could happen in the playoffs)? Come to think of it, I wonder how the GSIS data handles it (since, IIRC, every game has a "fifth" quarter score regardless of whether or not it goes to overtime).

If you had a separate table that stored (gsis_id, quarter, score) tuples, that wouldn't be an issue. Plus, it would more easily allow queries that treat quarters equally. E.g., it would be a lot easier to write a query that figured out which quarter had the most scoring. (I realize that scoring data is replicated in the play data, but I found that some extra points at least are missing, so it's not completely reliable.)

Aggregated Data Mismatch?

I'm trying to get the total passing yards and rushing yards of teams using the agg_play table and manual SQL. Noticing that my summation isn't giving the answers that match the NFL.com DB.

SELECT sum(a.passing_tds) as TD, sum(a.passing_cmp) as PassComp, sum(a.passing_yds) as 
PassYds, sum(a.rushing_yds) as RushYds from agg_play a, drive g, game m where a.drive_id = 
g.drive_id and g.gsis_id = a.gsis_id and g.pos_team = 'NO' and m.gsis_id = g.gsis_id and 
m.season_type = 'Regular' and m.season_year = 2013;

I get the following:

td passcomp passyds rushyds
39 444 5139 1476

NFL.com though has the following
passcomp: 446
passing yds: 4918
rushing yds: 1473

Noticing this with other teams too. I remember reading in the docs that it's likely that some stats may or may not match if they get adjusted by the NFL after the fact, just wondering if that's the case here.

Replacing NFLGame with NFLDB

Hi all,

I have a Windows PC and I originally installed NFLGame on it, which I installed without using pip. I then tried installing NFLDB on the same PC, but it has created some issues.
See #33 (Comment 84)

I tried to run the update module, and it looks like it worked somewhat, but I got some errors too.

I get an error saying
"no module named httplib2"
and a separate error saying
python.exe -m nflgame.update_players --no-block' failed (exit status 1)

otherwise, it seemed to load okay.

Is there a way to "uninstall" or delete the NFLGame and replace it with nfldb, or would simply reinstalling NFLDB with pip do the trick?

Otherwise, would it make sense to delete everything (Python 2.7, PostgreSQL, and all references to NFLGame/NFLDB) and start from the beginning?

Setting up postgresql on Ubuntu-ish systems

Thank you very much for the excellent work. I've played with nflgame some and am looking forward to getting nfldb going.

I am a decent programmer but far from a Linux pro, using Ubuntu. Ubuntu doesn't use systemd (though it looks like it will in future releases), and from reading about it I am not comfortable with trying to install it on my system.

Has anyone out there gotten the database going on a Debian-based system? If so can they write up a quick how-to? Or perhaps a couple translations for the the commands in the wiki guide that are specific to systemd? Any advice is much appreciated.

Postseason fix

First things first... I REALLY appreciate you putting this together! I'm a stats geek and love not having to maintain code to screen-scrape data from multiple websites to get less data than you're providing. You're awesome!

I had a working nfldb installation and all ninjas were happy. Downloaded the 1.3 files, copied them into C:\Python27\Lib\site-packages\nfldb and C:\Python27\Scripts (over-wrote old files, no other files removed or changed), then ran the nfldb-update with the following results:

/* BEGIN OUTPUT */

python ./Scripts/nfldb-update

STARTING NFLDB UPDATE AT 2014-02-05 11:17:28.334000
Connecting to nfldb... done.
Setting timezone to UTC... done.
Locking write access to tables... done.
Updating season phase, year and week... FAILED!
done.
Updating schedule JSON database...
C:\Python27\python.exe: No module named nflgame.update_sched
C:\Python27\python.exe -m nflgame.update_sched failed (exit status 1)
done.
Updating schedule for (Regular, 2013, 17)
Traceback (most recent call last):
File "./Scripts/nfldb-update", line 440, in
doit()
File "./Scripts/nfldb-update", line 430, in doit
update(db, player_interval=args.player_interval)
File "./Scripts/nfldb-update", line 350, in update
update_current_week_schedule(db)
File "./Scripts/nfldb-update", line 282, in update_current_week_schedule
for gsis_id, info in nflgame.sched.games.iteritems():
AttributeError: 'module' object has no attribute 'sched'
/* END OUTPUT */

Did I fail to do something correctly or did I find a bug?

Defense doesn't get credit for certain safeties

You have done a great job in producing a very useful database
and I commend you for sharing both the database and the technology
required in it's development.

In the 2013 wk1 game between TB and NYJ Josh Freeman fumbled the
ball out of the end zone for a safety.
gsis_id=2013090807
drive_id=3
play_id=522

drive.result="Fumble, Safety"
play_player shows one record for Josh Freeman and no defensive player record
since there was no fumble recovery or other defensive stat.

Short of searching all drive results and creating a new local column, do you
have a better way to generate this data?
(In other words have I missed something easy?)

A search for "Fumble, Safety" in the drive table shows 10 hits.
I wounder if this might also happen on a offensive penalty in the end zone?

I believe that the defence should be credited with a safety
even though no individual defensive player got credit.
One way might be to assign a player_id to a team.
For example "99-TB" (or "99-TBoff"-"99-TBdf/st") padded with blanks if necessary.
You could now have a team event in the play_player file
to hold data not belonging to an individual.

For my own use, I would like to see more team aggregate columns.
Locally I have created another table with columns I need. This isn't
too much of a problem for me since I don't use Python to access the data
and if I can derive the data I need from existing columns.

Thanks for your help.

some active player marked w/ UNK team and position

Hi,

I restored the nfldb and ran nfldb-update today. I got a bunch of errors for guys who were drafted this year (e.g. Bridgewater, etc), but that's no big deal.

Some active (for 2013) players' teams/positions are marked as UNK in the player table.
For example, Michael Bennett from SEA or Tony Gonzalez from ATL.
Re-running nfldb-update doesn't seem to change their team/positions.

Rookie issue

I was trying to do the live update via the nfldb-update command. But I realized as soon as Sammy Watkins was targeted, then there was an issue, as he didn't have an ID (i believe this is similar to the issue with players not having matching GSIS ID's when the first update is ran).

Is there something to do about this, or is this something that the NFL site needs to update first?

Thanks again. I've been having fun with this so far.

Install problems with pytz versions

Hi. Very new to python, so this could be an obvious fix (apologies if so). I get an install error (using command "pip install --user nfldb") that appears to be an issue with nfldb not using the most current version of pytz. Here is the pip output:

Downloading/unpacking nfldb
Running setup.py egg_info for package nfldb

Requirement already satisfied (use --upgrade to upgrade): nflgame in ./.local/lib/python2.7/site-packages (from nfldb)
Downloading/unpacking psycopg2 (from nfldb)
Running setup.py egg_info for package psycopg2

Downloading/unpacking enum34 (from nfldb)
Running setup.py egg_info for package enum34

Downloading/unpacking pytz==2013b (from nfldb)
Running setup.py egg_info for package pytz

warning: no files found matching '*.pot' under directory 'pytz'
warning: no previously-included files found matching 'test_zdump.py'

Source in ./build/pytz has the version 2013.8, which does not match the requirement pytz==2013b (from nfldb)
Source in ./build/pytz has version 2013.8 that conflicts with pytz==2013b (from nfldb)

Cannot execute nfldb-update successfully

I was having a problem with the 2014 preseason not getting updated when executing nfldb-update. So, I used "pip install --upgrade nflgame", and re-ran nfldb-update. Now I get this issue:

C:\Python27\Scripts>python nfldb-update
-------------------------------------------------------------------------------
STARTING NFLDB UPDATE AT 2014-08-21 17:01:03.612000
Connecting to nfldb... done.
Setting timezone to UTC... done.
Locking write access to tables... done.
Updating season phase, year and week... done.
Updating schedule JSON database...
Last updated: 2014-08-21 21:01:03.300000
done.
Updating schedule for (Preseason, 2014, 3)
done.
Adding schedule data for 309 games... Traceback (most recent call last):
  File "nfldb-update", line 441, in <module>
    doit()
  File "nfldb-update", line 431, in doit
    update(db, player_interval=args.player_interval)
  File "nfldb-update", line 360, in update
    nfldb.db._big_insert(cursor, 'game', insert)
  File "C:\Python27\lib\site-packages\nfldb\db.py", line 340, in _big_insert
    % (table, insert_fields, values))
  File "C:\Python27\lib\site-packages\psycopg2\extras.py", line 223, in execute
    return super(RealDictCursor, self).execute(query, vars)
psycopg2.IntegrityError: new row for relation "game" violates check constraint "
game_week_check"
DETAIL:  Failing row contains (2009080950, 54723, 2009-08-10 00:00:00+00, 0, Sun
day, 2009, Preseason, t, TEN, 0, 0, 0, 0, 0, 0, 0, BUF, 0, 0, 0, 0, 0, 0, 0, 201
4-08-21 21:01:03.644+00, 2014-08-21 21:01:03.644+00).

Is there another update I have perhaps overlooked? Or data I need to purge? Thanks.

Add support for having nfldb in non-public schema

nfldb should be able to reside in it's own database schema.

This should include all domains, types, tables etc

How to return play_count either by drive or game, minus penalties

Hi,

I've been working with the nfldb mostly by querying the pg database using pandas and then performing operations on the dataframe that way but am open to any method that allows the user to aggregate the play count on a drive (week) basis without penalties included in the play_count.

The best I've been able to come up with in this regard by querying either the, drive + game tables and aggregating the plays but this includes penalties without an obvious method to filter.

The other method I've tried combines the drive + play tables which allows for more obvious filtering for penalties ( ex. df_adj = df[df['note'] != 'PENALTY'] ). However, in this situation I find large duplications in terms of the play id etc ( I'm guessing due to the various players involved in the play? ) and am stumped on how to remove this duplication.

The same duplication situation applies when I attempt to join all 3 tables to filter by season_year, season_type, play_count, and no penalties. Any help would be appreciated. If you would like to see any sample code please let me know and I'll be happy to post it. Thanks for your hard work!

newb having some trouble

First, thank you for putting this together. I am excited about it.

However, as stated I am a newb and I've found myself lost.

I made it through the install instructions for Windows however each time a try to run either the top-ten-qbs.py or nfldb-update I get an error that says:
"ImportError: No module named pytz"

Not sure what I have done wrong or missed. Could you help me troubleshoot.

Thank you!
-John

Query player stats against opponent error

I finally got nfldb to work so I've been having fun learning the past month. But I have come across something I am not sure how to fix or why it's going wrong. Take the following code that gets all the stats for Jay Cutler vs Green Bay since 2009.

import nfldb

def stats_against(db, player_name, team):
    q = nfldb.Query(db)
    q.game(season_type='Regular', season_year=[2009, 2010, 2011, 2012, 2013], team=team)
    q.player(full_name=player_name)
    q.play(passing_att__ge=1, team__ne=team)
    return q.as_aggregate()[0]

db = nfldb.connect()
p = stats_against(db, 'Jay Cutler','GB')
print '%s, %d completions for %d yards, %d tds and %d ints' \
      % (p.player, p.passing_cmp, p.passing_yds, p.passing_tds, p.passing_int)

easy enough. But if you take a look at some of his stats, he also has some rushing yds, fumbles, sacks.. that I would like to account for as well. If I try to get those stats,

 print p.passing_sk, p.rushing_att, p.rushing_yds, p.fumbles_lost

output is:

0 0 0 0

I know this isn't accurate. It should be more like:
29 sacks 14 rushing att 92 rushing yds 2 fumbles.

Is there some query method where I can get every possible stat from the given player in one query, rather then having to make a new query for each statistical category. If I change

q.play(passing_att__ge=1, team__ne=team)

in the above code to

 q.play(rushing_att__ge=1,team__ne=team)

I get the rushing stats, still no sacks, and now Im without passing yards as well. I could combine another query to get sacks but I feel like there has to be a better way??

Any suggestions?
Thanks,
Roger

investigate docker

I've been playing around with docker at work, and if it can be made easy to run images on Windows, it might be a great idea to use it to distribute nfldb. The image would include PostgreSQL and the database itself. Possibly even the Python module.

This would be great for users that just want a database up and running that they can query.

installation instructions

I would have found it helpful if the installation instructions said to install a PostgresQL server on my box. I could have avoided this message:


    Error: You need to install postgresql-server-dev-X.Y for building a server-side extension or libpq-dev for building a client-side application.

DB not updating because "last updated" in future?

I happened to run nfldb-update during the first quarter last night on one of my systems. Now, regardless of how often I re-run nfldb-update on that system the statistics generated between the time I first ran it and the end of the game are not included in the database.

For comparison, I ran nfldb-update on a different system that did not pull a portion of the game last night. It successfully brings in all plays from the entire game.

I suspect this might be because nfldb-update thinks that the database was last updated in the future and therefore has no need to try to repull?

P:\Projects\Home Computer\Fantasy Football\2014>python c:\Python27\Scripts\nfldb-update
-------------------------------------------------------------------------------
STARTING NFLDB UPDATE AT 2014-08-04 08:43:50.956000
Connecting to nfldb... done.
Setting timezone to UTC... done.
Locking write access to tables... done.
Updating season phase, year and week... done.
Updating schedule JSON database...
Last updated: 2014-08-04 13:20:27.211000
done.
Updating schedule for (Preseason, 2014, 0)
done.
Closing database connection... done.
FINISHED NFLDB UPDATE AT 2014-08-04 08:43:51.843000
-------------------------------------------------------------------------------

Issue with Hall-of-Fame Game

Trying to load the rest of the 2014 schedule into the DB but throwing an error:
return super(RealDictCursor, self).execute(query, vars)
psycopg2.IntegrityError: new row for relation "game" violates check constraint "game_week_check"
DETAIL: Failing row contains (2014080300, 56426, 2014-08-04 00:00:00+00, 0, Sunday, 2014, Preseason, f, BUF, 0, 0, 0, 0, 0, 0, 0, NYG, 0, 0, 0, 0, 0, 0, 0, 2014-07-13 02:16:40.294716+00, 2014-07-13 02:16:40.294716+00).

sparse data --- in search of a good representation

In designing the schema for nfldb, I've run into an interesting problem: sparse data. I'll talk briefly about what I've looked into and what I've decided on for now. My goal is to lay out my decision procedure so that it can be improved upon by future willing participants that are smarter than me.

Each play has a set of statistics associated with it. Some of them are meta data about the play itself. For example, whether it was a third down attempt (and if it succeeded) or any penalty yards that accrue on the play. The other statistics are related to a particular player. For example, passing yards or a tackle.

There is only one set of meta statistics about a play, but there is usually more than one set of player statistics for a play.

The problem is that the set of possible statistics is quite large (a little over 100), but any particular combination of them in a play is quite small. Moreover, each statistical category must be searchable using the standard set of comparison operators: =, !=, <, <=, > and >=. To my knowledge, there are no fewer than four different approaches to storing this kind of data in PostgreSQL.

Use an entity-attribute-value ("EAV") table

An EAV table is a "skinny" table with only a few columns. One is a foreign key (which would link it to a play in this case), another is an attribute name and the last is a value. In this case, the value is always an integer except for sacks, which can have a decimal. Therefore, the value column would need type real (which wastes at least 2 bytes for each of the vast majority of rows, i.e., smallint versus real.)

An index on this table would be simple.

Querying this table would be extremely difficult. Every query would require some kind of pivot of rows into columns, which can be exceedingly complex to write. Since the main focus of nfldb is to provide a simple API that will automatically write queries for you, this would greatly increase the complexity of query generation.

For these reasons, EAV is typically considered an anti-pattern. I would be willing to accept the overhead of using real for each statistic, but the complexity of writing queries is a showstopper for me here.

Have a single table with many columns

In this case, most rows will have NULL for the majority of columns. In my reading, I've learned that PostgreSQL stores NULL values efficiently: each NULL takes up a single bit in a bitmap for the row. I would consider that acceptable overhead.

The problem here is having a table with a huge number of columns and the overhead that entails. In particular, I would very much like to have an index on each of the columns for efficient searching. But how does that scale to ~100 columns?

Use the hstore extension

This looked really promising at first. But it stores data as text. This would require casts every single time a query used a comparison operator on a particular statistic. (I don't know what the overhead of that is, but I'm hard pressed to believe a "cast" from text to numeric is free.) Moreover, while indexes can be defined on an hstore column, I don't think they can be (easily) defined on a particular key in an hstore column. That's two showstoppers IMO. If I'm wrong about my understanding of hstore, I'd be happy to be corrected.

Use an array

I think this is a strictly superior solution to hstore since it can store an array of real as opposed to text. But this suffers from using extra space just like the EAV solution does, and I don't think indexes can be created on a particular element of an array. Moreover, arrays are indexed numerically, which implies we'd need to use a mapping between statistic and its numeric index. That increases the complexity of query generation too.

In light of the above analysis, I've decided to just go ahead with creating a sparse table. The trade offs seem more attractive to me, particularly given that the space overhead will be fairly low and the query generation will remain simple. A possible alternative is to divide statistics into common groupings and use a table for each grouping (for example, passing, rushing, receiving, defense, etc.) I think that's a very viable option going forward, but I consider it to be a premature optimization at this point. It may help a lot if it turns out that indexes on a 100 column table suck.

LeSean McCoy Yards in 2013 200 short

Love the DB. Spotted an issue:

If I import the DB and run this query:

SELECT player.full_name, SUM(play_player.rushing_yds) AS rushing_yds
FROM play_player
LEFT JOIN player ON player.player_id = play_player.player_id
LEFT JOIN game ON game.gsis_id = play_player.gsis_id
WHERE game.season_year = 2013 AND game.season_type = 'Regular' AND player.position = 'RB'
GROUP BY player.full_name
ORDER BY rushing_yds DESC;

Rushing yards look accurate vs ESPN/NFL records:
http://espn.go.com/nfl/statistics/player/_/stat/rushing/sort/rushingYards

except for LeSean McCoy who is off by 200 yards.

-----------------------+-------------
 LeSean McCoy          |        1407
 Matt Forte            |        1320
 Jamaal Charles        |        1288
 Alfred Morris         |        1275
 Adrian Peterson       |        1266
 Marshawn Lynch        |        1257
 Ryan Mathews          |        1255
 Eddie Lacy            |        1162
 Frank Gore            |        1128
 DeMarco Murray        |        1095
 Chris Johnson         |        1077
 Knowshon Moreno       |        1039
 Zac Stacy             |         972
 C.J. Spiller          |         927
 Fred Jackson          |         896
 Reggie Bush           |         889
 Le'Veon Bell          |         857
 DeAngelo Williams     |         833
 Chris Ivory           |         833
 Maurice Jones-Drew    |         803
 Ben Tate              |         771
 LeGarrette Blount     |         768
 Stevan Ridley         |         767
 BenJarvus Green-Ellis |         756
 Rashad Jennings       |         733
 Lamar Miller          |         709
 Bilal Powell          |         697
 Gio Bernard           |         695
 Ray Rice              |         660
 Andre Ellington       |         652
 Trent Richardson      |         563
 Montee Ball           |         558
 Joique Bell           |         556
 Pierre Thomas         |         551
 Arian Foster          |         542
 Steven Jackson        |         542
 Donald Brown          |         537
 Andre Brown           |         492
 James Starks          |         491
 Doug Martin           |         456
 Bernard Pierce        |         436
 Danny Woodhead        |         425
 Mark Ingram           |         386
 Darren McFadden       |         379
 Kendall Hunter        |         358