Comments (24)
play_players have a profile_id associated with them from which you can piece together the URL.
import nfldb

url_start = 'http://www.nfl.com/players/profile?id='
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2014, season_type='Preseason', week=0)
for pp in q.limit(20).as_aggregate():
    print('%s: %s%s' % (pp.player, url_start, pp.player.profile_id))
Brian Moorman (BUF, P): http://www.nfl.com/players/profile?id=2502195
Josh Brown (NYG, K): http://www.nfl.com/players/profile?id=2505459
Eli Manning (NYG, QB): http://www.nfl.com/players/profile?id=2505996
Antrel Rolle (NYG, SS): http://www.nfl.com/players/profile?id=2506347
Mario Williams (BUF, DE): http://www.nfl.com/players/profile?id=2495982
Daniel Fells (NYG, TE): http://www.nfl.com/players/profile?id=2506619
Steve Weatherford (NYG, P): http://www.nfl.com/players/profile?id=2506821
Fred Jackson (BUF, RB): http://www.nfl.com/players/profile?id=2506871
Manny Lawson (BUF, DE): http://www.nfl.com/players/profile?id=2495885
Mathias Kiwanuka (NYG, DE): http://www.nfl.com/players/profile?id=2495879
Kyle Williams (BUF, DT): http://www.nfl.com/players/profile?id=2506931
Dan Carpenter (BUF, K): http://www.nfl.com/players/profile?id=2507401
Keith Rivers (BUF, OLB): http://www.nfl.com/players/profile?id=302
Dominique Rodgers-Cromartie (NYG, CB): http://www.nfl.com/players/profile?id=306
Mario Manningham (NYG, WR): http://www.nfl.com/players/profile?id=1030
Quintin Demps (NYG, FS): http://www.nfl.com/players/profile?id=1974
Zack Bowman (NYG, CB): http://www.nfl.com/players/profile?id=2507484
Kellen Davis (NYG, TE): http://www.nfl.com/players/profile?id=2507486
Landon Cohen (BUF, DE): http://www.nfl.com/players/profile?id=4499
Peyton Hillis (NYG, RB): http://www.nfl.com/players/profile?id=1980
I do not know how they decide the folder structure of the actual photos, though, so I think you'd have to scrape those yourself.
from nfldb.
Yeah, it's interesting, the profile IDs are different from the photo IDs they are attached to. I would actually be able to bring it over directly if the ID in the image were the same. As it stands, I can bring over the page, but I have no way of tying it to the image.
I agree this would be a nice addition. It's an easy but tedious change that requires:
- Modifying nflgame.update_players to scrape the image URL.
- Regenerating the full player database from scratch.
- Updating the nflgame.Player class to add a new instance variable.
- Updating nfldb to support the new nflgame field. (Includes adding it to the nfldb.Player class and adding a new database column in nfldb/db.py.)
You're welcome to take a crack at it, otherwise you have two choices:
- Use the profile URL to scrape the images yourself.
- Wait for me to implement it. (It isn't on my path of things I want to do so I don't know when I'll do it.)
I should caution you: unlike statistical data, images are copyrighted content (as are logos and video footage). Therefore, it isn't a good idea to show them on a public web site. If it's just for your personal private use, then you're OK.
Thanks, I may give it a shot. One question though, is this something I would have to pull from the url, or would this also exist somewhere in the GSIS data?
I don't know what the "GSIS data" is. Do you mean the NFL.com gamecenter feed? There is virtually no player data in the JSON feed other than an abbreviated name (e.g., T.Brady) and a player GSIS id. All other player metadata is scraped. The image URL would also have to be scraped.
@iliketowel I would very much encourage you to take a crack at it. I will happily mentor you through it. The easiest way is to log on to IRC/FreeNode at #nflgame and mention my nick burntsushi. During the week, I'm usually on in the evening 5-8/9pm EST. The weekend is hit or miss (this one is bad, next is better).
Otherwise, we could do it over the issue tracker or via email.
I'm going to give it a shot. I'll let you know when I run into issues. But it probably won't be until tomorrow at the earliest.
The photo URL isn't actually as cryptic as I once believed. Included in the source of the player profile page, residing right alongside the GSIS ID, is an ESB ID. This is the ID that is used to generate the photo URL.
For Dominique Rodgers-Cromartie, the following can be found:
<!--
Player Info
ESB ID: ROD616216
GSIS ID: 00-0026156
-->
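Since both IDs live in a plain HTML comment, a couple of regexes (the same patterns that appear later in this thread) are enough to pull them out. A minimal Python 3 sketch, using the comment block above as hardcoded sample input; extract_ids is a name invented here:

```python
import re

# Sample input: the HTML comment quoted above, copied verbatim from the
# profile page for Dominique Rodgers-Cromartie.
comment = """<!--
Player Info
ESB ID: ROD616216
GSIS ID: 00-0026156
-->"""

def extract_ids(html):
    """Return (gsis_id, esb_id) from profile-page HTML, None for misses."""
    gsis = re.search(r'GSIS\s+ID:\s+([0-9-]+)', html)
    esb = re.search(r'ESB\s+ID:\s+([A-Z0-9]+)', html)
    return (gsis.group(1) if gsis else None,
            esb.group(1) if esb else None)

print(extract_ids(comment))  # ('00-0026156', 'ROD616216')
```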
The photo URL looks like:
import nfldb

db = nfldb.connect()

def photo_url(player, esb_id):
    photo_url_start = 'http://static.nfl.com/static/content/public/static/img/getty/headshot/'
    a, b, c = player.last_name[:3].upper()
    return '%s%s/%s/%s/%s.jpg' % (photo_url_start, a, b, c, esb_id)

player, _ = nfldb.player_search(db, 'Dominique Rodgers-Cromartie')
print('%s: %s' % (player.full_name, photo_url(player, 'ROD616216')))
Testing this out sight unseen has been successful for a number of players, though your mileage may vary.
For example, Tom Brady's profile page (pp.player.profile_url = http://www.nfl.com/player/tombrady/2504211/profile) takes us to a page where we can see:
<!--
Player Info
ESB ID: BRA371156
GSIS ID: 00-0019596
-->
Grabbing that ESB ID and punching it into the above gives us:
Tom Brady: http://static.nfl.com/static/content/public/static/img/getty/headshot/B/R/A/BRA371156.jpg
which takes us right to Tom Terrific's beautiful mug.
It would probably be worthwhile to extract both pieces, in case there are some images that don't follow the pattern.
I've also seen the ESB id used in other places (I think the XML gamebook files).
The ESB ID is equivalent to the actual profile ID on the nfl.com profile pages, and linking to the images with the ESB ID is actually quite simple. The image URL is always:
http://static.nfl.com/static/content/public/static/img/getty/headshot/(1st letter of last name)/(2nd letter of last name)/(3rd letter of last name)/ESBID.jpg
What I've been struggling with is how to either pull that ID from the data the way the script pulls the rest of the information, or how to add the ESB ID into the script.
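The URL pattern described above can be captured as a standalone helper with no nfldb dependency. A Python 3 sketch; headshot_url is a hypothetical name, and it assumes the last name has at least three letters:

```python
def headshot_url(last_name, esb_id):
    # The path components are the first three letters of the last name,
    # uppercased, followed by the ESB ID itself.
    a, b, c = last_name[:3].upper()
    return ('http://static.nfl.com/static/content/public/static/img/'
            'getty/headshot/%s/%s/%s/%s.jpg' % (a, b, c, esb_id))

print(headshot_url('Rodgers-Cromartie', 'ROD616216'))
# http://static.nfl.com/static/content/public/static/img/getty/headshot/R/O/D/ROD616216.jpg
```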
@ochawkeye URK! Beware. Both of those functions issue a new request. You really don't want to do that. Ideally you'd issue one request and retrieve all information possible.
@iliketowel I will try to write something up that will guide you. In the mean time, forget about nflgame. Instead, pick a profile page that has an image, read the documentation for beautifulsoup4 and try to write a Python program that extracts the image URL from it. You should need to use nflgame at all for this:
import bs4
import requests
html = requests.get('profile_url').read()
soup = bs4.soup(html)
# do stuff with soup (see beautifulsoup4 doco for examples)
(That won't work verbatim. I'm just sketching out pseudo code and I probably got the function names wrong.)
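As an aside, a working version of that sketch can be written with only the standard library (Python 3, html.parser instead of beautifulsoup4; ImgSrcCollector is a name invented here, and the sample tag is hardcoded rather than fetched over the network):

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            attrs = dict(attrs)
            if 'src' in attrs:
                self.srcs.append(attrs['src'])

# Hardcoded sample instead of a live fetch; this matches the shape of
# the headshot tag seen on the profile pages.
html = ('<img height="90" src="http://static.nfl.com/static/content/'
        'public/static/img/getty/headshot/M/A/N/MAN738705.jpg" width="65"/>')
collector = ImgSrcCollector()
collector.feed(html)
print(collector.srcs)
```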
redacted :)
Another example of knowing only enough to be dangerous!
You should need to use nflgame at all for this:
I'm just confirming here, you mean "Should Not", right?
Whoops, sorry, yes you're right. Start without nflgame. Of course, we'll eventually get it back into nflgame (and nfldb) proper, but it will be simpler and less dangerous this way. :-)
(If you work on nflgame-update-players directly, then it isn't hard to end up launching thousands of requests to NFL.com. This is OK when you intend to do it, but doing it a lot accidentally is probably not a good idea.)
@ochawkeye URK! Beware. Both of those functions issue a new request. You really don't want to do that. Ideally you'd issue one request and retrieve all information possible.
I know I'm over my skis here, but here is my new function to collect both the GSIS ID and the ESB ID.
def gsis_and_esb_ids(profile_url):
    resp, content = new_http().request(profile_url, 'GET')
    if resp['status'] != '200':
        return None, None
    gid, esb = None, None
    m = re.search('GSIS\s+ID:\s+([0-9-]+)', content)
    n = re.search('ESB\s+ID:\s+([A-Z][A-Z][A-Z][0-9]+)', content)
    if m is not None:
        gid = m.group(1).strip()
    if n is not None:
        esb = n.group(1).strip()
    if gid is not None and len(gid) != 10:
        gid = None
    if esb is not None and len(esb) != 9:
        esb = None
    return gid, esb
def run():
    ...
    if len(purls) > 0:
        eprint('Fetching GSIS and ESB identifiers for players not in nflgame...')

        def fetch(purl):
            gid, esb = gsis_and_esb_ids(purl)
            return purl, gid, esb

        for i, (purl, gid, esb) in enumerate(pool.imap(fetch, purls), 1):
            progress(i, len(purls))
That looks pretty reasonable, although I'd probably use a looser regex:
ESB\s+ID:\s+([A-Z0-9]+)
In my experience, NFL.com isn't always terribly consistent with their identifiers...
import bs4
import requests

html = requests.get('profile_url').read()
soup = bs4.soup(html)
# do stuff with soup (see beautifulsoup4 doco for examples)
I'm clearly doing something wrong. Because I get an error as soon as I try to do
import requests
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
import requests
ImportError: No module named requests
I installed beautifulsoup4 when I installed nfldb, but is there some other sort of install I need to do separately?
When there is an ImportError, it means that Python cannot find a module with the name that you tried to import. Typically, this means you have not installed that module. (On occasion, it means your environment is misconfigured.)
In this scenario, it's likely that you simply haven't installed requests. In the Python world, we use a tool called pip to install and manage Python modules. pip is by default configured to install packages from PyPI. You can search for packages there: https://pypi.python.org/pypi --- try searching for requests.
Each search result is a package you can install. The package name is what you can use to install it with pip. This search is instructive because there are several related results and the first result, drequests, is not the right one. Instead, you need to look at the description and see if it makes sense with respect to what you're trying to do. In this case, the description for drequests says that it is a web application framework. Are we building a web app? Nope. Next. OK, now we see requests and it says it is "Python HTTP for Humans." Not a terribly great description, but we are using it to download web pages, which works over the HTTP protocol. Plus, the package name requests matches the module name we want to import, requests. (This is not always true!!!!)
So once we think we know the package we want, it's time to install it, just like you installed nfldb:
pip install requests
And then you should be able to run python -c "import requests" successfully.
So, I'm still on the first part. I got as far as this:
from bs4 import BeautifulSoup
import requests
import re

def get_soup(url):
    request = requests.get(url).content
    return BeautifulSoup(request)

url = "http://www.nfl.com/player/ejmanuel/2539228/profile"
soup = get_soup(url)
bimg = re.compile('.http://static.nfl.com/static/content/public/static/img/getty/headshot')
img_links = soup.find_all("img", {'src': bimg})
for link in img_links:
    print link
Which prints the link:
<img height="90" onerror="if (this.src != 'http://i.nflcdn.com/static/site/img/sr_pic0.gif') {this.src='http://i.nflcdn.com/static/site/img/sr_pic0.gif'}" src="http://static.nfl.com/static/content/public/static/img/getty/headshot/M/A/N/MAN738705.jpg" width="65"/>
But I don't know how to pull out only the "MAN738705" (or 738705)?
I think if you print link['src'], it will show you the URL. Then you can pull it out with a regex:
import re
s = "http://static.nfl.com/static/content/public/static/img/getty/headshot/M/A/N/MAN738705.jpg"
m = re.search('([^/]+)\.[^/]+$', s)
print m.group(1)
Output: MAN738705.
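If the numeric portion alone (738705) is ever wanted, a slightly different regex can split the ESB ID into its letter prefix and digits. This is a hypothetical extension (Python 3 here), not something the thread's code requires:

```python
import re

s = ('http://static.nfl.com/static/content/public/static/img/'
     'getty/headshot/M/A/N/MAN738705.jpg')
# Three uppercase letters, then digits, just before the .jpg extension.
m = re.search(r'([A-Z]{3})(\d+)\.jpg$', s)
prefix, number = m.group(1), m.group(2)
print(prefix, number)  # MAN 738705
```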
Okay, so, I'm not clear on the next step. I have this ability to create a static pull of the ID, but how do I do that dynamically for all of the players? I suspect it's something about the website, but I'm not sure what.
@iliketowel That piece you thankfully don't need to worry about. nflgame/update_players.py will actually do it for you.
The next step is to take the code you used to extract the ID from the HTML and merge it into nflgame/update_players.py. My guess is that you'll want to modify the gsis_id function so that it gets more than the GSIS ID from the page. For example, here's the current code:
def gsis_id(profile_url):
    resp, content = new_http().request(profile_url, 'GET')
    if resp['status'] != '200':
        return None
    m = re.search('GSIS\s+ID:\s+([0-9-]+)', content)
    if m is None:
        return None
    gid = m.group(1).strip()
    if len(gid) != 10:  # Can't be valid...
        return None
    return gid
Here's what you might want to do: (notice the name change of the function!)
def nfl_ids_for_player(profile_url):
    resp, content = new_http().request(profile_url, 'GET')
    if resp['status'] != '200':
        return None
    m = re.search('GSIS\s+ID:\s+([0-9-]+)', content)
    if m is None:
        return None
    gid = m.group(1).strip()
    if len(gid) != 10:  # Can't be valid...
        return None
    # Your code goes here...
    soup = ...
    esb_id = ...
    return {'gsis_id': gid, 'esb_id': esb_id}
So at this point, I started going deeper (because you have to change the places where gsis_id is called to deal with the new return value), but I quickly realized that it is probably not a good use of your time. The update_players.py script is grossly over complicated because it goes to dramatic lengths to keep the number of requests to NFL.com to a minimum. (During the season, running the script often results in no requests at all!)
If you could do the above and submit a pull request to the nflgame repository (not nfldb), then I think I'll be able to handle the rest. :-)
@iliketowel @BurntSushi I'm not sure where this ended up, or if it went offline or whut, but I'm in the market for just this thing.
I know it's over 2 years old, but I'd be happy to help contribute where possible to get something working. For personal use, of course.
FWIW I've been using this data just locally in a Postgres DB and found a pretty straightforward way to inject the ESB IDs using some modification of the above code and psycopg2. From that I can just apply a generic URL to have the avatars render wherever I query it. I'm not sure anyone's interested in my janky Python code, but the above references were super helpful getting it working.