billpetti / baseballr
A package written for R focused on baseball analysis. Currently in development.
Home Page: billpetti.github.io/baseballr
License: Other
There is a tryCatch in the scraping function, but it isn't catching errors and warnings. If you loop over a sequence of days, a failure in the middle of the loop loses the work done up to that point.
For example:
date_seq = seq(as.Date("2017-07-09"), as.Date("2017-07-14"), by=1)
statcast_list = lapply(date_seq, function(d) {scrape_statcast_savant_batter_all(start_date = as.character(d), end_date = as.character(d))})
[1] "These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved."
[1] "Grabbing data, this may take a minute..."
URL read and payload aquired successfully.
[1] "These data are from BaseballSevant and are property of MLB Advanced Media, L.P. All rights reserved."
[1] "Grabbing data, this may take a minute..."
URL caused a warning. Make sure your date range is correct:
Original warning message:
incomplete final line found by readTableHeader on 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC=&hfSea=2017%7C&hfSit=&player_type=batter&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=2017-07-10&game_date_lt=2017-07-10&team=&position=&hfRO=&home_road=&hfFlag=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details&'
Error in scrape_statcast_savant_batter_all(start_date = as.character(d), :
object 'payload' not found
statcast_list
Error: object 'statcast_list' not found
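One way to keep the days already collected is to wrap each per-day call in its own tryCatch that returns NULL on failure. This is only a sketch: scrape_one_day below is a hypothetical stand-in for the real scraper so the example runs offline, and safe_scrape is not part of baseballr.

```r
# Hypothetical stand-in for a per-day scraper; it fails on one date to mimic the report.
scrape_one_day <- function(d) {
  if (d == "2017-07-10") stop("object 'payload' not found")
  data.frame(game_date = d, n_rows = 1L, stringsAsFactors = FALSE)
}

# Wrap a single call so that an error yields NULL instead of aborting the loop.
safe_scrape <- function(d, scrape_fun = scrape_one_day) {
  tryCatch(
    scrape_fun(d),
    error = function(e) {
      message("Skipping ", d, ": ", conditionMessage(e))
      NULL
    }
  )
}

date_seq <- seq(as.Date("2017-07-09"), as.Date("2017-07-11"), by = 1)
statcast_list <- lapply(as.character(date_seq), safe_scrape)

# Keep the successful days, and record which dates need a retry.
ok <- !vapply(statcast_list, is.null, logical(1))
statcast_df <- do.call(rbind, statcast_list[ok])
failed_dates <- as.character(date_seq)[!ok]
```

With the real scraper, scrape_fun would be a function that calls scrape_statcast_savant_batter_all with the same start and end date.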
Documentation for start_date and end_date reads "Format must be in Y-d-m format." This should be changed to match the function which appears to be YYYY-MM-DD.
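For callers who want to check their inputs before scraping, a small base R validator for the YYYY-MM-DD form could look like the sketch below. The function name is_ymd is made up for illustration and is not part of baseballr.

```r
# TRUE for strings that are zero-padded YYYY-MM-DD and parse as real calendar dates.
is_ymd <- function(x) {
  looks_right <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  parsed <- as.Date(x, format = "%Y-%m-%d")
  looks_right & !is.na(parsed)
}

# The third value has month 13 and the fourth is not zero-padded.
checks <- is_ymd(c("2017-07-09", "2017-09-07", "2017-13-01", "2017-8-15"))
checks
```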
I'm trying to install baseballr on R 3.2.3 (both on Windows and Linux) and I'm getting the following error:
The downloaded source packages are in
‘/tmp/RtmpRFVgnk/downloaded_packages’
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore CMD INSTALL '/tmp/RtmpRFVgnk/devtools86a798f69fb/BillPetti-baseballr-7a96d6e'
--library='/home/martin/R/x86_64-pc-linux-gnu-library/3.2' --install-tests
Am I missing something or is the package not yet compatible with version 3.2.3?
Bill, I'm not sure what happened to this edit; I thought you had added it before. The GT portion of the URL is missing the PO and S values for the playoffs and spring training. Only the batter_all function is affected; the other 3 have all 3 strings in the URL.
Hello all,
I've written a function that generates a plot of monthly averages of release velocity for a given dataset, and updated it so that it works with the Statcast data in the baseballr package. I'd like to contribute to the package by writing functions that visualize Statcast and PITCHf/x data.
The following code generates a plot that shows how Justin Verlander's velocity changed over seasons.
library(ggvis)
library(xts)
library(dplyr)     #for filter()
library(lubridate) #for year() and month()
library(baseballr)
#Using ggvis and xts packages, function will generate a plot of monthly average of release velocity of given a dataframe.
#Justin Verlander's 2013-2016 statcast data
verlander <- scrape_statcast_savant_pitcher(start_date = "2013-04-06", end_date = "2016-10-31",pitcherid =434378)
velo_monthly <- function(df,overplot=F ,fastball="both"){
#Fastball vs. Non-fastball
#TRUE (a fastball type) maps to "F", FALSE maps to "NF"
df$fastball <- factor(df$pitch_type %in% c("FA","FF","FC","FT","FS"), levels = c(TRUE, FALSE), labels = c("F", "NF"))
if(fastball=="NF"){
ndf <- df %>%filter(fastball=="NF")
shapes <- "cross"
} else if(fastball=="F"){
ndf <- df %>% filter(fastball=="F")
shapes <- "circle"
}else {
ndf <- df
shapes <-"diamond"
}
#Time Series
idx <- as.Date(ndf$game_date)
#Keep only the numeric speed column; a mixed-type data frame would be coerced to a character matrix
df_ <- xts(ndf$start_speed, order.by = idx)
#Monthly avg of velocity
mthlySumm <- apply.monthly(df_, mean, na.rm = TRUE)
mthlysum <- as.data.frame(coredata(mthlySumm))
mthdat <- as.data.frame(mthlysum[,1])
names(mthdat) <- "velo_mon"
mthdat$period <- index(mthlySumm)
mthdat$seasonYear <- year(mthdat$period)
mthdat$month <- month(mthdat$period)
#overplot over seasonYear
if(overplot==F){
ans <-mthdat %>% ggvis(~period,~velo_mon) %>%layer_points(fill=~as.factor(seasonYear),shape:=shapes) %>% group_by(seasonYear) %>% layer_smooths(stroke=~as.factor(seasonYear)) %>%add_axis("x",title=paste(df$player_name[1],min(df$game_date),max(df$game_date)),subdivide = 2) %>% add_axis("y", title="Velocity Monthly Average",subdivide=4) %>%add_legend(c("fill","stroke"), title = "Season", orient = "right")
}else{
ans<-mthdat %>% ggvis(~month,~velo_mon) %>%layer_points(fill=~as.factor(seasonYear),shape:=shapes) %>% group_by(seasonYear) %>% layer_smooths(stroke=~as.factor(seasonYear))%>%add_axis("x",title=paste(df$player_name[1],min(df$game_date),max(df$game_date)),subdivide = 2) %>% add_axis("y", title="Velocity Monthly Average",subdivide=4) %>%add_legend(c("fill","stroke"), title = "Season", orient = "right")
}
return(ans)
}
#Justin Verlander's 2013-2016 statcast data
verlander <- scrape_statcast_savant_pitcher(start_date = "2013-04-06", end_date = "2016-10-31",pitcherid =434378)
velo_monthly(verlander)
velo_monthly(verlander,fastball="NF")
velo_monthly(verlander,overplot=TRUE,fastball="F")
I followed this great post in order to build my Statcast database. https://billpetti.github.io/2018-02-19-build-statcast-database-rstats/
The only additions were:
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "statcast.sqlite3")
dbWriteTable(con, "statcast", statcast_bind)
I went to look at the Statcast data on Baseball Savant, and it looks like some of the pitch type data has changed. I haven't looked at every year yet, but it has definitely changed for 2017.
Two questions for you:
1. What's the easiest way to rebuild the database? At a minimum, I'd want to replace the 2017 values in my database.
2. Can I bind one season at a time to the database using the dbWriteTable function above, or would that function overwrite my existing "statcast" table in the database?
Thanks!
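On the second question: with DBI, dbWriteTable stops with an error by default if the table already exists, adds rows when called with append = TRUE, and replaces the table when called with overwrite = TRUE. The sketch below uses an in-memory SQLite database and toy data frames; the game_year column and the season_* objects are assumptions made for illustration, not the real Statcast schema.

```r
library(DBI)

# In-memory database so the sketch does not touch a real statcast.sqlite3 file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy stand-ins for per-season data (assumed column names).
season_2016     <- data.frame(game_year = 2016L, pitch_type = c("FF", "SL"), stringsAsFactors = FALSE)
season_2017_old <- data.frame(game_year = 2017L, pitch_type = c("FT", "CU"), stringsAsFactors = FALSE)
season_2017_new <- data.frame(game_year = 2017L, pitch_type = c("SI", "KC"), stringsAsFactors = FALSE)

# The first write creates the table; later seasons are appended, not overwritten.
dbWriteTable(con, "statcast", season_2016)
dbWriteTable(con, "statcast", season_2017_old, append = TRUE)

# Replace a single season: delete its rows, then append the fresh pull.
dbExecute(con, "DELETE FROM statcast WHERE game_year = 2017")
dbWriteTable(con, "statcast", season_2017_new, append = TRUE)

counts <- dbGetQuery(con, "SELECT game_year, COUNT(*) AS n FROM statcast GROUP BY game_year ORDER BY game_year")
pitch_2017 <- dbGetQuery(con, "SELECT pitch_type FROM statcast WHERE game_year = 2017 ORDER BY pitch_type")
dbDisconnect(con)
```

Deleting and re-appending one season avoids rebuilding the whole table, at the cost of assuming every row carries a reliable season column.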
Idea for an enhancement: reverse functionality for playerid_lookup to return player name based on playerid.
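A minimal sketch of what such a reverse lookup could look like, assuming a lookup table with one row per player. The table, its column names, and the function name playername_lookup are all hypothetical; the three IDs are the ones quoted in other reports on this page.

```r
# Hypothetical lookup table; the data behind playerid_lookup may be shaped differently.
players <- data.frame(
  name = c("Justin Verlander", "Aaron Judge", "A.J. Pollock"),
  mlbam_id = c(434378L, 621043L, 572041L),
  stringsAsFactors = FALSE
)

# Return the rows whose ID matches any of the requested IDs.
playername_lookup <- function(ids, table = players) {
  table[table$mlbam_id %in% ids, , drop = FALSE]
}

found <- playername_lookup(621043)
found$name
```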
Hey Bill,
I'm glad to know my PRs were merged into the master branch.
I've reinstalled the package and I'm able to run standings_on_date_bref without problems and get the new names on the table.
Nevertheless, the function viz_gb_on_period is not available in the package. I do not know if that is related to the difference in case-sensitive names in the function.
Regards.
Daniel.
I know this is likely infeasible, but grabbing the score-state for each pitch (number of runs for home team, number of runs for the away team) in scrape_statcast_savant_batter_all would be awesome.
I am new to using R and am trying to work through this data and see what I can do. During installation of the package I get random errors:
Installation failed: NULL : 'rcmd_safe_env' is not an exported object from 'namespace:callr'
Any tips to clear this up are appreciated, thanks.
standings_on_date_bref() fails with an error.
My code:
library(baseballr)
standings_on_date_bref("2015-08-04", "AL East")
Error in setNames(., table_names[ind]) :
'names' attribute [1] must be the same length as the vector [0]
Running baseballr version 0.3.2 on Windows 10 with R 3.3.3.
I get this error no matter what I do:
> fg <- fg_bat_leaders(2016, 2016, 0)
Error in .[[33]] : subscript out of bounds
> fg <- fg_bat_leaders(2015, 2016, 0)
Error in .[[33]] : subscript out of bounds
> fg <- fg_bat_leaders(2015, 2016, 4)
Error in .[[33]] : subscript out of bounds
Not sure what other info you'd find helpful. Let me know and I will provide it :-)
Hi there, good evening.
I am a novice R user but always follow instructions as well as possible.
Trying to install baseballr, I encounter the following.
Please help. Thank you.
> require(devtools)
install_github("BillPetti/baseballr")
Error in curl::curl_fetch_disk(url, x$path, handle = handle) :
Couldn't resolve host name
library("devtools", lib.loc="C:/Program Files/R/R-3.2.1/library")
> install_github("BillPetti/baseballr")
Downloading GitHub repo BillPetti/baseballr@master
from URL https://api.github.com/repos/BillPetti/baseballr/zipball/master
Installing baseballr
Installing 1 package: lubridate
There is a binary version available (and will be installed) but the
source version is later:
binary source
lubridate 1.6.0 1.7.1
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/lubridate_1.6.0.zip'
Content type 'application/zip' length 654624 bytes (639 KB)
downloaded 639 KB
package ‘lubridate’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: reldist
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/reldist_1.6-6.zip'
Content type 'application/zip' length 116008 bytes (113 KB)
downloaded 113 KB
package ‘reldist’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: rvest
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/rvest_0.3.2.zip'
Content type 'application/zip' length 853411 bytes (833 KB)
downloaded 833 KB
package ‘rvest’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: XML
There is a binary version available (and will be installed) but the
source version is later:
binary source
XML 3.98-1.6 3.98-1.9
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/XML_3.98-1.6.zip'
Content type 'application/zip' length 4298226 bytes (4.1 MB)
downloaded 4.1 MB
package ‘XML’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
Installing 1 package: xml2
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/xml2_1.1.1.zip'
Content type 'application/zip' length 3488697 bytes (3.3 MB)
downloaded 3.3 MB
package ‘xml2’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpCeAFw9\downloaded_packages
"C:/PROGRA1/R/R-321.1/bin/x64/R" --no-site-file --no-environ --no-save
--no-restore --quiet CMD INSTALL
"C:/Users/Javier/AppData/Local/Temp/RtmpCeAFw9/devtools430478a12f62/BillPetti-baseballr-c1f2ddf"
--library="C:/Program Files/R/R-3.2.1/library" --install-tests
I tried running the following code and received the following error:
library(baseballr)
viz_gb_on_period("2018-03-29", "2018-04-14", "NL Central")
Error in hcaes(x = Date, y = GB, group = Team) :
could not find function "hcaes"
Since that is from the highcharter package, I accessed its library directly with library(highcharter) and then it worked, so I'm assuming there's a dependency issue or a missing highcharter::hcaes.
team_consistency(2017)
Error in team_results_bref(.$Tm, .$year) : object 'col_names' not found
I'm getting the error above when trying the team_consistency function
Hi,
Running the following code from scratch results in an error in the hc_theme_smpl() function from the highcharter package.
library(baseballr)
viz_gb_on_period("2018-03-29", "2018-04-18", "AL East")
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 52s
# A tibble: 10 x 7
League Date Team W L WLpct GB
<chr> <date> <chr> <int> <int> <dbl> <dbl>
1 AL East 2018-03-29 NYY 1 0 1.00 0.
2 AL East 2018-03-29 TBR 1 0 1.00 0.
3 AL East 2018-03-29 BAL 1 0 1.00 0.
4 AL East 2018-03-29 BOS 0 1 0. 1.00
5 AL East 2018-03-29 TOR 0 1 0. 1.00
6 AL East 2018-04-18 BOS 15 2 0.882 0.
7 AL East 2018-04-18 TOR 12 5 0.706 3.00
8 AL East 2018-04-18 NYY 8 8 0.500 6.50
9 AL East 2018-04-18 TBR 5 13 0.278 10.5
10 AL East 2018-04-18 BAL 5 13 0.278 10.5
Error in hc_theme_smpl() : could not find function "hc_theme_smpl"
The problem is solved if we run library(highcharter), so it seems that the issue is related to importing the function hc_theme_smpl in the baseballr package.
My R session is (after loading highcharter):
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] highcharter_0.5.0 bindrcpp_0.2 baseballr_0.3.3
loaded via a namespace (and not attached):
[1] httr_1.3.1 tidyr_0.8.0 jsonlite_1.5
[4] splines_3.4.4 Formula_1.2-2 assertthat_0.2.0
[7] TTR_0.23-3 latticeExtra_0.6-28 selectr_0.3-2
[10] yaml_2.1.18 pillar_1.2.1 backports_1.1.2
[13] lattice_0.20-35 glue_1.2.0 rlist_0.4.6.1
[16] digest_0.6.15 RColorBrewer_1.1-2 checkmate_1.8.5
[19] rvest_0.3.2 colorspace_1.3-2 htmltools_0.3.6
[22] Matrix_1.2-12 plyr_1.8.4 psych_1.7.8
[25] XML_3.98-1.10 pkgconfig_2.0.1 broom_0.4.3
[28] purrr_0.2.4 scales_0.5.0 XML2R_0.0.6
[31] htmlTable_1.11.2 tibble_1.4.2 mgcv_1.8-23
[34] ggplot2_2.2.1 pbapply_1.3-4 nnet_7.3-12
[37] hexbin_1.27.2 lazyeval_0.2.1 cli_1.0.0
[40] quantmod_0.4-12 mnormt_1.5-5 crayon_1.3.4
[43] survival_2.41-3 magrittr_1.5 nlme_3.1-131.1
[46] MASS_7.3-49 xts_0.10-2 xml2_1.2.0
[49] foreign_0.8-69 reldist_1.6-6 tools_3.4.4
[52] data.table_1.10.4-3 stringr_1.3.0 munsell_0.4.3
[55] cluster_2.0.6 compiler_3.4.4 rlang_0.2.0
[58] grid_3.4.4 RCurl_1.95-4.10 rstudioapi_0.7
[61] pitchRx_1.8.2 htmlwidgets_1.0 igraph_1.2.1
[64] bitops_1.0-6 base64enc_0.1-3 gtable_0.2.0
[67] curl_3.1 reshape2_1.4.3 R6_2.2.2
[70] gridExtra_2.3 zoo_1.8-1 lubridate_1.7.3
[73] knitr_1.20 dplyr_0.7.4 utf8_1.1.3
[76] bindr_0.1.1 Hmisc_4.1-1 stringi_1.1.7
[79] parallel_3.4.4 Rcpp_0.12.16 rpart_4.1-13
[82] acepack_1.4.1 tidyselect_0.2.4
Downloading GitHub repo BillPetti/baseballr@master
from URL https://api.github.com/repos/BillPetti/baseballr/zipball/master
Error: Does not appear to be an R package (no DESCRIPTION)
The bulk of the package is installing for me, but the full package isn't. I'm getting error messages at this stage.
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.6.tgz'
Error in download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.6.tgz'
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.4/XML_3.98-1.6.tgz': HTTP status was '404 Not Found'
Warning in download.packages(x$name, destdir = dest_dir, repos = x$repos, :
download of package ‘XML’ failed
Error in download.packages(x$name, destdir = dest_dir, repos = x$repos, :
subscript out of bounds
Attempted to run the example code for team_results_bref and received the following error:
nyy <- team_results_bref('NYY', 2017)
Error: Column 20 must be named
standings_on_date_bref function appears to have run as expected.
R 3.4.2/RStudio 1.1.423/macOS 10.13
Let me know if there is any additional logging that would be beneficial.
The Mookie Betts example won't run on the most current version of ggplot. The culprit seems to be the plot.subtitle argument from the imported theme, which was written with the developmental version of ggplot2.
I tripped on this while trying to write a vignette from the Mookie Betts example. This isn't a big issue, but (if you plan on submitting to CRAN) it wouldn't fly in its current condition. My recommendation would be to rewrite the theme function to exclude plot.subtitle and use ** in Rmarkdown to append a subtitle to the plot.
This isn't a huge deal, feel free to close the issue if you want. I just wanted you to be aware because as it sits right now, most users won't be able to run it.
library(baseballr)
team_results_bref("NYM", 2015)
Error in team_results_bref("NYM", 2015) : could not find function "%>%"
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] baseballr_0.0.0.9000
loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.2
It ran OK when I loaded dplyr and rvest independently.
On a side note, you might want to set the output to a tbl_df for easier handling.
Good luck with the package.
When I run this fg_bat_leaders code on the project website:
head(fg_bat_leaders(x = 2015, y = 2016, league = "all", qual = 1200, ind = 0)) %>% select(Seasons:AVG)
I get the following error:
Error in leaders[1, ] : incorrect number of dimensions
Is this a me problem?
The data frame returned does not include a game_id, so double headers are mixed together. Here's an example:
tmp <- scrape_statcast_savant_batter_all("2016-05-07", "2016-05-07")
tmp1 <- tmp %>% filter(home_team == "BAL", away_team == "OAK", inning == 1, inning_topbot == "Top")
Installed on April 14, 2016, and tried to run the first example in the README: standings_on_date_bref.
Copying and pasting the exact code in the example in, I received an error:
> standings_on_date_bref("2015-08-01", "NL East", from = FALSE)
Error in function_list[[i]](value) : could not find function "html_text"
The same error was thrown when testing other dates as well.
I'm using R 3.2.0 on a Mac running OS 10.10.5 Yosemite.
I was looking for games for Aaron Judge in July and August of 2017 and found the data are missing. Here is the code I am running.
judge.data.miss <- scrape_statcast_savant_batter(start_date = "2017-07-15", end_date = "2017-8-15", batterid = 621043)
unique(judge.data.miss$game_date)
You can see that only two games for Aaron Judge come up during this stretch. But there should be more, as he only missed three games combined in these months.
When I use scrape_statcast_savant_batter_all(start_date = "2018-03-28", end_date = "2018-04-17") or pitcher, all I get is 40000 rows of data exactly with a date range of 2018-04-07 to 2018-04-17, so it appears I only get the most recent 40000 data points.
I tried running this data pull:
head(fg_bat_leaders(x = 2015, y = 2016, league = "all", qual = "y", ind = 0)) %>%
select(Seasons:AVG)
I got the error below:
Error in select_(.data, .dots = lazyeval::lazy_dots(...)) :
object 'Seasons' not found
In addition: Warning messages:
1: NAs introduced by coercion
2: NAs introduced by coercion
This is the version I am currently running but when trying to install baseballr it says that it is unavailable for this version. Is this just a matter of time before it will be available or do I have to downgrade to a previous version?
Can you change GT=R%7C to GT=R%7CPO%7CS%7C so you can pull playoff and spring training data as well if those games fall within the date range? Thanks!
Hey @BillPetti I quietly dropped this package on CRAN today. Basically, just a method to download tables from the Baseball Databank, because I got tired of Lahman always being out of date. There's no real overlap with your package but there are a couple of functions I thought you would find useful. Feel free to use them.
Scraper for the Chadwick Bureau's Baseball Databank Git repo.
https://github.com/keberwein/baseballDBR/blob/master/R/get_bbdb.R
wOBA values based on the Lahman tables. I ported Tom Tango's SQL over to an R function.
https://github.com/keberwein/baseballDBR/blob/master/R/woba_values.R
FIP values based on a similar method.
https://github.com/keberwein/baseballDBR/blob/master/R/fip.R
I'll probably start promoting the package next week sometime. Anyhoo, feel free to close this issue, just wanted to open an invitation to use any of this stuff if you want.
Is there any way that this can be included in the data scrape as a part of the data frame?
Hi!
Cool package, which I read about on Exploring Baseball Data with R!
I am having one issue: For me, the function scrape_statcast_savant_pitcher() is returning batting data for pitchers, not pitching data. I believe when building the URL the relevant part of the setting should be player_type=pitcher instead of the current player_type=batter, based on my read of the function here. When I manually make that change to the function, it returns pitching data for pitchers.
Attached is my working file in case it might help. Sorry if I overlooked something obvious.
baseballr.txt
Cheers,
Eric Tassone
I'm using this package for a data science class and would like to cite you. Could you add a citation file?
Thanks,
Matt
Would it be possible to create a new function that pulls minor league daily data for both hitters and pitchers from Fangraphs? Possibly with the ability to select from all leagues, or specify a particular level?
Thanks for creating this package, it's been a great help!
Looking at your stat line for Statcast. Terrific script. Shouldn't SwStr% include swinging_strikes_blocked and foul_tips? I did a quick data check on Baseball Savant and it appears that Whiffs include both of those in addition to swinging_strikes.
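To make the difference concrete, here is a small sketch that computes a swinging-strike rate with and without the two extra events. The event labels used here (swinging_strike, swinging_strike_blocked, foul_tip) are assumptions about how a pitch description column is coded, and swstr_pct is a made-up helper, not the package's stat line function.

```r
# Share of pitches whose description is one of the whiff events.
swstr_pct <- function(description,
                      whiff_events = c("swinging_strike", "swinging_strike_blocked", "foul_tip")) {
  mean(description %in% whiff_events)
}

# Toy sample of eight pitches (assumed labels).
pitches <- c("ball", "swinging_strike", "foul_tip", "called_strike",
             "swinging_strike_blocked", "hit_into_play", "foul", "ball")

broad  <- swstr_pct(pitches)                     # counts all three events: 3 of 8
narrow <- swstr_pct(pitches, "swinging_strike")  # counts plain swinging strikes only: 1 of 8
```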
Ran the edge scrape script for any date in 2016 with no problem, but I get this error when the date is in 2017 or 2018:
Error in function (type, msg, asError = TRUE) :
Could not resolve host: writefunction
"Error in leaders[1, ] : incorrect number of dimensions"
So it seems like the structure of the leaders page has changed.
The following issues arise in team_results_bref:
I have tried using exports from Baseball-Reference's team page myself and have had trouble, so these issues are totally understandable! This package is awesome, thanks for helping the baseball community so much.
I've been trying to use the fg_bat_leaders() command, but it looks like the new Fangraphs layout broke the html scraper. I get an Error in .[[24]] : subscript out of bounds. When I tried copying and pasting the function guts to find the error, I found the failure comes on the initial read_html() call. I assume something in the paste0() needs to be changed with the FG update, but I could be wrong!
Is anyone else getting this result?
I did some testing in the savant batter all scrape function and found there is a new parameter (hfC=)
Update the url from
to
and I think that should fix it, at least it did in my fork...
There seems to be a limit of 30,000 rows when using the scrape_statcast functions. When I run:
start="2015-04-01"
stop="2015-05-01"
statcast_pitching=scrape_statcast_savant_pitcher_all(start,stop)
min(statcast_pitching$game_date)
The result is:
[1] "2015-04-24"
And I get a data set of 30,000 rows
If I change the stop date to an earlier date:
start="2015-04-01"
stop="2015-04-10"
statcast_pitching=scrape_statcast_savant_pitcher_all(start,stop)
min(statcast_pitching$game_date)
I get:
[1] "2015-04-05"
And a data set of ~17,500 rows.
And then if I try a very long window:
start="2015-04-01"
stop="2015-07-01"
statcast_pitching=scrape_statcast_savant_pitcher_all(start,stop)
min(statcast_pitching$game_date)
I get this nasty error message:
URL caused a warning. Make sure your date range is correct:
Original warning message:
incomplete final line found by readTableHeader on 'https://baseballsavant.mlb.com/statcast_search/csv?all=true&hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7CPO%7CS%7C&hfC=&hfSea=2015%7C&hfSit=&player_type=pitcher&hfOuts=&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=2015-04-01&game_date_lt=2015-07-10&team=&position=&hfRO=&home_road=&hfFlag=&metric_1=&hfInn=&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_abs=0&type=details&'
Error in scrape_statcast_savant_pitcher_all(start, "2015-07-10") :
object 'payload' not found
Is this an issue with the package itself or just Savant Search? And is there an easy workaround other than scraping smaller time frames and putting them together?
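If scraping smaller time frames turns out to be the only option, the splitting itself can be automated. The sketch below cuts a date range into consecutive windows in base R; date_chunks is a hypothetical helper, and the 7-day window is an assumption chosen only to stay well under the row counts reported above.

```r
# Split [start, end] into consecutive windows of at most `days` days each.
date_chunks <- function(start, end, days = 7) {
  start <- as.Date(start)
  end <- as.Date(end)
  starts <- seq(start, end, by = days)
  # Cap the last window at `end`; work on day counts to stay in plain numerics.
  ends <- as.Date(pmin(as.numeric(starts) + days - 1, as.numeric(end)),
                  origin = "1970-01-01")
  data.frame(start = as.character(starts), end = as.character(ends),
             stringsAsFactors = FALSE)
}

chunks <- date_chunks("2015-04-01", "2015-05-01", days = 7)
chunks

# Each row can then be passed to the scraper and the pieces bound together, e.g.:
# results <- lapply(seq_len(nrow(chunks)), function(i)
#   scrape_statcast_savant_pitcher_all(chunks$start[i], chunks$end[i]))
# statcast_pitching <- do.call(rbind, results)
```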
Hi Bill,
This is an enhancement proposal on which I'd like to work.
Colors of lines in the viz_gb_on_period plots are assigned automatically by the highcharter package.
The idea is to define static colors for each MLB team line, using the teamcolors R package from @beanumber.
Regards.
Daniel.
I keep getting this error message when trying to download, so I cannot use the package.
Hi,
I have been trying to install baseballr using R 3.3.2 on Windows, and I get the following error:
I have tried installing the missing package separately, but every time I do this and then try installing baseballr again I get the same error but with a different package listed as missing. Any idea how to fix this?
Thanks
Hi,
When I run this code for A.J. Pollock, it downloads his data, but also everyone else's in the date range. Three days ago it would only download the playerid specified data.
bat <- scrape_statcast_savant(start_date = "2018-03-28", end_date = paste(Sys.Date() - 1), playerid = 572041, player_type = 'batter')
scrape_statcast_savant_pitcher("2015-03-02","2017-11-25", "592789")
only scrapes for the year 2015
is the url that would pull 2015-2017 data
This seems to be the only difference I notice: hfSea=2017%7C2016%7C2015%7C
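A sketch of how the season part of the query string could be derived from the two dates, so that a multi-year range lists every season. season_param is a hypothetical helper and not the package's code; the output format simply mirrors the hfSea value quoted above.

```r
# Build the hfSea query fragment covering every year between the two dates,
# most recent season first, each followed by the URL-encoded pipe (%7C).
season_param <- function(start_date, end_date) {
  first <- as.integer(format(as.Date(start_date), "%Y"))
  last <- as.integer(format(as.Date(end_date), "%Y"))
  years <- seq(first, last)
  paste0("hfSea=", paste0(rev(years), "%7C", collapse = ""))
}

multi <- season_param("2015-03-02", "2017-11-25")
single <- season_param("2015-04-01", "2015-07-01")
```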
Thank you, Bill, great package!
It would be extremely useful to be able to scrape the season-to-date team standings, batter stats, and pitcher stats for any given date since the beginning of the corresponding season.
So, instead of daily_batter_bref("2015-05-10", "2015-06-20") yielding the batter stats averaged across the time period between the first date to the 2nd date provided, would it be possible to output the same stats, but from the beginning of the season to the specified date (e.g., "2015-05-10"), and the season-to-date stats for every subsequent day within the range of the first date and the second date (e.g., "2015-06-20") specified, with a separate row of data for each date?
Thanks again for a very useful package.
After installing baseballr thanks to your help, I went to run your first example and it did not work. It asked for 'selectr', so I installed it and then ran your sample.
standings_on_date_bref("2015-08-01", "NL East", from = FALSE)
Error in loadNamespace(name) : there is no package called ‘selectr’
install.packages("selectr", dependencies = FALSE)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.2/selectr_0.3-1.zip'
Content type 'application/zip' length 159942 bytes (156 KB)
downloaded 156 KB
package ‘selectr’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Javier\AppData\Local\Temp\RtmpoP2z55\downloaded_packages
> standings_on_date_bref("2015-08-01", "NL East", from = FALSE)
$NL East
Tm W L W-L% GB RS RA pythW-L%
1 WSN 54 48 0.529 -- 422 391 0.535
2 NYM 54 50 0.519 1.0 368 373 0.494
3 ATL 46 58 0.442 9.0 379 449 0.423
4 MIA 42 62 0.404 13.0 370 408 0.455
5 PHI 41 64 0.390 14.5 386 511 0.374