r-transit / tidytransit
R package for working with GTFS data
Home Page: https://r-transit.github.io/tidytransit/
I think we can find a way to rewrite the functions so that there's a parameter in function calls for sf=TRUE. Otherwise, we can just show lat and long in the data frames that are returned, because that's what's in GTFS data anyway.
Will look into this later.
there have been enough changes that it's about time
route_type is useful for analysis and can be easily pulled from the routes_df
type descriptions:
https://gist.github.com/derhuerst/b0243339e22c310bee2386388151e11e
deprecated types:
https://sites.google.com/site/gtfschanges/proposals/route-type
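As a rough illustration of pulling route_type from the routes table, the sketch below joins a small lookup of the basic GTFS route_type codes onto a toy routes data frame. The data and the names `route_type_names` and `routes_typed` are assumptions made for this example, not part of tidytransit.

```r
# Toy routes table standing in for the routes data frame of a feed (assumed data).
routes_df <- data.frame(
  route_id   = c("R1", "R2", "R3"),
  route_type = c(3L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Basic route_type codes 0-7 from the GTFS reference.
route_type_names <- data.frame(
  route_type      = 0:7,
  route_type_name = c("Tram", "Subway", "Rail", "Bus", "Ferry",
                      "Cable tram", "Aerial lift", "Funicular"),
  stringsAsFactors = FALSE
)

# Left join: every route is kept, even one with an extended or unknown code.
routes_typed <- merge(routes_df, route_type_names,
                      by = "route_type", all.x = TRUE)
routes_typed <- routes_typed[order(routes_typed$route_id), ]
```

Routes with a code outside the lookup would simply get NA as their name, which makes extended or deprecated types easy to spot.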
Sorry for the potentially naive question, but can this package route from A to B with GTFS data?
Context: @mem48 and I are currently using OpenTripPlanner for this but it has quite a lot of overheads, but is very good for multi-modal routing (perhaps one day that will be possible in R).
While working on #6 I came across specifications for files that are not defined in gtfs reference like directions and stop_attributes.
I guess there is an extended or additional specification I'm not aware of?
from rstudio:
This is an automated email to let you know that:
A new version of dplyr is ready to go to CRAN. dplyr is
currently at version 0.7.8 and will become 0.8.0 upon release.
tidytransit uses dplyr and has problems with the new version.
We plan to submit dplyr to CRAN on February 1.
This release represents about 9 months of development, detailed in
this blog post:
https://www.tidyverse.org/articles/2018/12/dplyr-0-8-0-release-candidate/
I need your help to keep tidytransit and dplyr working together
smoothly. In the next weeks, can you please:
Read about the changes to dplyr at
https://github.com/tidyverse/dplyr/blob/master/NEWS.md#dplyr-080.
This page includes a list of breaking changes, the reasoning behind
them, and how to update your code.
Carefully inspect the failing checks listed at the bottom of this email.
For each failing check, either update your package, or tell me
that I have a bug. If you have made changes to your package, please
submit an update to CRAN before February 1.
If you have discovered a bug in dplyr, please file an issue (ideally
with a small reprex that illustrates the problem) at
https://github.com/tidyverse/dplyr/issues. If you're not sure whether
or not you've found a bug, please file an issue and we'll help you figure
it out. Breaking changes that are not listed qualify as bugs.
Please respond to this message if you have any questions.
Thanks,
Romain Francois
== CHECK RESULTS ========================================
Running examples in ‘tidytransit-Ex.R’ failed
The error most likely occurred in:
```
data(gtfs_obj)
gtfs_obj <- get_route_frequency(gtfs_obj)
Calculating route and stop headways using defaults (6 am to 10 pm for weekday service).
Error in n() : could not find function "n"
Calls: get_route_frequency ... -> ->
mutate.tbl_df -> mutate_impl
Execution halted
```
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
```
16: `_fseq`(`_lhs`)
17: freduce(value, `_function_list`)
18: function_list[[i]](value)
19: dplyr::mutate(., service_trips = n())
20: mutate.tbl_df(., service_trips = n()) at /Users/romain/git/tidyverse/dplyr/R/manip.r:416
21: mutate_impl(.data, dots) at /Users/romain/git/tidyverse/dplyr/R/tbl-df.r:91
══ testthat results ═══════════════════════════════════════════════════════════
OK: 3 SKIPPED: 7 FAILED: 3
1. Error: Stop frequencies (headways) for included data are as expected (@test_headways.R#4)
2. Error: Route frequencies (headways) for included data are as expected (@test_headways.R#11)
3. Error: Route frequencies (headways) can be calculated for included data for a particular service id (@test_headways.R#17)
Error: testthat unit tests failed
Execution halted
```
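The traceback points at a bare n() inside dplyr::mutate(). One possible fix, assuming the package does not import n into its namespace, is to call dplyr::n() explicitly (adding importFrom(dplyr, n) to NAMESPACE would be another option). The toy `trips` table below is made up for illustration; this is a sketch, not the change that was actually made in tidytransit.

```r
# Toy trips table (assumed data).
trips <- data.frame(
  route_id = c("A", "A", "B"),
  trip_id  = c("t1", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Namespacing n() means it is found even when dplyr is not attached,
# which is the situation inside package code.
trip_counts <- dplyr::mutate(
  dplyr::group_by(trips, route_id),
  service_trips = dplyr::n()
)
trip_counts <- dplyr::ungroup(trip_counts)
```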
this would be a good way to estimate how it's working.
it would be nice to be able to just:
gtfs_feed <- read_gtfs(name="Translink")
Hello everyone, I am trying to plot the frequency of buses that pass through each stop.
The library is supposed to add this data when I set frequency to TRUE, as below:
gtfs <-read_gtfs("gtfs.zip", local=TRUE, geometry = TRUE, frequency=TRUE)
In theory I should get new data with the frequency of buses per stop. However, I get a warning in the console that says:
Warning message:
In get_route_frequency(gtfs_obj) : failed to calculate frequency--
try passing a service_id from calendar_df
Because of this I can't get that information (I get the data frame, but without that data). I have also tried to get it using "get_stop_frequency", without luck.
Can someone help me ?
Thank you all
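For readers who hit the same warning, a per-stop frequency can be approximated by hand from the stop times. The sketch below is not tidytransit's implementation: `to_seconds` and `stop_frequency` are hypothetical helpers, the data is made up, and it ignores service_id entirely, so a real feed would first need filtering to the trips of one service.

```r
# Toy stop_times table (assumed data; times are "HH:MM:SS" strings as in GTFS).
stop_times <- data.frame(
  trip_id        = c("t1", "t2", "t3", "t4", "t5"),
  stop_id        = c("S1", "S1", "S1", "S2", "S2"),
  departure_time = c("06:00:00", "06:30:00", "07:00:00", "06:15:00", "23:30:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM:SS" to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical helper: departures per stop within a time window, plus the
# average headway in minutes (window length divided by departure count).
stop_frequency <- function(stop_times, start_hour = 6, end_hour = 22) {
  secs      <- to_seconds(stop_times$departure_time)
  in_window <- secs >= start_hour * 3600 & secs < end_hour * 3600
  counts    <- table(stop_times$stop_id[in_window])
  data.frame(
    stop_id     = names(counts),
    departures  = as.integer(counts),
    headway_min = (end_hour - start_hour) * 60 / as.integer(counts),
    stringsAsFactors = FALSE
  )
}

freq <- stop_frequency(stop_times)
```

Stops with no departure inside the window do not appear in the result at all, which may or may not be what a map of frequencies needs.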
so that they are just immediately available.
Hey, I appreciate your effort to separate the different "modules" and I'm looking forward to seeing where this package might end up. As I don't know much about developing R packages, I don't yet understand how this repository and the others (trread, ...) are connected? Is it the same code duplicated and separate or is it automatically cloned/imported in some way?
When using the google sample feed, the resulting stop_times_df looks like
$stop_times_df
# A tibble: 28 x 9
   trip_id arrival_time departure_time stop_id        stop_sequence stop_headsign pickup_type X8    shape_dist_traveled
   <chr>   <chr>        <chr>          <chr>                  <int> <chr>               <int> <chr>               <dbl>
 1 STBA    6:00:00      6:00:00        STAGECOACH                 1 NA                     NA NA                     NA
 2 STBA    6:20:00      6:20:00        BEATTY_AIRPORT             2 NA                     NA NA                     NA
 3 CITY1   6:00:00      6:00:00        STAGECOACH                 1 NA                     NA NA                     NA
 4 CITY1   6:05:00      6:07:00        NANAA                      2 NA                     NA NA                     NA
 5 CITY1   6:12:00      6:14:00        NADAV                      3 NA                     NA NA                     NA
 6 CITY1   6:19:00      6:21:00        DADAN                      4 NA                     NA NA                     NA
 7 CITY1   6:26:00      6:28:00        EMSI                       5 NA                     NA NA                     NA
 8 CITY2   6:28:00      6:30:00        EMSI                       1 NA                     NA NA                     NA
 9 CITY2   6:35:00      6:37:00        DADAN                      2 NA                     NA NA                     NA
10 CITY2   6:42:00      6:44:00        NADAV                      3 NA                     NA NA                     NA
The problem seems to be that the column "drop_off_time" is not a valid column name which is pretty strange for an example feed. Also there are commas missing from line 17 on but that's not the point.
My question is: Why are only required/expected columns read in import.R#348? Why don't we simply read the whole file as a simple csv and check validity afterwards? The column X8 we get isn't really helpful anyways.
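The "read everything, validate afterwards" idea could look roughly like the sketch below. The sample text is a cut-down stand-in for the feed's stop_times.txt, the `required` and `optional` vectors are only a subset of the GTFS field names chosen for illustration, and none of this reflects how import.R actually works.

```r
# A cut-down stop_times.txt with a non-standard column (assumed sample text).
stop_times_txt <- "trip_id,arrival_time,departure_time,stop_id,stop_sequence,drop_off_time
STBA,6:00:00,6:00:00,STAGECOACH,1,
STBA,6:20:00,6:20:00,BEATTY_AIRPORT,2,"

# Read every column as character first; nothing is dropped or renamed.
stop_times <- read.csv(text = stop_times_txt, colClasses = "character",
                       stringsAsFactors = FALSE)

# Subset of stop_times.txt field names from the GTFS reference (illustrative).
required <- c("trip_id", "arrival_time", "departure_time",
              "stop_id", "stop_sequence")
optional <- c("stop_headsign", "pickup_type", "drop_off_type",
              "shape_dist_traveled")

# Hypothetical validation step, run only after the whole file has been read.
missing_cols    <- setdiff(required, names(stop_times))
unexpected_cols <- setdiff(names(stop_times), c(required, optional))
```

With this approach the odd column keeps its original name (drop_off_time) and can be reported to the user, rather than turning into an anonymous X8.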
Great idea for a package! However I am noticing some issues with the output on my first use, looking at L train routes in Chicago. A reproducible example:
library(tidytransit)
library(mapview)
chicago_gtfs <- read_gtfs("http://www.transitchicago.com/downloads/sch_data/google_transit.zip")
routes <- chicago_gtfs$routes_sf
mapview(routes[routes$route_id == "Pink", ])
This is the Yellow Line, not the Pink Line. I'm wondering if some of the rows are getting shuffled when they are converted to simple features? I've forked and will go through the code but let me know if you have any suggestions. Thanks!
the default frequency calculation fails if the service schedule isn't something like a standard weekday pattern (1,1,1,1,1,0,0), but services are potentially specified in broader ways.
it should default to something less restrictive and just calculate frequency for whatever it can.
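To make the difference concrete, the sketch below contrasts a strict weekday match with a looser rule on a toy calendar table. The data and the names `strict` and `loose` are assumptions for illustration; the looser rule is just one possible default, not the one tidytransit adopted.

```r
# Toy calendar table (assumed data; 1 means the service runs on that day).
calendar <- data.frame(
  service_id = c("WKDY", "MWF", "SAT"),
  monday     = c(1, 1, 0),
  tuesday    = c(1, 0, 0),
  wednesday  = c(1, 1, 0),
  thursday   = c(1, 0, 0),
  friday     = c(1, 1, 0),
  saturday   = c(0, 0, 1),
  sunday     = c(0, 0, 0),
  stringsAsFactors = FALSE
)

weekday_cols <- c("monday", "tuesday", "wednesday", "thursday", "friday")

# Strict rule: service runs on all five weekdays and never on weekends.
strict <- calendar$service_id[
  rowSums(calendar[weekday_cols]) == 5 &
    calendar$saturday == 0 & calendar$sunday == 0
]

# Looser rule: any service that runs on at least one weekday.
loose <- calendar$service_id[rowSums(calendar[weekday_cols]) > 0]
```

Here the strict rule keeps only "WKDY", while the looser rule also keeps the Monday/Wednesday/Friday service.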
to more closely mirror APIs like read_csv
report of memory limitations. unclear what feed:
these might be more generically useful than just GTFS:
https://github.com/r-transit/tidytransit/blob/master/R/import.R#L180-L241
https://github.com/r-transit/tidytransit/blob/master/R/import.R#L301-L316
something like:
dataframes <- read_zip("zip_of_csvs.zip")
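A generic helper along those lines might look like the sketch below. The name `read_zip` comes from the line above, but the body is only an assumption about what such a function could do (base R only, every file in the archive treated as a CSV), not the code in import.R.

```r
# Hypothetical helper: read every file in a zip archive as a CSV and
# return a named list of data frames.
read_zip <- function(zip_path) {
  exdir <- tempfile("unzipped_")
  dir.create(exdir)
  # Remove the extracted files once the data frames are in memory.
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)

  files  <- utils::unzip(zip_path, exdir = exdir)
  tables <- lapply(files, utils::read.csv, stringsAsFactors = FALSE)

  # Name each data frame after its file, without the extension.
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}
```

For a GTFS feed this would give elements such as `stops` and `routes`, since the files inside the archive are named stops.txt, routes.txt and so on.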
see #72
in this case, the message suggests filtering by service id, but that doesn't help.
The version on CRAN is a bit different in terms of the functions available, and we want people to keep up with our newest developments!
hey @mpadge is this the right issue title for your question about how to manage "merging" this package with other packages?
i think tidytransit ended up being more oriented toward users than developers, partially by the prompting of @angela-li to consider that a user would not want to have to import and think about multiple packages to just do basic mapping and frequency/schedule analysis.
i think that intuition was right, but as you work on gtfs-router i expect you'll develop better approaches across a number of problems.
another way to think about this issue is how to deprecate this package gracefully as you advance your work on gtfs-router. could just be managed by having a similar api.
make routes_df_as_sf match the naming of get_route_frequency. this would be more logical/easier to understand. so, for example, get_route_geometry.
It's always good to have something explicit! Use ggplot2 CONTRIBUTING.md and sf CONDUCT.md as examples.
Based on this stackoverflow question.
local_gtfs_path <- system.file("extdata",
"google_transit_nyc_subway.zip",
package = "tidytransit")
nyc <- read_gtfs(local_gtfs_path,
local=TRUE)
plot(nyc)
with the plot function:
tidytransit:::plot.gtfs <- function (x, ...) {
  dots = list(...)
  routes_sf_frequencies <- x$routes_sf %>%
    dplyr::inner_join(x$routes_frequency_df, by = "route_id") %>%
    dplyr::select(median_headways, mean_headways, st_dev_headways, stop_count)
  plot(routes_sf_frequencies)
}
The problem seems to be twofold: routes_sf is missing by default and the headway calculations haven't been done.
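One way to make that failure easier to understand would be to check for the two elements before plotting. The sketch below uses a hypothetical helper, `missing_plot_inputs`, and a toy list in place of a real gtfs object; it is not part of tidytransit.

```r
# Hypothetical helper: name the elements a frequency plot needs but the
# object does not have.
missing_plot_inputs <- function(x,
                                needed = c("routes_sf", "routes_frequency_df")) {
  setdiff(needed, names(x))
}

# Toy object standing in for a feed read without geometry or frequencies.
gtfs_like <- list(routes_df = data.frame(route_id = "1"))

absent <- missing_plot_inputs(gtfs_like)
if (length(absent) > 0) {
  message("Cannot plot frequencies; missing: ", paste(absent, collapse = ", "))
}
```

A plot method could run a check like this first and either stop with that message or compute the missing pieces itself.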
Is there a compelling reason to use the _df suffix for the data frames (e.g. gtfs$stops_df)?
perhaps as its own function.
this is confusing:
https://github.com/r-transit/tidytransit/blob/master/R/frequencies.R#L34-L59
see #72
based on #15 it's clear that we need better examples to review and test functionality.
a vignette using tidycensus might be great for this.
this should make contribution simpler.
as discussed here perhaps prefixing them with a gtfs_*
library(tidytransit)
library (magrittr)
f <- list.files (getwd (), full.names = TRUE)
filename <- f [grep ("VBB", f)] # GTFS for Berlin-Brandenburg Transport - it's huge!
get_df <- function (filename)
{
    flist <- file.path (utils::unzip (filename, list = TRUE)$Name)
    res <- list ()
    for (i in seq (flist))
    {
        cmd <- paste0 ("unzip -p \"", filename, "\" \"", flist [i], "\"")
        res [[i]] <- data.table::fread (cmd = cmd, showProgress = FALSE) %>%
            as.data.frame ()
    }
    names (res) <- strsplit (flist, ".txt")
    return (res)
}
rbenchmark::benchmark (
    dat <- read_gtfs (filename, local = TRUE),
    dat <- get_df (filename),
    replications = 1)
#> test replications elapsed relative user.self sys.self user.child sys.child
#> 2 dat <- get_df(filename) 1 3.463 1.00 6.678 0.432 1.745 0.312
#> 1 dat <- read_gtfs(filename, local = TRUE) 1 31.411 9.07 30.701 0.645 0.000 0.000
Created on 2019-02-01 by the reprex package (v0.2.1)
GTFS feeds can be enormous, and data.table makes a pretty huge difference: in the benchmark above it reads the feed about nine times faster!
This is also by way of starting a separate conversation about the potential future merging of gtfs-router into this package. It seems like the obvious place for it, and the primary use for tidytransit, if routing were available, would surely be transit routing? You could then check out your transit options from within the comfort of your R session!
it should be relatively easy to implement a simpler version of going from import_gtfs to plot(gtfs_obj) and would make this kind of workflow more intuitive.
see here for more on why this matters: 4af025e
@tbuckl Should we change the package description on github? I don't think the package being sf compatible is its main focus. I'd suggest something like on tidytransit.r-transit.org:
"tidytransit reads the General Transit Feed Specification (GTFS) into tidyverse and simple features dataframes. Use tidytransit to map transit stops and routes, calculate transit frequencies, and validate transit feeds."
as per the suggestions in #42
it could be that there's a more intuitive/descriptive way of naming these dataframes.
@mpadge if there's anything that would make this more usable for you let me know.