remix / partridge
A fast, forgiving GTFS reader built on pandas DataFrames
Home Page: https://partridge.readthedocs.io
License: MIT License
Describe what you were trying to get done.
I am trying to build a representative graph of my GTFS network for a particular day. I have selected a particular date and loaded my feed, like this:
service_ids = service_ids_by_date[datetime.date(2019, 6, 29)]
feed = ptg.load_feed(path, view={
'trips.txt': {
'service_id': service_ids,
},
})
Tell us what happened, what went wrong, and what you expected to happen.
I expected all of my feed data to correspond to the service id I selected. However, when I looked at the stop_times, it included trip_ids that are not part of the service id.
Therefore I think that the load_feed function and the views work differently than I was expecting. It seems like they just filter the specific txt file that is being specified in the views.
Do you know if there is a way to load a feed that will only load stops that correspond to the trips that are part of the specified service_id?
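In case it helps while this is open, here is a minimal pandas-only sketch of pruning the dependent tables by hand. The function name prune_to_trips and the toy DataFrames are made up for illustration, and this is not a description of what load_feed does internally.

```python
import pandas as pd


def prune_to_trips(trips, stop_times, stops, service_ids):
    """Keep only rows reachable from trips that run on the given service_ids."""
    kept_trips = trips[trips["service_id"].isin(service_ids)]
    kept_stop_times = stop_times[stop_times["trip_id"].isin(kept_trips["trip_id"])]
    kept_stops = stops[stops["stop_id"].isin(kept_stop_times["stop_id"])]
    return kept_trips, kept_stop_times, kept_stops


# Toy tables standing in for feed.trips, feed.stop_times and feed.stops.
trips = pd.DataFrame({"trip_id": ["t1", "t2"], "service_id": ["wk", "sat"]})
stop_times = pd.DataFrame({"trip_id": ["t1", "t1", "t2"], "stop_id": ["a", "b", "c"]})
stops = pd.DataFrame({"stop_id": ["a", "b", "c"]})

pruned_trips, pruned_stop_times, pruned_stops = prune_to_trips(
    trips, stop_times, stops, {"sat"}
)
```

The same chain of isin filters could be extended to routes or shapes if those tables matter for the graph.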
Let's get Read the Docs building the documentation and then we can link to an online version.
Seen in Singapore GTFS.
Times go ["23:56:00", "00:02:00"...], and ideally times would be normalized to ["23:56:00", "24:02:00"...].
However, I'm not sure how prevalent this problem is. Any thoughts?
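One possible normalization, sketched under the assumption that the times of a single trip are given in stop_sequence order and that consecutive stops are less than 24 hours apart. The helper name normalize_times is hypothetical.

```python
def normalize_times(times):
    """Add 24h to every time that follows a backwards jump past midnight."""
    normalized = []
    day_offset = 0  # seconds added for each midnight crossing seen so far
    previous = None
    for text in times:
        hours, minutes, seconds = (int(part) for part in text.split(":"))
        raw = hours * 3600 + minutes * 60 + seconds
        # A time earlier than its predecessor is taken as a midnight crossing.
        if previous is not None and raw + day_offset < previous:
            day_offset += 24 * 3600
        total = raw + day_offset
        previous = total
        normalized.append(
            f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
        )
    return normalized


fixed = normalize_times(["23:56:00", "00:02:00"])
```

Times that already run past 24:00:00 pass through unchanged, since they never go backwards.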
The filtering that takes place right now omits parent_stations from stops since they don't have any actual stops.
I think these shouldn't be filtered out.
import datetime
import partridge as ptg
service_ids_by_date = ptg.read_service_ids_by_date(LOCAL_ZIP_PATH)
service_ids = service_ids_by_date[datetime.date(2017, 12, 21)]
feed = ptg.feed(LOCAL_ZIP_PATH, view={
'trips.txt': {
'service_id': service_ids,
},
})
s = feed.stops
s[s.stop_id.isin(s.parent_station.value_counts().head(10).index)]
This gave back an empty DataFrame, but once I took s.parent_station.value_counts().head(10).index and used it to slice a DataFrame built by reading the stops.txt csv, I got them all back.
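A rough sketch of the behaviour I would expect, written against plain DataFrames: keep every stop that is actually used, plus any stop referenced as a parent_station by a used stop. The function name stops_with_parents and the sample rows are invented for the example.

```python
import pandas as pd


def stops_with_parents(all_stops, used_stop_ids):
    """Keep the used stops plus any station they name as parent_station."""
    is_used = all_stops["stop_id"].isin(used_stop_ids)
    parent_ids = all_stops.loc[is_used, "parent_station"].dropna().unique()
    is_parent = all_stops["stop_id"].isin(parent_ids)
    return all_stops[is_used | is_parent]


all_stops = pd.DataFrame({
    "stop_id": ["station", "platform_1", "platform_2", "unused"],
    "parent_station": [None, "station", "station", None],
})

# Only platform_1 appears in stop_times, but its parent station is kept too.
kept = stops_with_parents(all_stops, ["platform_1"])
```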
Any chance of publishing a new release incorporating the fix for the cchardet support (#73)?
Would be amazing. Thanks in advance!
Writing a feed outputs time columns in seconds since midnight. They should be formatted as 'HH:MM:SS'.
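For reference, the conversion back to 'HH:MM:SS' is small. This is a sketch with a hypothetical helper name, format_gtfs_time. It keeps hours above 23 for trips that run past midnight, and it writes missing times as empty strings, which is an assumption about how blanks should be handled.

```python
def format_gtfs_time(seconds):
    """Format seconds since midnight as a GTFS 'HH:MM:SS' string."""
    # Missing values (None or NaN) are written as empty fields.
    if seconds is None or seconds != seconds:
        return ""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


example = format_gtfs_time(86520.0)
```

Applied to a DataFrame column, this could be something like df["arrival_time"].map(format_gtfs_time) just before writing.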
read_service_ids_by_date broke on our GTFS feed in 0.6.0. Up to 0.5.0 this worked perfectly fine.
import partridge as ptg
#local_zip_path - taken from here ftp://gtfs.mot.gov.il/israel-public-transportation.zip
service_ids_by_date = ptg.read_service_ids_by_date(local_zip_path)
Throws this error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-07ad157544c4> in <module>()
1 import partridge as ptg
2
----> 3 service_ids_by_date = ptg.read_service_ids_by_date(local_zip_path)
4 next_thursday = None #TODO: add this instead of hardcoding the date
5 service_ids = service_ids_by_date[datetime.date(2018, 3, 1)]
~\Anaconda3\lib\site-packages\partridge\readers.py in read_service_ids_by_date(path)
37 '''Find all service identifiers by date'''
38 feed = raw_feed(path)
---> 39 return _service_ids_by_date(feed)
40
41
~\Anaconda3\lib\site-packages\partridge\readers.py in _service_ids_by_date(feed)
63 # Only consider calendar.txt/calendar_dates.txt rows with applicable trips
64 calendar = feed.calendar[feed.calendar.service_id.isin(service_ids)].copy()
---> 65 caldates = feed.calendar_dates[feed.calendar_dates.service_id.isin(service_ids)].copy() # noqa E501
66
67 if not calendar.empty:
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
3612 if name in self._info_axis:
3613 return self[name]
-> 3614 return object.__getattribute__(self, name)
3615
3616 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'service_id'
Code stopped running properly due to a package error:
line 102, in _read_csv df = pd.read_csv(path, dtype=np.unicode, encoding=encoding, index_col=False) File "/home/site/wwwroot/.python_packages/lib/site-packages/numpy/init.py",
line 284, in getattr raise AttributeError("module {!r} has no attribute "
I am running the commands
feed = ptg.load_feed(gtfs_filename)
return feed.routes.route_type.isin([3,4,5]).any()
on the file https://transitfeeds.com/p/traveline/1033/latest/download
which contains:
route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color,route_sort_order
21-L16-U-y05-51704,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51705,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51706,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51707,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51708,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L40-U-y05-51801,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L40-U-y05-51802,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L40-U-y05-51803,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L40-U-y05-51804,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L50-U-y05-51624,OId_CX,L1,"Seven Sisters - Bruce Grove - White Hart Lane - Edmonton Green - Enfield Town",3,,,,"Seven Sisters - Bruce Grove - White Hart Lane - Edmonton Green - Enfield Town"
21-N3-_-y05-49270,OId_CX,N3,"Oxford Circus - Crystal Palace - Bromley North",3,,,,"Oxford Circus - Crystal Palace - Bromley North"
21-N68-_-y05-46245,OId_CX,N68,"Old Coulsdon - Croydon - Norwood - Herne Hill - Camberwell - Waterloo - Tottenham Court Road",3,,,,"Old Coulsdon - Croydon - Norwood - Herne Hill - Camberwell - Waterloo - Tottenham Court Road"
21-P13-_-y05-51679,OId_CX,P13,"Streatham - Peckham - New Cross Gate",3,,,,"Streatham - Peckham - New Cross Gate"
21-R70-_-y05-50173,OId_TE,R70,"Hampton, The Avenue - Hampton - Fulwell - Twickenham - Richmond, Manor Circus",3,,,,"Hampton, The Avenue - Hampton - Fulwell - Twickenham - Richmond, Manor Circus"
21-S4-_-y05-50957,OId_CX,S4,"Roundshaw - Sutton - St Helier",3,,,,"Roundshaw - Sutton - St Helier"
21-U5-_-y05-49629,OId_TE,U5,"Uxbridge - Cowley - Hillingdon Hospital - West Drayton - Stockley Park - Hayes & Harlington Station",3,,,,"Uxbridge - Cowley - Hillingdon Hospital - West Drayton - Stockley Park - Hayes & Harlington Station"
21-U7-_-y05-49631,OId_TE,U7,"Uxbridge - Hillingdon Hospital - Charville School - Hayes, Sainsbury's",3,,,,"Uxbridge - Hillingdon Hospital - Charville School - Hayes, Sainsbury's"
21-U9-_-y05-49448,OId_TE,U9,"Uxbridge - Ickenham - Harefield",3,,,,"Uxbridge - Ickenham - Harefield"
25-DLR-_-y05-10,OId_DLR,DLR,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-11,OId_DLR,DLR,"Bank/Tower Gateway/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-13,OId_DLR,DLR,"Tower Gateway/Stratford International/Bow Church - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Tower Gateway/Stratford International/Bow Church - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-16,OId_DLR,DLR,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-17,OId_DLR,DLR,"Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-18,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-19,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford Interntional - Beckton/Woolwich Arsenal // Island Gardens - Lewisham",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford Interntional - Beckton/Woolwich Arsenal // Island Gardens - Lewisham"
25-DLR-_-y05-20,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-3,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-8,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
30-CR-_-y05-19,OId_CRC,Circular Cruise Westminster,"Westminster - St Katharines - Westminster",4,,,,"Westminster - St Katharines - Westminster"
31-TRS-_-y05-20,OId_TRS,Thames River Services,"Westminster - St Katherine's - Greenwich",4,,,,"Westminster - St Katherine's - Greenwich"
32-CCR-_-y05-24,OId_CCR,City Cruises,"Westminster - Greenwich",4,,,,"Westminster - Greenwich"
33-RB1-_-y05-26,OId_CV,RB1,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Battersea Power Station",4,,,,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Battersea Power Station"
33-RB1-X-y05-9,OId_CV,RB1X,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Westminster",4,,,,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Westminster"
33-RB2-_-y05-17,OId_CV,RB2,"London Bridge - Embankment - Bankside",4,,,,"London Bridge - Embankment - Bankside"
Note that the route_type column contains the string undefined.
This causes a crash:
Traceback (most recent call last):
File "pandas/_libs/lib.pyx", line 1897, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "undefined"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./src/pull_gtfs.py", line 271, in <module>
ValidateFeeds(feeds_db_filename, data_template_name, parsed_prefix)
File "./src/pull_gtfs.py", line 242, in ValidateFeeds
fm.validate_feeds(parsed_prefix)
File "./src/pull_gtfs.py", line 206, in validate_feeds
self.validate_feed(fid, parsed_prefix)
File "./src/pull_gtfs.py", line 190, in validate_feed
elif not parse_gtfs.HasBusRoutes(filename):
File "/home/rick/projects/RISE_ev_bus/dispatch/src/parse_gtfs.py", line 365, in HasBusRoutes
return feed.routes.route_type.isin(route_types).any()
File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/partridge/gtfs.py", line 16, in getter
return self.get(filename)
File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/partridge/gtfs.py", line 51, in get
self._convert_types(filename, df)
File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/partridge/gtfs.py", line 164, in _convert_types
df[col] = converter(df[col])
File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 151, in to_numeric
values, set(), coerce_numeric=coerce_numeric
File "pandas/_libs/lib.pyx", line 1934, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "undefined" at position 1804
Passing the above data through pandas directly:
#!/usr/bin/env python3
import pandas as pd
a=pd.read_csv("routes.txt")
route_types = [3,4,5]
a.route_type.isin(route_types).any()
does not result in an error.
Therefore, partridge should probably do some kind of sanitizing to ensure that the route_type column is actually numeric.
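A sketch of the kind of sanitizing I mean, using pd.to_numeric with errors="coerce" so that unparseable values become NaN instead of raising. The three-row CSV is a made-up stand-in for routes.txt. One caveat about the plain pandas test above: it may only avoid the error because the column is read as strings, in which case the comparison against integers can silently match nothing.

```python
import io

import pandas as pd

# Made-up stand-in for a routes.txt with one bad route_type value.
csv_text = "route_id,route_type\nr1,3\nr2,undefined\nr3,4\n"

routes = pd.read_csv(io.StringIO(csv_text), dtype=str)
# Unparseable values such as "undefined" become NaN rather than raising.
routes["route_type"] = pd.to_numeric(routes["route_type"], errors="coerce")

has_bus_like_routes = routes["route_type"].isin([3, 4, 5]).any()
bad_rows = routes["route_type"].isna().sum()
```

Whether the bad rows should be dropped, kept as NaN, or reported to the user is a separate design decision.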
I would like to have the option for creating a valid pruned GTFS feed based on a partridge feed.
I tried to use write_feed_dangerously, but:
Thanks
I started working with partridge and thought it could be helpful to save feeds and caches to a local store (HDF5) for easy access between Python runs. What are your thoughts?
Adding an option to get "trip id by day", considering a custom "service day" hours (not necessarily midnight to midnight).
"Service day" definition may vary between applications.
For example, it is more convenient to define a service day of 4AM to 4AM of the next date, in order to analyze GTFS for working days.
To answer the question "When is the earliest departure of route x on date y?", a departure of route x at 00:30 on date y might be irrelevant.
An option for implementation is to filter trip_ids by something similar to the following method, copied from Paul Harrington's post on https://groups.google.com/forum/#!topic/transit-developers/ZkfnuNv1gho :
"When searching for next departures at a stop at a time between midnight and 3am I rolled back a day and added 24 hours to the hour so instead of running a query for departures at a stop starting from a date of 20180124 and a time of 01:00:00 I would use 20180123 and 25:00:00"
The usage could be something like adding "read_trip_ids_by_day" method, which is similar to "read_service_ids_by_date", but also gets "day_start" and "day_end" hours as inputs.
Then we would be able to filter the "feed" by these trip ids.
import datetime
import partridge as ptg
path = 'path/to/sfmta-2017-08-22.zip'
trip_ids_by_day = ptg.read_trip_ids_by_day(path, day_start="03:00:00", day_end="02:59:59")
trip_ids = trip_ids_by_day[datetime.date(2017, 9, 25)]
# Now "trip_ids" should contain all the trip_ids of trips that start
# between 25-09-2017 03:00:00 and 26-09-2017 02:59:59
feed = ptg.feed(path, view={
'trips.txt': {
'trip_id': trip_ids,
},
})
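The roll-back rule from the quoted post can be sketched on its own, independent of partridge. The name to_service_day and the day_start_hour parameter are hypothetical. The cutoff is 4 here to match the 4AM example, and the call reproduces the 20180124 01:00:00 case from the quote.

```python
import datetime


def to_service_day(moment, day_start_hour=4):
    """Map a wall-clock datetime to (service_date, GTFS 'HH:MM:SS' time)."""
    service_date = moment.date()
    hours = moment.hour
    # Early-morning times belong to the previous service day, 24 hours later.
    if hours < day_start_hour:
        service_date -= datetime.timedelta(days=1)
        hours += 24
    return service_date, f"{hours:02d}:{moment.minute:02d}:{moment.second:02d}"


service_date, gtfs_time = to_service_day(datetime.datetime(2018, 1, 24, 1, 0, 0))
```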
I was trying to use partridge, first from peartree, then directly, in a mamba/conda environment with numpy 1.25.
I ran into the same issue documented by another peartree user here:
The code available on pip still uses np.unicode in gtfs.py, which was deprecated in numpy 1.20 and removed in numpy 1.24.
I was able to fix this for myself by replacing the pip version of partridge with the repository version because the repository version appears to have already fixed this bug.
So you could make many people's lives a tiny bit easier by releasing the bug-fixed version on pip :)
file = "my_awesome_gtfs_data.zip"
ptg.read_busiest_date(file)
Traceback: See https://github.com/kuanb/peartree/issues/178 (my trace was the same in all important respects)
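For anyone patching locally: np.unicode was an alias for the built-in str, so passing dtype=str to read_csv should behave the same on any numpy version. The snippet below is a sketch with a made-up two-line CSV. I have not checked that it matches the exact change in the repository.

```python
import io

import pandas as pd

# Made-up stand-in for a GTFS table; note the leading zeros in stop_id.
csv_text = "stop_id,stop_name\n0012,Main St\n"

# dtype=str replaces the removed np.unicode alias and keeps ids as text.
stops = pd.read_csv(io.StringIO(csv_text), dtype=str, index_col=False)
first_id = stops["stop_id"].iloc[0]
```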
Related (or maybe the same issue) is that GTFS specifies that times are seconds since noon minus 12 hours. For calendar days when there is a daylight savings shift, the float representation of the parsed time is currently incorrect.
Originally posted by @tilgovi in #42 (comment)
Describe what you were trying to get done.
Use the partridge library and observe the busiest day of service (especially for buses).
Tell us what happened, what went wrong, and what you expected to happen.
The service_ids in the busiest day (output of read_busiest_date) were missing several service_ids that were actually in service on that particular day.
I expected read_busiest_date to output all the service_ids operational on that particular day, but it only seems to include the service_ids listed in calendar_dates.txt. The GTFS documentation has the following description for calendar_dates.txt: "Exceptions for the services defined in the calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service."
Given both calendar.txt and calendar_dates.txt, I believe read_busiest_date is considering only the exceptions in the service_ids, ignoring the regular service_ids.
Some service_ids are listed in calendar.txt with 0 for Mon-Sun. However, this might lead to incorrect results while using read_busiest_date as the function excludes these regular service_ids with 0s. An example of this can be seen below.
Example:
import partridge as ptg
ptg.read_busiest_date('gtfs_Portland_2022_feb1.zip')
Output: (datetime.date(2022, 1, 24), frozenset({'A.613', 'D.613', 'Q.613', 'W.613'}))
The following service_ids are missing in the output: [B.613, C.613, F.613, E.613, U.613, S.613].
These service_ids include both Light Rail & Bus route_type. For example, B.613 (Light Rail) consists of the route 'MAX Red Line' that operates Monday-Friday & Weekends. This is perhaps the busiest line as it connects the Portland Airport. Also, U.613 (Bus) consists of 48 routes. All the routes in this service_id can be seen here: http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/U.613
Writing a feed results in date columns represented as '%Y-%m-%d' where they should be '%Y%m%d'. The reader code correctly parses dates in the feeds, but the writer does not write them in the same format.
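As a stopgap for the date format, the date columns can be turned into '%Y%m%d' strings before writing. This sketch uses a made-up calendar table and plain to_csv rather than partridge's writer.

```python
import datetime

import pandas as pd

# Made-up calendar table with parsed dates, as the reader would produce.
calendar = pd.DataFrame({
    "service_id": ["wk"],
    "start_date": [datetime.date(2022, 2, 1)],
    "end_date": [datetime.date(2022, 5, 31)],
})

# GTFS expects dates as YYYYMMDD, so format them explicitly before writing.
for column in ("start_date", "end_date"):
    calendar[column] = calendar[column].map(lambda day: day.strftime("%Y%m%d"))

csv_text = calendar.to_csv(index=False)
```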
I'm working on using partridge to speed up statistical summarization of a historical archive of feeds, so I made some kind of a mashup between partridge and GTFSTK, you can see an initial version here.
I profiled the run and noticed that detect_encoding is responsible for about half of partridge's run time. I am sure this could be alleviated somehow by specifying the encoding if it is known upfront.
cChardet appears to be an abandoned project and is not compatible with python 3.10 (without compiling from source).
As a result installing partridge via pip fails
#18 96.96 Running setup.py install for cchardet: finished with status 'error'
#18 96.97 error: subprocess-exited-with-error
#18 96.97
#18 96.97 × Running setup.py install for cchardet did not run successfully.
#18 96.97 │ exit code: 1
#18 96.97 ╰─> [24 lines of output]
#18 96.97 running install
#18 96.97 running build
#18 96.97 running build_py
#18 96.97 creating build
#18 96.97 creating build/lib.linux-x86_64-3.10
#18 96.97 creating build/lib.linux-x86_64-3.10/cchardet
#18 96.97 copying src/cchardet/__init__.py -> build/lib.linux-x86_64-3.10/cchardet
#18 96.97 copying src/cchardet/version.py -> build/lib.linux-x86_64-3.10/cchardet
#18 96.97 running build_ext
#18 96.97 building 'cchardet._cchardet' extension
#18 96.97 creating build/temp.linux-x86_64-3.10
#18 96.97 creating build/temp.linux-x86_64-3.10/src
#18 96.97 creating build/temp.linux-x86_64-3.10/src/cchardet
#18 96.97 creating build/temp.linux-x86_64-3.10/src/ext
#18 96.97 creating build/temp.linux-x86_64-3.10/src/ext/uchardet
#18 96.97 creating build/temp.linux-x86_64-3.10/src/ext/uchardet/src
#18 96.97 creating build/temp.linux-x86_64-3.10/src/ext/uchardet/src/LangModels
#18 96.97 gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Isrc/ext/uchardet/src -I/usr/local/include/python3.10 -c src/cchardet/_cchardet.cpp -o build/temp.linux-x86_64-3.10/src/cchardet/_cchardet.o
#18 96.97 src/cchardet/_cchardet.cpp: In function ‘int __Pyx_modinit_type_init_code()’:
#18 96.97 src/cchardet/_cchardet.cpp:2466:53: error: ‘PyTypeObject’ {aka ‘struct _typeobject’} has no member named ‘tp_print’; did you mean ‘tp_dict’?
#18 96.97 __pyx_type_8cchardet_9_cchardet_UniversalDetector.tp_print = 0;
#18 96.97 ^~~~~~~~
#18 96.97 tp_dict
#18 96.97 error: command '/usr/bin/gcc' failed with exit code 1
#18 96.97 [end of output]
#18 96.97
#18 96.97 note: This error originates from a subprocess, and is likely not a problem with pip.
#18 96.97 error: legacy-install-failure
#18 96.97
#18 96.97 × Encountered error while trying to install package.
#18 96.97 ╰─> cchardet
It seems like switching back to chardet or optionally using chardet when on python 3.10+ would be potential alternatives.
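The optional fallback could look roughly like this. Both cchardet and chardet expose a detect function that returns a dict with an "encoding" key. The helper names load_detector and detect_encoding below, and the utf-8 default, are my own assumptions rather than partridge's code.

```python
import importlib


def load_detector():
    """Return the first importable charset detection module, or None."""
    for name in ("cchardet", "chardet"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def detect_encoding(raw_bytes, default="utf-8"):
    """Guess the encoding of raw_bytes, falling back to a default."""
    detector = load_detector()
    if detector is None:
        return default
    guess = detector.detect(raw_bytes).get("encoding")
    return guess or default


encoding = detect_encoding(b"stop_id,stop_name\n0012,Main St\n")
```

With this shape, cchardet becomes an optional speed-up instead of a hard install requirement.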
As is, it appears that any non-numeric ID field will crash Partridge because Pandas is expecting IDs to be integers. Unfortunately the GTFS spec states that a valid ID "is a sequence of any UTF-8 characters", and in practice the use of non-numeric IDs is widespread.
import partridge
partridge.read_busiest_date("gtfs.zip")
# For example - issue is not limited to this function
Traceback (most recent call last):
File "pandas\_libs\parsers.pyx", line 1050, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
partridge.read_busiest_date("gtfs.zip")
File "...\partridge\readers.py", line 60, in read_busiest_date
return _busiest_date(feed)
File "...\partridge\readers.py", line 118, in _busiest_date
service_ids_by_date = _service_ids_by_date(feed)
File "...\partridge\readers.py", line 156, in _service_ids_by_date
service_ids = set(feed.trips.service_id)
File "...\partridge\gtfs.py", line 16, in getter
return self.get(filename)
File "...\partridge\gtfs.py", line 48, in get
df = self._read(filename)
File "...\partridge\gtfs.py", line 48, in get
df = self._read(filename)
File "...\partridge\gtfs.py", line 102, in _read_csv
df = pd.read_csv(path, dtype=np.unicode, encoding=encoding, index_col=False)
File "...\pandas\io\parsers.py", line 605, in read_csv
return _read(filepath_or_buffer, kwds)
File "...\pandas\io\parsers.py", line 463, in _read
return parser.read(nrows)
File "...\pandas\io\parsers.py", line 1052, in read
index, columns, col_dict = self._engine.read(nrows)
File "...\pandas\io\parsers.py", line 2056, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 850, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 982, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1056, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'offending_string_id'
Is there a recipe or notebook for obtaining service frequencies or average headways by route for a given time period (say 6am-9am)?
I looked at the notebooks in the wiki and could not find what I was looking for.
Thank you!
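Not an official recipe, but one way to approximate it with plain pandas: take each trip's first departure, keep the ones inside the window, and average the gaps per route. This sketch assumes departure_time is already in seconds since midnight. It ignores direction_id, and average_headways is a made-up name.

```python
import pandas as pd


def average_headways(trips, stop_times, start, end):
    """Mean gap in minutes between trip starts per route within [start, end)."""
    first_departures = (
        stop_times.groupby("trip_id")["departure_time"]
        .min()
        .rename("start_time")
        .reset_index()
    )
    starts = trips.merge(first_departures, on="trip_id")
    in_window = starts[(starts["start_time"] >= start) & (starts["start_time"] < end)]

    def mean_gap(times):
        # Gaps between consecutive trip starts, converted to minutes.
        return times.sort_values().diff().mean() / 60

    return in_window.groupby("route_id")["start_time"].apply(mean_gap)


trips = pd.DataFrame({
    "trip_id": ["t1", "t2", "t3", "t4"],
    "route_id": ["r1", "r1", "r1", "r2"],
})
stop_times = pd.DataFrame({
    "trip_id": ["t1", "t1", "t2", "t3", "t4"],
    "departure_time": [21600, 21900, 22800, 24600, 36000],
})

# 6am-9am window expressed in seconds since midnight.
headways = average_headways(trips, stop_times, 6 * 3600, 9 * 3600)
```

A route with a single trip in the window comes out as NaN, since there is no gap to measure.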
Downloading and navigating to a given GTFS feed for use in Partridge can be cumbersome and risky due to internet speeds and version control issues. A function to pull a fresh feed from a URL (e.g., from an agency's developer portal) would help with this.
See url2gtfs in https://davidabailey.com/articles/Visualizing-Public-Transportation-Speeds-with-Python for an example.
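Something along these lines would cover the basic case using only the standard library. The function name fetch_feed is hypothetical and this is not the url2gtfs code from the article. The demo uses a local file:// URL so it runs without network access; an agency URL would go in its place.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_feed(url, destination):
    """Download a GTFS zip from url to destination and return the local path."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Demo: build a tiny zip locally and fetch it through a file:// URL.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.zip"
with zipfile.ZipFile(source, "w") as archive:
    archive.writestr("stops.txt", "stop_id,stop_name\n1,Main St\n")

downloaded = fetch_feed(source.as_uri(), workdir / "feed.zip")
```

The returned path could then be passed on to the usual loading functions.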
Greetings from the American Planning Association! We're in San Francisco for our National Planning Conference. One component of this is a DataJam meant to get planners and civic tech people working together to try to solve some San Francisco transit problems.
I'm putting together some JupyterHub-hosted notebooks bootstrapped with data so that people can get working with data right away and not fritter away valuable time messing with getting environments and dependencies set up, ETL-ing and cleaning data, etc. partridge looks like an awesome resource for working with GTFS data efficiently without having to deploy a database. We'd love to get DataJam attendees using it as one possible method for producing awesome transit analyses and visualizations.
The geopandas integration looks like it hasn't made it to pypi yet. There's no full in setup.py extras; geo.py is missing and the geopandas dependency was not installed.
Installing using pip's VCS method from this repo works just fine, e.g. pip install -e git+https://github.com/remix/partridge/#egg=partridge, and I've used that to get it working in our hosted JupyterHub on Azure Notebooks so DataJam attendees can use it without problems during the event. Just thought you'd want the heads up!
pip install "partridge[full]"
Collecting partridge[full]
Using cached https://files.pythonhosted.org/packages/19/a6/3a26b6ffc3a317248a279f9c0057ae92e9f3432af50af8d233c217d80de3/partridge-1.0.0-py2.py3-none-any.whl
partridge 1.0.0 does not provide the extra 'full'
[extraneous output omitted]
I tried to change the types of the _id columns (e.g. route_id) in some tables from dtype object to numeric, to lower the memory usage. I did that by adding a converter to the default config.
It went fine at first, but the DataFrames came back empty. I looked into that a little bit and I think it is because the read_file method does the prune part before the type conversion, causing the comparison of an object column (the column in the current table) with a numeric column (from the dependency table that is type converted).
I'm not sure what would be the right solution for that, maybe changing both columns to object before comparison.
In[1]: import partridge as ptg
In[2]: conf = ptg.config.default_config()
In[3]: ptg.load_feed(path, config=conf).routes
Out[3]:
route_id agency_id route_short_name ... route_desc route_type route_color
0 1 25 1 ... 67001-1-# 3 NaN
1 2 25 1 ... 67001-2-# 3 NaN
2 3 25 2 ... 56002-1-# 3 NaN
[3 rows x 7 columns]
In[4]: import pandas as pd
In[5]: conf.nodes['trips.txt']['converters']['route_id'] = pd.to_numeric
In[6]: conf.nodes['routes.txt']['converters']['route_id'] = pd.to_numeric
In[7]: ptg.load_feed(path, config=conf).routes
Out[7]:
Empty DataFrame
Columns: [route_id, agency_id, route_short_name, route_long_name, route_desc, route_type, route_color]
Index: []
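The suspected mismatch is easy to reproduce outside partridge. In this sketch the two toy DataFrames stand in for a table that is still object dtype and a dependency table that has already been converted. Casting both sides to str before the comparison is just one possible fix, and it would need care with float columns, where 1 becomes "1.0".

```python
import pandas as pd

# routes is still text (as read); trips has already been converted to numbers.
routes = pd.DataFrame({"route_id": ["1", "2", "3"]})
trips = pd.DataFrame({"route_id": pd.to_numeric(pd.Series(["1", "2"]))})

# Comparing strings against integers matches nothing, so the result is empty.
mismatched = routes[routes["route_id"].isin(trips["route_id"])]

# Casting both sides to str before comparing restores the matches.
aligned = routes[
    routes["route_id"].astype(str).isin(trips["route_id"].astype(str))
]
```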