remix / partridge

A fast, forgiving GTFS reader built on pandas DataFrames

Home Page: https://partridge.readthedocs.io

License: MIT License

Python 95.36% Makefile 4.64%

gtfs pandas python

partridge's Issues

load stops that correspond to the trips that are part of the specified service_id

partridge version: 1.1.1
Python version: 3.7.7
Operating System: Windows/Anaconda

Description

Describe what you were trying to get done.

I am trying to build a representative graph of my GTFS network for a particular day. I have selected a particular date and loaded my feed, like this:

service_ids = service_ids_by_date[datetime.date(2019, 6, 29)]
feed = ptg.load_feed(path, view={
    'trips.txt': {
        'service_id': service_ids,
    },
})

Tell us what happened, what went wrong, and what you expected to happen.

I expected all of my feed data to correspond to the service_ids I selected. However, when I looked at stop_times, it included trip_ids that are not part of those service_ids.

Therefore I think that the load_feed function and the views work differently than I was expecting. It seems like they just filter the specific txt file that is being specified in the views.

Do you know if there is a way to load a feed that will only load stops that correspond to the trips that are part of the specified service_id?
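One way to check this, and to enforce the restriction by hand if needed, is to filter the tables with plain pandas. The sketch below assumes the tables are already available as DataFrames (for example feed.trips, feed.stop_times and feed.stops); the helper name restrict_to_services and the sample data are made up for illustration and are not part of partridge.

```python
import pandas as pd

def restrict_to_services(trips, stop_times, stops, service_ids):
    """Keep only rows reachable from trips that run on the given service_ids."""
    kept_trips = trips[trips["service_id"].isin(service_ids)]
    kept_stop_times = stop_times[stop_times["trip_id"].isin(kept_trips["trip_id"])]
    kept_stops = stops[stops["stop_id"].isin(kept_stop_times["stop_id"])]
    return kept_trips, kept_stop_times, kept_stops

# Tiny made-up tables for illustration
trips = pd.DataFrame({"trip_id": ["t1", "t2"], "service_id": ["wk", "sat"]})
stop_times = pd.DataFrame({"trip_id": ["t1", "t1", "t2"], "stop_id": ["a", "b", "c"]})
stops = pd.DataFrame({"stop_id": ["a", "b", "c"]})

kept_trips, kept_stop_times, kept_stops = restrict_to_services(
    trips, stop_times, stops, {"sat"}
)
```

Comparing the trip_ids in the returned stop_times against the loaded feed is a quick way to confirm whether the view pruned the dependent tables.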

Read the Docs

Let's get Read the Docs building the documentation and then we can link to an online version.

Handle hours resetting at midnight

partridge version: 0.11.0
Python version: 3.6

Description

Seen in Singapore GTFS.

Times go ["23:56:00", "00:02:00"...], and ideally times would be normalized to ["23:56:00", "24:02:00"...].

However, I'm not sure how prevalent this problem is. Any thoughts?
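The normalization described above could look roughly like this. It is a sketch only, assuming the times belong to a single trip and are already ordered by stop_sequence; the function name normalize_rollover is hypothetical.

```python
def normalize_rollover(times):
    """Add 24 hours whenever a time is earlier than the one before it."""
    normalized = []
    day_offset = 0
    previous = None
    for value in times:
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        total = hours * 3600 + minutes * 60 + seconds
        # A backwards jump means the clock wrapped past midnight
        if previous is not None and total + day_offset < previous:
            day_offset += 24 * 3600
        total += day_offset
        previous = total
        normalized.append(
            f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
        )
    return normalized

print(normalize_rollover(["23:56:00", "00:02:00", "00:10:00"]))
# ['23:56:00', '24:02:00', '24:10:00']
```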

Parent stations missing from feed

partridge version: 0.3.0
Python version: 3.6

Description

The filtering that takes place right now omits parent_stations from stops since they don't have any actual stops.

I think these shouldn't be filtered out.

What I Did

import datetime

import partridge as ptg

service_ids_by_date = ptg.read_service_ids_by_date(LOCAL_ZIP_PATH)
service_ids = service_ids_by_date[datetime.date(2017, 12, 21)]

feed = ptg.feed(LOCAL_ZIP_PATH, view={
    'trips.txt': {
        'service_id': service_ids,
    },
})

# Look up the ten most-referenced parent stations among the loaded stops
s = feed.stops
s[s.stop_id.isin(s.parent_station.value_counts().head(10).index)]

This gave back an empty DataFrame, but once I took s.parent_station.value_counts().head(10).index and used it to slice a DataFrame built by reading the stops.txt CSV directly, I got them all back.
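As a workaround sketch, the parent stations can be added back with plain pandas after filtering. This assumes one DataFrame read straight from stops.txt (all_stops) and one with the filtered stops (kept_stops); the helper name with_parent_stations and the sample data are hypothetical.

```python
import pandas as pd

def with_parent_stations(all_stops, kept_stops):
    """Return kept_stops plus any stations referenced by their parent_station column."""
    parent_ids = kept_stops["parent_station"].dropna().unique()
    parents = all_stops[all_stops["stop_id"].isin(parent_ids)]
    return pd.concat([kept_stops, parents]).drop_duplicates(subset="stop_id")

# Made-up stops table: s1 is a platform inside station p1
all_stops = pd.DataFrame({
    "stop_id": ["s1", "s2", "p1"],
    "parent_station": ["p1", None, None],
})
kept_stops = all_stops[all_stops["stop_id"].isin(["s1"])]
result = with_parent_stations(all_stops, kept_stops)
```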

New v1.1.2 release including cchardet dependency fix

Any chance of publishing a new release incorporating the fix for the cchardet support (#73)?

Would be amazing. Thanks in advance!

Output time format is incorrect

partridge version: v1.0.0
Python version: v3.7.1
Operating System: macOS

Description

Writing a feed outputs time columns in seconds since midnight. They should be formatted as 'HH:MM:SS'.
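A minimal sketch of the conversion the writer would need, assuming the time columns hold whole seconds since midnight. The function name seconds_to_gtfs_time is hypothetical, and missing values would need separate handling.

```python
def seconds_to_gtfs_time(seconds):
    """Format seconds since midnight as 'HH:MM:SS'; hours may exceed 23."""
    seconds = int(seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

print(seconds_to_gtfs_time(86520))  # 24:02:00
```

Hours are deliberately not wrapped at 24, since GTFS uses values such as 24:02:00 for trips that run past midnight.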

read_service_ids_by_date AttributeError in 0.6.0

partridge version: 0.6.0
Python version: 3.6

Description

read_service_ids_by_date broke on our GTFS feed in 0.6.0. Up to 0.5.0 this worked perfectly fine.

What I Did

import partridge as ptg
#local_zip_path - taken from here ftp://gtfs.mot.gov.il/israel-public-transportation.zip
service_ids_by_date = ptg.read_service_ids_by_date(local_zip_path)

Throws this error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-07ad157544c4> in <module>()
      1 import partridge as ptg
      2 
----> 3 service_ids_by_date = ptg.read_service_ids_by_date(local_zip_path)
      4 next_thursday = None #TODO: add this instead of hardcoding the date
      5 service_ids = service_ids_by_date[datetime.date(2018, 3, 1)]

~\Anaconda3\lib\site-packages\partridge\readers.py in read_service_ids_by_date(path)
     37     '''Find all service identifiers by date'''
     38     feed = raw_feed(path)
---> 39     return _service_ids_by_date(feed)
     40 
     41 

~\Anaconda3\lib\site-packages\partridge\readers.py in _service_ids_by_date(feed)
     63     # Only consider calendar.txt/calendar_dates.txt rows with applicable trips
     64     calendar = feed.calendar[feed.calendar.service_id.isin(service_ids)].copy()
---> 65     caldates = feed.calendar_dates[feed.calendar_dates.service_id.isin(service_ids)].copy() # noqa E501
     66 
     67     if not calendar.empty:

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   3612             if name in self._info_axis:
   3613                 return self[name]
-> 3614             return object.__getattribute__(self, name)
   3615 
   3616     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'service_id'

numpy attribute issue in gtfs.py

partridge version:
Python version:
Operating System:

Description

Code stopped running properly due to a package error.

What I Did

line 102, in _read_csv
    df = pd.read_csv(path, dtype=np.unicode, encoding=encoding, index_col=False)
File "/home/site/wwwroot/.python_packages/lib/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
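np.unicode was an alias for the built-in str, so a read_csv call along these lines should behave the same on current numpy versions. This is a sketch of an equivalent call with made-up data, not necessarily the change made in the repository.

```python
import io

import pandas as pd

csv_text = "route_id,route_type\n001,3\nA-7,4\n"

# dtype=str keeps every column as text, as dtype=np.unicode did
df = pd.read_csv(io.StringIO(csv_text), dtype=str, index_col=False)
```

Reading everything as text also preserves leading zeros and non-numeric IDs; numeric columns can be converted afterwards.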

Crash on string-valued route_type

partridge version: 1.1.1
Python version: 3.7.5
Operating System: Lubuntu 19.10

Description

I am running the commands

feed = ptg.load_feed(gtfs_filename)
return feed.routes.route_type.isin([3,4,5]).any()

on the file https://transitfeeds.com/p/traveline/1033/latest/download which contains:

route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color,route_sort_order
21-L16-U-y05-51704,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51705,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51706,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51707,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L16-U-y05-51708,OId_CX,C,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford",3,,,,"Stratford - Manor Park - Ilford - Seven Kings - Goodmayes - Romford"
21-L40-U-y05-51801,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L40-U-y05-51802,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L40-U-y05-51803,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L40-U-y05-51804,OId_CX,L3,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford",3,,,,"Hackney Downs/Liverpool Street - Walthamstow Central - Chingford"
21-L50-U-y05-51624,OId_CX,L1,"Seven Sisters - Bruce Grove - White Hart Lane - Edmonton Green - Enfield Town",3,,,,"Seven Sisters - Bruce Grove - White Hart Lane - Edmonton Green - Enfield Town"
21-N3-_-y05-49270,OId_CX,N3,"Oxford Circus - Crystal Palace - Bromley North",3,,,,"Oxford Circus - Crystal Palace - Bromley North"
21-N68-_-y05-46245,OId_CX,N68,"Old Coulsdon - Croydon - Norwood - Herne Hill - Camberwell - Waterloo - Tottenham Court Road",3,,,,"Old Coulsdon - Croydon - Norwood - Herne Hill - Camberwell - Waterloo - Tottenham Court Road"
21-P13-_-y05-51679,OId_CX,P13,"Streatham - Peckham - New Cross Gate",3,,,,"Streatham - Peckham - New Cross Gate"
21-R70-_-y05-50173,OId_TE,R70,"Hampton, The Avenue - Hampton - Fulwell - Twickenham - Richmond, Manor Circus",3,,,,"Hampton, The Avenue - Hampton - Fulwell - Twickenham - Richmond, Manor Circus"
21-S4-_-y05-50957,OId_CX,S4,"Roundshaw - Sutton - St Helier",3,,,,"Roundshaw - Sutton - St Helier"
21-U5-_-y05-49629,OId_TE,U5,"Uxbridge - Cowley - Hillingdon Hospital - West Drayton - Stockley Park - Hayes & Harlington Station",3,,,,"Uxbridge - Cowley - Hillingdon Hospital - West Drayton - Stockley Park - Hayes & Harlington Station"
21-U7-_-y05-49631,OId_TE,U7,"Uxbridge - Hillingdon Hospital - Charville School - Hayes, Sainsbury's",3,,,,"Uxbridge - Hillingdon Hospital - Charville School - Hayes, Sainsbury's"
21-U9-_-y05-49448,OId_TE,U9,"Uxbridge - Ickenham - Harefield",3,,,,"Uxbridge - Ickenham - Harefield"
25-DLR-_-y05-10,OId_DLR,DLR,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-11,OId_DLR,DLR,"Bank/Tower Gateway/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-13,OId_DLR,DLR,"Tower Gateway/Stratford International/Bow Church - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Tower Gateway/Stratford International/Bow Church - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-16,OId_DLR,DLR,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-17,OId_DLR,DLR,"Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-18,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-19,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford Interntional - Beckton/Woolwich Arsenal // Island Gardens - Lewisham",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford Interntional - Beckton/Woolwich Arsenal // Island Gardens - Lewisham"
25-DLR-_-y05-20,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-3,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
25-DLR-_-y05-8,OId_DLR,DLR,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal",undefined,,,,"Bank/Tower Gateway/Stratford/Stratford International - Beckton/Lewisham/Woolwich Arsenal"
30-CR-_-y05-19,OId_CRC,Circular Cruise Westminster,"Westminster - St Katharines - Westminster",4,,,,"Westminster - St Katharines - Westminster"
31-TRS-_-y05-20,OId_TRS,Thames River Services,"Westminster - St Katherine's - Greenwich",4,,,,"Westminster - St Katherine's - Greenwich"
32-CCR-_-y05-24,OId_CCR,City Cruises,"Westminster - Greenwich",4,,,,"Westminster - Greenwich"
33-RB1-_-y05-26,OId_CV,RB1,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Battersea Power Station",4,,,,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Battersea Power Station"
33-RB1-X-y05-9,OId_CV,RB1X,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Westminster",4,,,,"Royal Arsenal Woolwich - Canary Wharf - Embankment - Westminster"
33-RB2-_-y05-17,OId_CV,RB2,"London Bridge - Embankment - Bankside",4,,,,"London Bridge - Embankment - Bankside"

Note that the route_type column contains the string "undefined" in some rows.

This causes a crash:

Traceback (most recent call last):
  File "pandas/_libs/lib.pyx", line 1897, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "undefined"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/pull_gtfs.py", line 271, in <module>
    ValidateFeeds(feeds_db_filename, data_template_name, parsed_prefix)
  File "./src/pull_gtfs.py", line 242, in ValidateFeeds
    fm.validate_feeds(parsed_prefix)
  File "./src/pull_gtfs.py", line 206, in validate_feeds
    self.validate_feed(fid, parsed_prefix)
  File "./src/pull_gtfs.py", line 190, in validate_feed
    elif not parse_gtfs.HasBusRoutes(filename):
  File "/home/rick/projects/RISE_ev_bus/dispatch/src/parse_gtfs.py", line 365, in HasBusRoutes
    return feed.routes.route_type.isin(route_types).any()
  File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/partridge/gtfs.py", line 16, in getter
    return self.get(filename)
  File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/partridge/gtfs.py", line 51, in get
    self._convert_types(filename, df)
  File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/partridge/gtfs.py", line 164, in _convert_types
    df[col] = converter(df[col])
  File "/home/rick/projects/RISE_ev_bus/dispatch/env/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 151, in to_numeric
    values, set(), coerce_numeric=coerce_numeric
  File "pandas/_libs/lib.pyx", line 1934, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "undefined" at position 1804

Passing the above data through pandas directly:

#!/usr/bin/env python3
import pandas as pd
a=pd.read_csv("routes.txt")
route_types = [3,4,5]
a.route_type.isin(route_types).any()

does not result in an error.

Therefore, partridge should probably do some kind of sanitizing to ensure that the route_type column is actually numeric.
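One possible form of that sanitizing, sketched with pandas alone: pd.to_numeric with errors="coerce" turns unparseable values into NaN instead of raising. The sample values are made up, and whether partridge should coerce silently or warn is a separate design question.

```python
import pandas as pd

route_type = pd.Series(["3", "undefined", "4"])

# Unparseable entries become NaN rather than raising ValueError
cleaned = pd.to_numeric(route_type, errors="coerce")
has_matching_route = cleaned.isin([3, 4, 5]).any()
```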

write_feed_dangerously usage

partridge version: 0.10.0

I would like to have the option for creating a valid pruned GTFS feed based on a partridge feed.
I tried to use write_feed_dangerously, but:

I can't seem to figure out how to get back to a partridge feed from this. Can this be easily done somehow?
I have noticed it really isn't a valid GTFS feed. Is there any plan or work in progress for doing so?

Thanks

Using an HDF5 or some store for saving feeds and/or cache

partridge version: 0.4.0
Python version: 3.6
Operating System: Win10

Enhancement offer

I started working with partridge and thought it could be helpful to save feeds and caches to a local store (HDF5) for easy access between Python runs. What are your thoughts?

"read_trip_ids_by_day" - custom service day hours

Feature request

Add an option to get "trip ids by day" based on custom "service day" hours (not necessarily midnight to midnight).

Description

"Service day" definition may vary between applications.
For example, it is more convenient to define a service day of 4AM to 4AM of the next date, in order to analyze GTFS for working days.
To answer the question "When is the earliest departure of route x on date y?", a departure of route x at 00:30 on date y may be irrelevant.

Suggestion

An option for implementation is to filter trip_ids by something similar to the following method, copied from Paul Harrington's post on https://groups.google.com/forum/#!topic/transit-developers/ZkfnuNv1gho :
"When searching for next departures at a stop at a time between midnight and 3am I rolled back a day and added 24 hours to the hour so instead of running a query for departures at a stop starting from a date of 20180124 and a time of 01:00:00 I would use 20180123 and 25:00:00"

The usage could be something like adding a "read_trip_ids_by_day" method, which is similar to "read_service_ids_by_date" but also takes "day_start" and "day_end" hours as inputs.
Then we would be able to filter the "feed" by these trip ids.

import datetime
import partridge as ptg

path = 'path/to/sfmta-2017-08-22.zip'

trip_ids_by_day = ptg.read_trip_ids_by_day(path, day_start="03:00:00", day_end="02:59:59")

trip_ids = trip_ids_by_day[datetime.date(2017, 9, 25)]
# Now "trip_ids" should contain all the trip_ids of trips that start 
#  between 25-09-2017 03:00:00 and 26-09-2017 02:59:59

feed = ptg.feed(path, view={
    'trips.txt': {
        'trip_id': trip_ids,
    },
})
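The roll-back rule quoted above can be sketched in a few lines. The function name to_service_day and the day_start_hour parameter are hypothetical; this only illustrates the date and hour shift, not a partridge API.

```python
import datetime

def to_service_day(moment, day_start_hour=3):
    """Map a wall-clock datetime to (service_date, 'HH:MM:SS') for a custom service day."""
    if moment.hour < day_start_hour:
        # Early-morning times belong to the previous service day, with 24 added to the hour
        service_date = moment.date() - datetime.timedelta(days=1)
        hour = moment.hour + 24
    else:
        service_date = moment.date()
        hour = moment.hour
    return service_date, f"{hour:02d}:{moment.minute:02d}:{moment.second:02d}"

print(to_service_day(datetime.datetime(2018, 1, 24, 1, 0, 0)))
# (datetime.date(2018, 1, 23), '25:00:00')
```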

Thanks!!

pip package incompatible with numpy 1.25

partridge version: 1.1.1
Python version: 3.10
Operating System: macOS Ventura 13.4

Description

I was trying to use partridge, first from peartree, then directly, in a mamba/conda environment with numpy 1.25.
I ran into the same issue documented by another peartree user in https://github.com/kuanb/peartree/issues/178.
The code available on pip still uses np.unicode in gtfs.py, which was deprecated in numpy 1.20 and removed in numpy 1.24.

I was able to fix this for myself by replacing the pip version of partridge with the repository version, which appears to have already fixed this bug.
So you could make many people's lives a tiny bit easier by releasing the bug-fixed version on pip :)

What I Did

file = "my_awesome_gtfs_data.zip"
ptg.read_busiest_date(file)
Traceback: See https://github.com/kuanb/peartree/issues/178 (my trace was the same in all important respects)

Float representation of the parsed time is currently incorrect

Related (or maybe the same issue) is that GTFS specifies that times are seconds since noon minus 12 hours. For calendar days when there is a daylight savings shift, the float representation of the parsed time is currently incorrect.

Originally posted by @tilgovi in #42 (comment)
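To illustrate the difference, here is a sketch that resolves a GTFS time against "noon minus 12 hours" using the standard-library zoneinfo module (Python 3.9+, with time zone data available on the system). The function name gtfs_time_to_datetime is hypothetical, and the date and zone are only an example of a spring-forward day.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def gtfs_time_to_datetime(service_date, seconds, tz_name):
    """Resolve seconds-since-(noon minus 12h) to an aware local datetime."""
    tz = ZoneInfo(tz_name)
    noon_local = datetime.combine(service_date, time(12), tzinfo=tz)
    # Do the arithmetic in UTC so the offsets are real elapsed time
    origin_utc = noon_local.astimezone(timezone.utc) - timedelta(hours=12)
    return (origin_utc + timedelta(seconds=seconds)).astimezone(tz)

tz_name = "America/Los_Angeles"
day = date(2018, 3, 11)  # clocks move forward at 02:00 local time

resolved = gtfs_time_to_datetime(day, 8 * 3600, tz_name)

# Counting elapsed seconds from local midnight instead lands one hour late
midnight_utc = datetime.combine(day, time(0), tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)
from_midnight = (midnight_utc + timedelta(hours=8)).astimezone(ZoneInfo(tz_name))
```

On this date the GTFS time 08:00:00 resolves to 08:00 local time, while eight elapsed hours after local midnight is 09:00.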

Busiest day of the service is not exhaustive

partridge version:
'1.1.1'
Python version:
'3.10.2'
Operating System:
'Windows 10'

Description

Describe what you were trying to get done.

I was trying to read the Portland TriMet data (data here) using the partridge library and find the busiest day of service (especially for buses).

Tell us what happened, what went wrong, and what you expected to happen.

The list of service_ids for the busiest day (the output of read_busiest_date) was missing several service_ids that were actually in service on that day.
I expected read_busiest_date to return every service_id operating on that day, but it only seems to include the service_ids listed in calendar_dates.txt. The GTFS documentation describes calendar_dates.txt as follows: "Exceptions for the services defined in the calendar.txt. If calendar.txt is omitted, then calendar_dates.txt is required and must contain all dates of service."
Since the Portland TriMet data has both calendar.txt and calendar_dates.txt, I believe read_busiest_date is considering only the exception service_ids and ignoring the regular ones.
Perhaps the reason is that the Portland GTFS lists the regular service_ids in calendar.txt with 0 for every day from Monday to Sunday. Either way, this can lead to incorrect results from read_busiest_date, because the function excludes these regular service_ids with all-zero rows. An example is shown below.

What I Did

Example:

import partridge as ptg
ptg.read_busiest_date('gtfs_Portland_2022_feb1.zip')

Output: (datetime.date(2022, 1, 24), frozenset({'A.613', 'D.613', 'Q.613', 'W.613'}))

The busiest day reported here is 24 January 2022, which is a Monday.
The following service_ids are missing from the output: [B.613, C.613, F.613, E.613, U.613, S.613]
The missing service_ids include both Light Rail and Bus route_types. For example, B.613 (Light Rail) includes the route 'MAX Red Line', which operates Monday-Friday and on weekends. This is perhaps the busiest line, as it connects to the Portland Airport. Also, U.613 (Bus) consists of 48 routes. All the routes in this service_id can be seen here: http://gtfs.transitq.com/TriMet_20220201_20220201/serviceids/U.613
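For reference, the rule from the GTFS documentation (calendar.txt patterns plus calendar_dates.txt exceptions) can be sketched in pandas as follows. This is not partridge's implementation; the helper name service_ids_on and the sample rows are made up, and all values are assumed to be strings as read from the CSV files.

```python
import datetime

import pandas as pd

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def service_ids_on(day, calendar, calendar_dates):
    """Combine calendar.txt rules with calendar_dates.txt exceptions for one date."""
    ymd = day.strftime("%Y%m%d")
    # YYYYMMDD strings sort chronologically, so string comparison is safe here
    in_range = (calendar["start_date"] <= ymd) & (ymd <= calendar["end_date"])
    runs_today = calendar[WEEKDAYS[day.weekday()]] == "1"
    active = set(calendar.loc[in_range & runs_today, "service_id"])
    exceptions = calendar_dates[calendar_dates["date"] == ymd]
    added = set(exceptions.loc[exceptions["exception_type"] == "1", "service_id"])
    removed = set(exceptions.loc[exceptions["exception_type"] == "2", "service_id"])
    return (active | added) - removed

# Made-up rows; only the monday column is shown, other weekday columns are omitted
calendar = pd.DataFrame({
    "service_id": ["A", "Z"],
    "monday": ["1", "0"],
    "start_date": ["20220101", "20220101"],
    "end_date": ["20220301", "20220301"],
})
calendar_dates = pd.DataFrame({
    "service_id": ["Z"],
    "date": ["20220124"],
    "exception_type": ["1"],
})

active_ids = service_ids_on(datetime.date(2022, 1, 24), calendar, calendar_dates)
```

Running a check like this against the raw calendar files makes it easier to see which service_ids a given date should include.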

Output date format is incorrect

partridge version: v1.0.0
Python version: v3.7.1
Operating System: macOS

Description

Writing a feed results in date columns represented as '%Y-%m-%d' where they should be '%Y%m%d'. The reader code correctly parses dates in the feeds, but the writer does not write them in the same format.
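A minimal sketch of the formatting the writer would need, assuming the date columns hold datetime.date objects. The sample values are made up.

```python
import datetime

import pandas as pd

dates = pd.Series([datetime.date(2019, 6, 29), datetime.date(2019, 12, 1)])

# GTFS expects dates as YYYYMMDD with no separators
formatted = dates.map(lambda d: d.strftime("%Y%m%d"))
```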

detect_encoding very time consuming

partridge version: 0.10.0
Python version: 3.6
Operating System: Win10

I'm working on using partridge to speed up statistical summarization of a historical archive of feeds, so I made some kind of a mashup between partridge and GTFSTK, you can see an initial version here.

I profiled the run and noticed that detect_encoding is responsible for about half of partridge's run time. I am sure this could be alleviated somehow by specifying the encoding if it is known upfront.

cchardet doesn't work with python 3.10+

partridge version: *
Python version: 3.10.x
Operating System: *nix

Description

cChardet appears to be an abandoned project and is not compatible with python 3.10 (without compiling from source).

PyYoshi/cChardet#77

As a result, installing partridge via pip fails.

What I Did

#18 96.96   Running setup.py install for cchardet: finished with status 'error'
#18 96.97   error: subprocess-exited-with-error
#18 96.97
#18 96.97   × Running setup.py install for cchardet did not run successfully.
#18 96.97   │ exit code: 1
#18 96.97   ╰─> [24 lines of output]
#18 96.97       running install
#18 96.97       running build
#18 96.97       running build_py
#18 96.97       creating build
#18 96.97       creating build/lib.linux-x86_64-3.10
#18 96.97       creating build/lib.linux-x86_64-3.10/cchardet
#18 96.97       copying src/cchardet/__init__.py -> build/lib.linux-x86_64-3.10/cchardet
#18 96.97       copying src/cchardet/version.py -> build/lib.linux-x86_64-3.10/cchardet
#18 96.97       running build_ext
#18 96.97       building 'cchardet._cchardet' extension
#18 96.97       creating build/temp.linux-x86_64-3.10
#18 96.97       creating build/temp.linux-x86_64-3.10/src
#18 96.97       creating build/temp.linux-x86_64-3.10/src/cchardet
#18 96.97       creating build/temp.linux-x86_64-3.10/src/ext
#18 96.97       creating build/temp.linux-x86_64-3.10/src/ext/uchardet
#18 96.97       creating build/temp.linux-x86_64-3.10/src/ext/uchardet/src
#18 96.97       creating build/temp.linux-x86_64-3.10/src/ext/uchardet/src/LangModels
#18 96.97       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Isrc/ext/uchardet/src -I/usr/local/include/python3.10 -c src/cchardet/_cchardet.cpp -o build/temp.linux-x86_64-3.10/src/cchardet/_cchardet.o
#18 96.97       src/cchardet/_cchardet.cpp: In function ‘int __Pyx_modinit_type_init_code()’:
#18 96.97       src/cchardet/_cchardet.cpp:2466:53: error: ‘PyTypeObject’ {aka ‘struct _typeobject’} has no member named ‘tp_print’; did you mean ‘tp_dict’?
#18 96.97          __pyx_type_8cchardet_9_cchardet_UniversalDetector.tp_print = 0;
#18 96.97                                                            ^~~~~~~~
#18 96.97                                                            tp_dict
#18 96.97       error: command '/usr/bin/gcc' failed with exit code 1
#18 96.97       [end of output]
#18 96.97
#18 96.97   note: This error originates from a subprocess, and is likely not a problem with pip.
#18 96.97 error: legacy-install-failure
#18 96.97
#18 96.97 × Encountered error while trying to install package.
#18 96.97 ╰─> cchardet

It seems like switching back to chardet or optionally using chardet when on python 3.10+ would be potential alternatives.
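The optional fallback could be sketched like this. Both cchardet and chardet expose a detect function that returns a dict with an "encoding" key; the wrapper name guess_encoding and the utf-8 default are assumptions made for illustration.

```python
try:
    import cchardet as chardet_impl  # fast C implementation, may not build on newer Pythons
except ImportError:
    try:
        import chardet as chardet_impl  # pure-Python fallback
    except ImportError:
        chardet_impl = None  # neither detector is installed

def guess_encoding(sample, default="utf-8"):
    """Return a best-guess encoding name for a bytes sample."""
    if chardet_impl is None:
        return default
    guess = chardet_impl.detect(sample).get("encoding")
    return guess or default

encoding = guess_encoding(b"stop_id,stop_name\n1,Main St\n")
```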

Broken README rendering on pypi.python.org

Any ideas?

Non-numeric IDs

partridge version: 1.1.1 (via pip)
Python version: 3.9
Operating System: Windows

Description

As is, it appears that any non-numeric ID field will crash Partridge because Pandas is expecting IDs to be integers. Unfortunately the GTFS spec states that a valid ID "is a sequence of any UTF-8 characters", and in practice the use of non-numeric IDs is widespread.

What I Did

import partridge
partridge.read_busiest_date("gtfs.zip")
# For example - issue is not limited to this function

Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1050, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    partridge.read_busiest_date("gtfs.zip")
  File "...\partridge\readers.py", line 60, in read_busiest_date
    return _busiest_date(feed)
  File "...\partridge\readers.py", line 118, in _busiest_date
    service_ids_by_date = _service_ids_by_date(feed)
  File "...\partridge\readers.py", line 156, in _service_ids_by_date
    service_ids = set(feed.trips.service_id)
  File "...\partridge\gtfs.py", line 16, in getter
    return self.get(filename)
  File "...\partridge\gtfs.py", line 48, in get
    df = self._read(filename)
  File "...\partridge\gtfs.py", line 48, in get
    df = self._read(filename)
  File "...\partridge\gtfs.py", line 102, in _read_csv
    df = pd.read_csv(path, dtype=np.unicode, encoding=encoding, index_col=False)
  File "...\pandas\io\parsers.py", line 605, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "...\pandas\io\parsers.py", line 463, in _read
    return parser.read(nrows)
  File "...\pandas\io\parsers.py", line 1052, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "...\pandas\io\parsers.py", line 2056, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 850, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 982, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1056, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'offending_string_id'

Frequencies by route for a given time period

partridge version: 1.1.1
Python version: 3.8
Operating System: Windows 10

Description

Is there a recipe or notebook for obtaining service frequencies or average headways by route for a given time period (say 6am-9am)?

What I Did

I looked at the notebooks in the wiki and could not find what I was looking for.

Thank you!
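One rough way to compute this with pandas is sketched below. It assumes trips and stop_times DataFrames with departure_time already in seconds since midnight, takes each trip's first departure as its start, and does not separate directions. The function name average_headways and the sample data are made up.

```python
import pandas as pd

def average_headways(trips, stop_times, start_s, end_s):
    """Mean gap in minutes between consecutive trip starts per route within [start_s, end_s)."""
    first_departures = (
        stop_times.groupby("trip_id")["departure_time"].min().rename("start_time")
    )
    starts = trips.join(first_departures, on="trip_id")
    in_window = starts[(starts["start_time"] >= start_s) & (starts["start_time"] < end_s)]
    ordered = in_window.sort_values(["route_id", "start_time"])
    gaps = ordered.groupby("route_id")["start_time"].diff()
    return gaps.groupby(ordered["route_id"]).mean() / 60

trips = pd.DataFrame({
    "trip_id": ["t1", "t2", "t3", "t4"],
    "route_id": ["r1", "r1", "r1", "r2"],
})
stop_times = pd.DataFrame({
    "trip_id": ["t1", "t2", "t3", "t4"],
    "departure_time": [21600, 22800, 24600, 36000],  # 06:00, 06:20, 06:50, 10:00
})

headways = average_headways(trips, stop_times, 6 * 3600, 9 * 3600)
```

For the sample data, route r1 has gaps of 20 and 30 minutes, so its average headway is 25 minutes; r2 has no trips in the window and does not appear.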

Build functionality to pull GTFS from a URL

Downloading and navigating to a given GTFS feed for use in Partridge can be cumbersome and risky due to internet speeds and version control issues. A function to pull a fresh feed from a URL (e.g., from an agency's developer portal) would help with this.

See url2gtfs in https://davidabailey.com/articles/Visualizing-Public-Transportation-Speeds-with-Python for an example.
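A standard-library sketch of such a helper is shown below. The name download_feed and the gtfs.zip file name are hypothetical; the returned path could then be passed to ptg.load_feed.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

def download_feed(url, dest_dir=None):
    """Download a GTFS zip from url into dest_dir and return the local path."""
    dest_dir = Path(dest_dir or tempfile.mkdtemp())
    dest = dest_dir / "gtfs.zip"
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    # Fail early if the server returned something other than a zip archive
    if not zipfile.is_zipfile(dest):
        raise ValueError(f"{url} did not return a zip archive")
    return dest
```

Keeping a dated copy of each download would also help with the version control concern, since agencies often overwrite the file at a fixed URL.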

pypi release missing geopandas integration

partridge version: 1.0.0
Python version: anaconda3-5.3.1
Operating System: macOS 10.14.4

Description

Greetings from the American Planning Association! We're in San Francisco for our National Planning Conference. One component of this is a DataJam meant to get planners and civic tech people working together to try to solve some San Francisco transit problems.

I'm putting together some JupyterHub-hosted notebooks bootstrapped with data so that people can get working with data right away and not fritter away valuable time messing with getting environments and dependencies set up, ETL-ing and cleaning data, etc. partridge looks like an awesome resource for working with GTFS data efficiently without having to deploy a database. We'd love to get DataJam attendees using it as one possible method for producing awesome transit analyses and visualizations.

The geopandas integration looks like it hasn't made it to pypi yet. There's no "full" entry in the setup.py extras, geo.py is missing, and the geopandas dependency was not installed.

Installing using pip's VCS method from this repo works just fine, e.g. pip install -e git+https://github.com/remix/partridge/#egg=partridge, and I've used that to get it working in our hosted JupyterHub on Azure Notebooks so DataJam attendees can use it without problems during the event. Just thought you'd want the heads up!

What I Did

pip install "partridge[full]"
Collecting partridge[full]
  Using cached https://files.pythonhosted.org/packages/19/a6/3a26b6ffc3a317248a279f9c0057ae92e9f3432af50af8d233c217d80de3/partridge-1.0.0-py2.py3-none-any.whl
  partridge 1.0.0 does not provide the extra 'full'

[extraneous output omitted]

Converter for route_id

partridge version: 0.11.0 (but also happens on 1.1.1)
Python version: 3.8
Operating System: Win 10

Description

I tried to change the types of the _id columns (e.g. route_id) in some tables from dtype object to numeric, to lower the memory usage. I did that by adding a converter to the default config.
The load ran without errors, but the DataFrames came back empty. I looked into it a little, and I think it is because the read_file method does the pruning before the type conversion, so an object column (the column in the current table) is compared with a numeric column (from the dependency table, which has already been type converted).
I'm not sure what the right solution would be; maybe converting both columns to object before the comparison.

What I Did

In[1]: import partridge as ptg
In[2]: conf = ptg.config.default_config()
In[3]: ptg.load_feed(path, config=conf).routes
Out[3]:
  route_id agency_id route_short_name  ... route_desc route_type  route_color
0        1        25                1  ...  67001-1-#          3          NaN
1        2        25                1  ...  67001-2-#          3          NaN
2        3        25                2  ...  56002-1-#          3          NaN
[3 rows x 7 columns]

In[4]: import pandas as pd
In[5]: conf.nodes['trips.txt']['converters']['route_id'] = pd.to_numeric
In[6]: conf.nodes['routes.txt']['converters']['route_id'] = pd.to_numeric
In[7]: ptg.load_feed(path, config=conf).routes
Out[7]:
Empty DataFrame
Columns: [route_id, agency_id, route_short_name, route_long_name, route_desc, route_type, route_color]
Index: []
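The suspected cause can be reproduced with pandas alone: isin finds no matches when one side holds strings and the other holds integers. This sketch uses made-up IDs and only demonstrates the comparison, not partridge's internals.

```python
import pandas as pd

raw_ids = pd.Series(["1", "2", "3"])      # IDs as read from the CSV (strings)
converted_ids = pd.to_numeric(raw_ids)    # IDs after a numeric converter

# String values never equal integer values, so nothing matches
mismatch = raw_ids.isin(converted_ids)

# Casting both sides to the same type restores the matches
aligned = raw_ids.isin(converted_ids.astype(str))
```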
