sausy-lab / retro-gtfs

Collect real-time transit data and process it into a retroactive GTFS 'schedule' which can be used for routing/analysis

transit-data gtfs retroactive-gtfs gtfs-realtime gtfs-rt transit public-transport


retro-gtfs

Overview

This application is designed to collect real-time transit data from the NextBus API and process it into a "retrospective" or "retroactive" GTFS package. Schedule-based GTFS data describes how transit is expected to operate; this produces GTFS that describes how it actually did operate. The output is not directly useful for routing actual people on a network, but it can serve a variety of analytical purposes, such as comparing routing/accessibility outcomes between the schedule-based and retrospective GTFS datasets. Measures can be derived showing the differences between the schedule and the actual operations, and these can be interpreted as measures of performance, either for the GTFS package (does it accurately describe reality?) or for the agency in question (does it adhere to its schedules?).

The program was designed to ingest live real-time data and store it in a PostgreSQL database. The data can be processed either on the fly or after the fact, and with a bit of work you should also be able to massage an outside source of historical AVL data into a suitable format.

The final output of the code is a set of CSV .txt files which conform to the GTFS standard. Specifically, we use the calendar_dates.txt file to define a unique service pattern for each day, with its own trip_id's and stop times. No two trips are exactly alike, and so there are no repeating service patterns; each day is unique. The output also includes a shapes.txt file, but as there is a unique shape for each trip, the file can become very large and you may wish to ignore it.
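The one-service-per-day convention can be illustrated with a small sketch. This is a hypothetical example, not the project's actual output code; the dates and `service_id` naming are placeholders, but the column names follow the GTFS calendar_dates.txt specification.

```python
import csv
import io

# Sketch: each observed service day gets its own service_id row in
# calendar_dates.txt, since no two days share a service pattern.
def write_calendar_dates(dates):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["service_id", "date", "exception_type"])
    for i, date in enumerate(dates, start=1):
        # exception_type 1 = service added on this date (per the GTFS spec)
        writer.writerow(["service_{}".format(i), date, 1])
    return buf.getvalue()
```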

Using the code

As for actually using the code, please have a look at the wiki, and feel free to email Nate or create an issue if you encounter any problems.

Related projects

Related projects by other people:

Academic Research

This project was developed around work conducted for my dissertation, and several academic papers have resulted from it. If you use this code as part of a published work, please cite the following paper (journal, pre-print) outlining the basic methods:

@Article{Wessel2017,
  author    = {Wessel, Nate and Allen, Jeff and Farber, Steven},
  title     = {Constructing a Routable Retrospective Transit Timetable from a Real-time Vehicle Location Feed and GTFS},
  journal   = {Journal of Transport Geography},
  year      = {2017},
  volume    = {62},
  pages     = {92-97},
  url       = {http://sausy.ca/wp-content/uploads/2017/11/retro-GTFS-paper.pdf}
}

You may also be interested in:

  • On the Accuracy of Schedule-Based GTFS for Measuring Accessibility. pre-print
  • The Effect of Route-choice Strategy on Transit Travel Time Estimates. pre-print
  • Accessibility Beyond the Schedule. pre-print

See also, similar methods used by:

  • Advancing accessibility : public transport and urban space. PDF
  • Comparing Accessibility in Urban Slums Using Smart Card and Bus GPS Data. useless TRB link

retro-gtfs's People

Contributors

nate-wessel, xtremecurling


retro-gtfs's Issues

Handle DST and named timezones

Daylight saving time is not yet being handled correctly.

Things that need to have correct DST handling added/verified:

  • service_ids set on trips
  • dates generated from service_ids in pull_data.sql for calendar.txt
  • stop times generated from service_ids in pull_data.sql
  • ...
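One way to get DST right across all of the above is to keep timestamps in UTC and convert to a named timezone only when deriving local dates and times. A minimal sketch, assuming Python 3.9+ and using Toronto as an example zone (the project itself may store times differently):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib IANA timezone support, Python 3.9+

# Sketch: derive a local service date from a UTC epoch timestamp.
# A named zone lets zoneinfo apply the correct DST offset per date.
def service_date(epoch_seconds, tz_name="America/Toronto"):
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    local = utc.astimezone(ZoneInfo(tz_name))
    return local.strftime("%Y%m%d")
```

Note that a trip ending shortly after local midnight will land on the next service date under this scheme, which may or may not be the desired convention for late-night service.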

Script eating up memory gradually, crashing

Had this problem after running the script for about 2.5 days. Python seems to have eaten all remaining memory on a 3 GB machine, causing a crash. What caused this? Could it be slower DB inserts? Cumulative errors and threads failing to close? A garbage collection failure? It doesn't seem to have run out of processes, but RAM. Also noticed about three dozen temp files that had not been deleted. Why would that be?

Reran the script without truncating the DB and it crashed again less than a day later; this happened several times.

Put monitoring in place and run again for several days, with constant output, checking for growth in any variables.

  1. num threads
  2. size of 'fleet'
  3. ???
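The monitoring idea above could be sketched as a periodic status line that records the candidate growth variables. This is a hypothetical helper, not part of the actual script; `fleet` here is a stand-in dict for the tracker's vehicle collection.

```python
import threading
import time

# Sketch: log thread count and fleet size each cycle so any
# unbounded growth shows up plainly in the output.
def monitor_line(fleet):
    return "{} {} threads, {} in fleet".format(
        time.strftime("%Y-%m-%d %H:%M:%S"),
        threading.active_count(),
        len(fleet),
    )
```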

Some terminal stops not being recorded

A quick look at parts of #94 shows that the Ossington station stop is not being recorded reliably. Surely the whole trip is being made, but the stop is not recorded for many trips. Why is this? It is probably indicative of broader problems.

Suggestion: when trips are broken, a gap forms between two vehicle reports. Perhaps simply spanning the gap would help make the final connection.

Add service_id option to pull-data.sql

Long observation periods can produce datasets too large to hold in memory in OTP (for the TTC, at least). We need to be able to pull out just one or two days of data into GTFS format.

Default geom yielding noisy errors if route incomplete

E.g. for jv_

trip_id to process --> 60382
	default route used for direction 102_0_var0
		stop off by 2484.67682975 meters for trip 60382
		stop off by 5248.22659567 meters for trip 60382
		stop off by 7429.09854012 meters for trip 60382
		stop off by 8619.05381025 meters for trip 60382
		stop off by 10352.7808203 meters for trip 60382

matching fails silently

I know that matching cannot be working, as the OSRM server is not running with the right data. Yet I get no error. Fix this!
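One way to make the failure loud: OSRM's HTTP API returns a JSON body with a `code` field, and anything other than `"Ok"` means the request did not succeed. A minimal sketch of a response check (hypothetical helper name, not the project's actual code):

```python
# Sketch: raise instead of passing silently when OSRM cannot match.
# The OSRM /match service returns {"code": "Ok", "matchings": [...]}
# on success; other codes (e.g. "NoMatch") indicate failure.
def check_osrm_response(body):
    code = body.get("code")
    if code != "Ok":
        raise RuntimeError("OSRM match failed: {}".format(code))
    return body["matchings"]
```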

Store times in trips table, update DB

The vehicles table has been eliminated, and the store.py entry point works great. However, the vehicle times are not being stored yet, only the stop_times and trips. I need to store the vehicle times in an array alongside the trip and leave their locations incorporated in the orig_geom linestring.

I will also need to update process.py's from_DB method to use these.

trip processing is very slow with more than ~50M vehicle records

I think I could speed up processing by doing some of the GIS work directly in Python rather than repeatedly making PostgreSQL find the same records again and again. Making this change may be a rather large and cumbersome process, and I don't yet know exactly how it will be done.

Design a quality metric for trips and stop times tables

We need a metric for assessing how likely it is that a set of trips has been adequately mapped to a set of stops. That is, one that tells us whether more work needs to be done before the retro-GTFS package will be a decent representation of the actual transit service performed. Some ideas (completed):

  • Variance in number of stops made by trips on a route
  • Average number of repeated stops (generally shouldn't happen)
  • Percent of trips with no stops or very few stops
  • Ratio of trips using default versus OSRM geometries
  • Mean confidence rating for OSRM results
  • A list of problematic trip_id's
  • ...

Ignore:

  • Trips that are extremely short
  • Trips that don't go near their appointed stops
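Two of the listed metrics can be sketched over a toy data structure. This is purely illustrative (the `(route_id, stop_count)` tuples are an assumption, not the project's actual tables):

```python
from statistics import pvariance

# Sketch: trips is a list of (route_id, stop_count) pairs.

def stop_count_variance(trips, route_id):
    # Variance in number of stops made by trips on one route.
    counts = [n for r, n in trips if r == route_id]
    return pvariance(counts)

def pct_trips_few_stops(trips, threshold=2):
    # Percent of trips with no stops or very few stops.
    few = sum(1 for _, n in trips if n <= threshold)
    return 100.0 * few / len(trips)
```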

Moved stops can cause repeated stop_id's in output

Stop moves (e.g. a stop moved one block over with no ID change) cause both geometries to be given in stops.txt with the same stop_id. I need to change the stop_id in stops and also in stop_times; the output SQL script will need to be updated as well.
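A simple de-duplication pass could suffix later versions of a reused stop_id so that stops.txt stays unique. A sketch under assumed inputs (the tuple layout and suffix scheme are hypothetical):

```python
# Sketch: stops is a list of (stop_id, lat, lon) tuples; when one
# stop_id appears with more than one geometry, later occurrences get
# a numeric suffix. The same renaming must then be applied to
# stop_times for the affected trips.
def dedupe_stop_ids(stops):
    seen = {}
    renamed = []
    for stop_id, lat, lon in stops:
        n = seen.get(stop_id, 0)
        new_id = stop_id if n == 0 else "{}_{}".format(stop_id, n)
        seen[stop_id] = n + 1
        renamed.append((new_id, lat, lon))
    return renamed
```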

Syntax Error

In file create-agency-tables.sql, there's an extra comma on line 52.

Stdout doesn't include timestamps

Running store.py continuously with nohup, the current contents of nohup.out (the log file for stdout) are:

1068 in fleet, 4 ending trips
1066 in fleet, 6 ending trips
1066 in fleet, 5 ending trips
1065 in fleet, 4 ending trips
1065 in fleet, 11 ending trips

And so on, which isn't particularly useful given the lack of timestamps. Thoughts on adding timestamps or other info to this output?
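One low-effort option is to prefix each status line with a timestamp. A sketch mirroring the line format shown above (the separator and format are arbitrary choices; routing the output through the `logging` module would achieve the same thing):

```python
import time

# Sketch: timestamp each per-cycle status line so nohup.out is
# useful after the fact.
def status_line(fleet_size, ending_trips):
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "{} | {} in fleet, {} ending trips".format(
        stamp, fleet_size, ending_trips)
```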

Very high speeds being reported on some routes

They look like they have reasonable data...
jv routes

High speed travel detected in trip 68766: 3rd Ave. S. & 3rd St. to Rosa Parks Station Bay K. 26282 meters in 219 seconds. (432 km/h).

High speed travel detected in trip 71330: N. 3rd St. & N. 9th Ave. to Rosa Parks Station Bay B. 25829 meters in 7 seconds. (13283 km/h).

High speed travel detected in trip 59956: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).

High speed travel detected in trip 53250: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).

High speed travel detected in trip 66628: McCormick Rd. & Ft. Caroline Rd. to Rosa Parks Station Bay L. 13865 meters in 7 seconds. (7131 km/h).

and 2559 more of this type.
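The check behind these log lines amounts to computing the implied speed between consecutive reports and comparing it to a plausibility ceiling. A sketch (the 120 km/h ceiling is an assumed example, not the project's actual threshold):

```python
# Sketch: flag a segment between two vehicle reports when its
# implied speed is implausible for surface transit.
def implied_kmh(meters, seconds):
    return meters / seconds * 3.6  # m/s -> km/h

def is_plausible(meters, seconds, max_kmh=120):
    return implied_kmh(meters, seconds) <= max_kmh
```

For the first log line above, 26282 meters in 219 seconds works out to roughly 432 km/h, well past any reasonable ceiling.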

Project vehicles forward/backward to termini

Check whether the trip covers a substantial portion of the route and if it does, assume that it goes all the way to the ends and therefore makes all the stops.

Project outwards to the termini at some speed based on observation of surrounding points.
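The projection step could look something like the following sketch, which extrapolates the terminal arrival time at the speed observed over the last recorded segment (a hypothetical helper; inputs are assumed to be time and cumulative-distance pairs with the vehicle moving forward):

```python
# Sketch: (t1, d1) and (t2, d2) are the times and cumulative
# distances of the last two vehicle reports; remaining_m is the
# distance left to the terminus. Assumes d2 > d1 (forward movement).
def project_terminal_time(t1, d1, t2, d2, remaining_m):
    speed = (d2 - d1) / (t2 - t1)  # m/s from the surrounding points
    return t2 + remaining_m / speed
```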

Expand map matching search radius for very poor matches

When a very poor match is returned, try rematching with a wider error radius. This is harder computationally but may return better results for some trips.

This would particularly help a couple of routes in San Francisco. It's not known whether this would produce worse or weird results in other cities.

missing threads issue

It seems that a few threads are hanging unfinished during the processing phase, preventing other threads from being initiated. If too many trips are queued, the process never finishes and keeps sleeping forever.

Handle changing schedule data; match trips to appropriate schedule

Currently there is no way of handling changes to the schedule. We keep checking the schedule data from the NextBus API at random intervals to see if anything has changed, and we store any changes. But nothing is done to link the most recent data to an ending trip. This has only worked so far because nothing has changed during script execution yet.

When the set of stops is selected, it needs to be the most recent set of stops only.
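Selecting the right stop set amounts to keeping schedule versions keyed by fetch time and picking the newest one that was current while the trip ran. A sketch under assumed data shapes (the dict-of-versions structure is hypothetical):

```python
# Sketch: schedule_versions maps fetch_time -> stop list; pick the
# newest version fetched no later than the trip's end time, so an
# ending trip is matched to the schedule that was current for it.
def stops_for_trip(schedule_versions, trip_end_time):
    eligible = [t for t in schedule_versions if t <= trip_end_time]
    return schedule_versions[max(eligible)]
```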

add block_id to consecutive trips

This will allow the traveler to stay on the vehicle. Most importantly, it means that if a trip 'starts' just outside the station, it will be linked to the station by the previous trip in the block.

Break blocks only when vehicles go off the radar. Start trips with new headsigns or route_ids, but maintain block with the vehicle.
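The block-assignment rule described above could be sketched as follows (a hypothetical helper over assumed `(vehicle_id, start, end)` tuples sorted by start time; the 5-minute gap threshold is an example value):

```python
# Sketch: consecutive trips by the same vehicle share a block_id;
# a reporting gap longer than max_gap seconds (vehicle off the
# radar) starts a new block, even if the headsign or route_id
# changed between trips.
def assign_blocks(trips, max_gap=300):
    blocks, last_end, block_of_vehicle = {}, {}, {}
    block_num = 0
    for i, (veh, start, end) in enumerate(trips):
        prev_end = last_end.get(veh)
        if prev_end is None or start - prev_end > max_gap:
            block_num += 1
            block_of_vehicle[veh] = block_num
        blocks[i] = "block_{}".format(block_of_vehicle[veh])
        last_end[veh] = end
    return blocks
```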
