codeforpdx / opentransit-metrics
Prototype of public transit data visualization system
Home Page: https://opentransit-pdx.herokuapp.com/
License: MIT License
The trip_id in the downloaded arrival data .csv file is the trip index for that particular route; it is not the scheduled trip_id from the GTFS.
Is the real trip_id stored in the arrival data?
If not:
(screenshot of downloaded data; I'm pretty sure it's not normal to have trip_ids in sequence like that)
(Original source: trynmaps/metrics-mvp#403)
Original Description:
Add tests to the "data science" functions to give newcomers a better grasp of what the functions do and the shape of their input/output data, and to make the functions easier to change and fix.
Functions to test are in:
backend/models/eclipses.py
backend/models/wait_times.py
backend/models/trip_times.py
Any questions about the functions or data should be directed to @youngj (Jesse Young on Slack).
For Our Purposes:
The above is pretty accurate for what would be useful to us. We can explore other places for unit tests if needed, but this is a good start.
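As a starting point, here is a minimal sketch of the kind of unit test described above. Note that `average_wait_time` is a stand-in function written for this example, not the actual function in backend/models/wait_times.py; the real signatures and data shapes may differ.

```python
# Sketch of a unit test for a hypothetical wait-time helper. This is a
# stand-in, NOT the actual function in backend/models/wait_times.py.
from datetime import datetime, timedelta


def average_wait_time(arrival_times):
    """Average wait for a rider arriving uniformly at random between the
    first and last arrival: sum(gap^2 / 2) over the total time observed."""
    gaps = [
        (b - a).total_seconds() / 60
        for a, b in zip(arrival_times, arrival_times[1:])
    ]
    total = sum(gaps)
    return sum(g * g / 2 for g in gaps) / total


def test_average_wait_time_uniform_headways():
    start = datetime(2022, 3, 1, 8, 0)
    arrivals = [start + timedelta(minutes=15 * i) for i in range(5)]
    # With constant 15-minute headways, the expected wait is 7.5 minutes.
    assert abs(average_wait_time(arrivals) - 7.5) < 1e-9


test_average_wait_time_uniform_headways()
```

Tests in this style pin down the expected input/output shapes, which is exactly what would help newcomers reading eclipses.py, wait_times.py, and trip_times.py.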
Right now the opentransit-collector is running on @sidetrackedmind's Raspberry Pi. It was silently broken for over a week; this is not great because we cannot get that data back. We need to come up with an alert mechanism in the interim, and ultimately move this process off the Raspberry Pi and into the cloud, where alerting / error catching is available to the whole team.
If you run save_routes.py locally and you already have GTFS files in your cache, the script will not push anything to S3.
We need some way to force the script to send new data to S3 even if it already exists locally.
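One option is a force flag on the script. The flag name and helpers below are assumptions for illustration, not the real save_routes.py CLI:

```python
# Sketch of a hypothetical --force flag for save_routes.py; the flag
# name and the cache/upload helpers are assumptions, not the real CLI.
import argparse


def should_upload(cached_locally: bool, force: bool) -> bool:
    """Upload to S3 when there is no local cache, or when forced."""
    return force or not cached_locally


def parse_args(argv):
    parser = argparse.ArgumentParser(description="save routes to S3")
    parser.add_argument(
        "--force",
        action="store_true",
        help="push to S3 even if GTFS files already exist in the local cache",
    )
    return parser.parse_args(argv)


args = parse_args(["--force"])
# With --force, we upload even though a cached copy exists.
assert should_upload(cached_locally=True, force=args.force)
```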
In order for the web app to show recent/current statistics, opentransit-metrics/backend/compute_new.py needs to run. This script pulls down the latest vehicle location data from S3 and runs compute_arrivals and compute_stats. The resulting data is put back in S3 for the web app to pick up and use to create data visualizations.
compute_new.py used to run on Google Cloud (I think) using this yml:
https://github.com/codeforpdx/opentransit-metrics/blob/aba8f77506d93d91f68121569be58974ecdd130a/kubernetes/compute-new-cronjob.yaml
Is it possible to run this yml, or run compute_new.py, via GitHub Actions? From that doc, it looks like the first 2,000 minutes per month are free (or 3,000 if Code for PDX has a Pro account):
If GitHub Actions are not possible, what's the cheapest way to run that command daily? Could it run late at night on the same Heroku app as the web app?
(Original issue pulled from here: trynmaps/metrics-mvp#571)
Original Description:
We have a few guides for our GraphQL API, and I think it's technically possible for the public to pull some of our raw data. But it's not very easy for 3rd party developers to access. We should make a public site (either part of this app, or on a Wiki on this repo) that gives data/API access instructions to developers.
For Our Purposes:
Looking at the repo's primary README, this issue made me think we could edit it to do a few things:
Take user parameters, query S3 with them, and generate a zipped .csv for the user.
(Original source: trynmaps/metrics-mvp#540)
At the moment, the front-end does not reflect the availability of time range data in the back-end. For example, if data is missing for a given day, the user can still select that day in the front-end, but the back-end query for it will silently fail.
There could be some technical issues here:
Assuming the above issues and any others are not concerns, it would be nice for the front-end to reflect the current state of back-end data.
Currently, it's impossible to tell from the web app what range of historical data exists in S3. A user may select dates outside the available range without knowing.
Create an indicator on the web app showing what data range is available; something as simple as "Data available from 2022-03-01 to 2022-03-17" somewhere on the app. I think this could be done by querying S3 for the min/max available dates that have "observed-stats".
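A minimal sketch of deriving that range from S3 key names. The key layout (keys containing "observed-stats" and a YYYY-MM-DD date) is an assumption; the regex would need to match the real bucket layout:

```python
# Sketch of deriving the available date range from S3 key names.
# The key layout is an assumption; adjust the regex to the real bucket.
# Listing the keys themselves would use boto3:
#   s3 = boto3.client("s3")
#   resp = s3.list_objects_v2(Bucket="opentransit-pdx", Prefix="observed-stats")
#   keys = [obj["Key"] for obj in resp.get("Contents", [])]
import re


def available_date_range(keys):
    """Return (earliest, latest) ISO dates found in observed-stats keys."""
    dates = sorted(
        m.group(1)
        for key in keys
        if "observed-stats" in key
        for m in [re.search(r"(\d{4}-\d{2}-\d{2})", key)]
        if m
    )
    if not dates:
        return None
    return dates[0], dates[-1]


keys = [
    "observed-stats/trimet/2022-03-01.json",
    "observed-stats/trimet/2022-03-17.json",
    "other/ignore-me.json",
]
assert available_date_range(keys) == ("2022-03-01", "2022-03-17")
```

Since ISO dates sort lexicographically in chronological order, the min/max of the sorted list gives the banner its two dates.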
I think it's possible to "watch" TriMet's published GTFS (i.e. pull down the latest https://developer.trimet.org/schedule/gtfs.zip), store it (unzipped?) in a repo, and then have a GitHub action check for changes.
This is not a fully formulated thought, but I'm putting it here for future reference.
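One simple way to detect a changed feed is to hash the downloaded zip and compare against the last hash we stored. The URL is from the note above; running this inside a GitHub Action is an assumption:

```python
# Sketch of a change check for TriMet's published GTFS zip. Running
# this on a schedule in a GitHub Action is an assumption.
import hashlib
import urllib.request


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def gtfs_changed(url: str, last_known_hash: str) -> bool:
    """Download the feed and compare its hash to the last one we saw."""
    with urllib.request.urlopen(url) as resp:
        return sha256_of(resp.read()) != last_known_hash


# Example (no network): two different feed payloads hash differently.
assert sha256_of(b"feed v1") != sha256_of(b"feed v2")
```

The action could run `gtfs_changed("https://developer.trimet.org/schedule/gtfs.zip", stored_hash)` and commit the new feed (and hash) only when it returns True.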
This workflow has been failing with every commit - https://github.com/codeforpdx/opentransit-metrics/blob/master/.github/workflows/deploy.yml. Can we disable it? Maybe create another issue for fixing it?
The route summary tab already has hover-over info.
^ the above shows the hover-over info for "Median Travel Time".
I'm wondering if we should add hover info for some of the items on "Service Frequency" tab. I'm thinking specifically about "Distribution of Bunches/Gaps". I'm not sure what that means.
I think it might even be handy for other "Service Frequency" tab items (even if they seem obvious to some of us).
The tryn-api repository was renamed to opentransit-state-api (https://github.com/codeforpdx/opentransit-state-api). We should update the module names, environment variables, and documentation referencing tryn-api in this repository.
I saw the following error
/home/runner/work/opentransit-metrics-fork/opentransit-metrics-fork/backend/models/gtfs.py:756: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
When running:
python save_routes.py --s3 --timetables --scheduled-stats --agency=trimet
There's a last_complete_date key in a JSON object in S3 - s3://opentransit-pdx/metrics-state/v1/metrics-state_v1_trimet.json.
We need to create a first_complete_date key and store it in the same JSON. The banner would then grab these dates and display them.
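A minimal sketch of the JSON update. Reading and writing the actual S3 object would go through boto3; the sample values here are made up:

```python
# Sketch of adding a first_complete_date key next to the existing
# last_complete_date in the metrics-state JSON. The values are made up;
# the real first_complete_date would come from scanning the earliest
# "observed-stats" object in the bucket (assumption).
import json

state_json = '{"last_complete_date": "2022-03-17"}'
state = json.loads(state_json)

state["first_complete_date"] = "2022-03-01"

updated = json.dumps(state, sort_keys=True)
assert json.loads(updated) == {
    "first_complete_date": "2022-03-01",
    "last_complete_date": "2022-03-17",
}
```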
The save_routes.py script has a pretty good description written into its comments (thanks to whoever wrote it!).
The script downloads and parses the GTFS specification and saves the configuration for all routes to S3.
The S3 object contains data merged from GTFS and the Nextbus API (for agencies using Nextbus).
The frontend can then request this S3 URL directly without hitting the Python backend.
For each direction, the JSON object contains a coords array defining the shape of the route,
where the values are objects containing lat/lon properties:
"coords":[
{"lat":37.80707,"lon":-122.41727}
{"lat":37.80727,"lon":-122.41562},
{"lat":37.80748,"lon":-122.41398},
{"lat":37.80768,"lon":-122.41234},
...
]
For each direction, the JSON object also contains a stop_geometry object where the keys are stop IDs and the values are objects with a distance property (cumulative distance in meters to that stop along the GTFS shape) and an after_index property (index into the coords array of the last coordinate before that stop).
"stop_geometry":{
"5184":{"distance":8,"after_index":0},
"3092":{"distance":279,"after_index":1},
"3095":{"distance":573,"after_index":3},
"4502":{"distance":1045,"after_index":8},
...
}
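To make the distance/after_index description concrete, here is a sketch of how those values could be derived from the coords array. This mirrors the description above but is not the actual backend implementation; it also simplifies after_index to the nearest coordinate rather than the last one strictly before the stop:

```python
# Sketch of deriving a stop's `distance` and `after_index` from the
# route's coords array. Simplified illustration, not the real backend
# code: after_index here is the nearest coordinate to the stop.
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two {lat, lon} points."""
    r = 6371000
    p1, p2 = math.radians(a["lat"]), math.radians(b["lat"])
    dp = p2 - p1
    dl = math.radians(b["lon"] - a["lon"])
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def stop_geometry(coords, stop_point):
    """Cumulative distance along the shape to the coord nearest the stop,
    plus that coord's index."""
    cumulative = [0.0]
    for i in range(1, len(coords)):
        cumulative.append(cumulative[-1] + haversine_m(coords[i - 1], coords[i]))
    best_index = min(
        range(len(coords)), key=lambda i: haversine_m(coords[i], stop_point)
    )
    return {"distance": round(cumulative[best_index]), "after_index": best_index}


coords = [{"lat": 0, "lon": 0}, {"lat": 0, "lon": 0.001}, {"lat": 0, "lon": 0.002}]
geom = stop_geometry(coords, {"lat": 0, "lon": 0.00101})
assert geom["after_index"] == 1
assert 100 < geom["distance"] < 125  # ~111 m per 0.001 deg lon at the equator
```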
The terminal output of save_routes.py looks something like this:
route 98 MAX Shuttle
default direction = 0
shape_id: 503530 (220x) stops:5 from 8196 Gateway Transit Center to 13504 Portland International Airport - Arrivals 8196,10856,13206,13208,13504
most common shape = 503530 (220 times)
title = To Portland International Airport - Arrivals
distance = 15649
default direction = 1
shape_id: 503532 (220x) stops:5 from 13504 Portland International Airport - Arrivals to 8196 Gateway Transit Center 13504,13207,13206,10856,8196
most common shape = 503532 (220 times)
title = To Gateway Transit Center
distance = 15126
Currently the script just overwrites the one S3 path, but this process could be extended in the future to
store different paths for different dates, to allow fetching historical data for route configurations.
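A date-partitioned key scheme would be one way to support that. The prefix and naming below are assumptions for illustration, not the current bucket layout:

```python
# Sketch of a date-partitioned S3 key scheme for historical route
# configurations; the prefix and naming here are assumptions, not the
# current layout used by save_routes.py.
from datetime import date


def route_config_key(agency: str, version: str, d: date) -> str:
    """Build a per-date S3 key so old route configs remain fetchable."""
    return f"routes/{version}/{agency}/{d.isoformat()}/routes_{version}_{agency}.json.gz"


key = route_config_key("trimet", "v2", date(2022, 3, 17))
assert key == "routes/v2/trimet/2022-03-17/routes_v2_trimet.json.gz"
```

The frontend could then request the key for a specific date, falling back to the latest one when no date is given.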
If you're not familiar with GTFS, it would be good to look at the reference material here - https://developers.google.com/transit/gtfs/ as a starting point.
On the front metrics page, there is a "Date-Time Range" date picker. When expanded, it gives you various options: change the start/end date manually, pick the days of the week, and a few presets such as "Yesterday" and "Last 30 Days".
For example, clicking "Yesterday" on March 11th, 2022 results in the following range:
(Note this is actually a bug in and of itself; it should probably pick the 10th, but that is not the main issue reported here.)
Next, change the start and end date to some other date and click Yesterday again. Only the start date is changed, not the end date:
Similar behavior can be observed for those other preset options.
I would expect the end date to be changed as well.
If you look at the time-distance chart for a particular route, you will see short or broken trips (see screenshot below).
I suspect that the issue is not with the time-distance plot but probably with the underlying data. This may ultimately tie back to an issue of creating tests for the arrivals data.
TriMet is having trouble staffing operators to meet its scheduled service. See this tweet - https://twitter.com/trimet/status/1524798763239235595?s=20&t=u9ryjiwXbl25n2cZYdGipQ.
Is it possible to find out which routes have lost the most service? Does the static GTFS reflect the reduction in service or are they running a lot fewer buses/trains than scheduled in the GTFS?
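A first step toward answering this would be counting scheduled trips per route from the static GTFS trips.txt, then comparing against observed trips. The column names follow the GTFS spec; the sample rows are made up:

```python
# Sketch of counting scheduled trips per route from a GTFS trips.txt,
# as a first step toward comparing scheduled vs. observed service.
# Column names follow the GTFS spec; the sample data is made up.
import csv
import io
from collections import Counter

trips_txt = """route_id,service_id,trip_id,shape_id
98,W,t1,503530
98,W,t2,503530
20,W,t3,901001
"""

trips_per_route = Counter(
    row["route_id"] for row in csv.DictReader(io.StringIO(trips_txt))
)
assert trips_per_route == {"98": 2, "20": 1}
```

The observed side of the comparison would come from the arrivals data already in S3; a large scheduled-minus-observed gap per route would flag where the most service has been lost.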
(Original source: trynmaps/metrics-mvp#569)
Liam Question: is this a useful feature? Just want to make sure.
There could be a use case where our data would transfer nicely over to exporting to a GIS standard such as GeoJSON: https://doc.arcgis.com/en/arcgis-online/reference/geojson.htm
We'd have to get familiar with the GeoJSON schema and do the necessary transformations. At a quick glance, it looks like there are technically multiple schemas based on various geometrical shapes (lines, polygons, etc.).
Imagine placing the pin everywhere in Portland and calculating the coverage area of a certain travel time band. For instance, in the picture below:
Imagine calculating the total area of the blob up to the pink 40-45 minute gradient. Then do this for a bunch of points around the city and come up with a table like this:
| place_id | lat   | lon   | coverage area (sq mi) |
|----------|-------|-------|-----------------------|
| A        | XX.XX | XX.XX | XXX.X                 |
| B        | XX.XX | XX.XX | XXX.X                 |
Then use the above table to visualize transit access "deserts" around the city.
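The area calculation itself could use the shoelace formula on the isochrone polygon. Real lat/lon polygons would first need projecting to a planar CRS (e.g. with pyproj or shapely, an assumption here); this sketch assumes coordinates already in meters:

```python
# Sketch of computing a coverage area from an isochrone polygon using
# the shoelace formula. Assumes the polygon has already been projected
# to planar coordinates in meters (the projection step is omitted).
def polygon_area_m2(points):
    """Shoelace formula for a simple polygon given as (x, y) in meters."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2


M2_PER_SQ_MI = 1609.344 ** 2

# A 1 mile x 1 mile square: coverage area should be 1.0 sq mi.
mile = 1609.344
square = [(0, 0), (mile, 0), (mile, mile), (0, mile)]
assert abs(polygon_area_m2(square) / M2_PER_SQ_MI - 1.0) < 1e-9
```

Running this for the 40-45 minute band at each pin location would fill the "coverage area (sq mi)" column above.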
There's a GitHub action running every 8 hours to check on the opentransit-collector process. The opentransit-collector collects data from the TriMet API every 15 seconds and puts it in s3 (opentransit-pdx bucket). The collection is happening on a local raspberry pi so it doesn't have any cloud reporting or notification mechanisms. We wanted to create a process to get notifications when the opentransit-collector was failing.
See - https://github.com/codeforpdx/opentransit-metrics/blob/master/.github/workflows/check_s3.yml. We created a GitHub action that looks at the last modified time of objects in the opentransit-pdx bucket. If the most recently modified object is more than 5 minutes old, the action fails. This is great, but we would like to be notified.
We would like to add a Slack notification to our opentransit-pdx channel when the collector is failing. This should be possible given the existing GitHub action setup, but someone will have to figure out the permissions and configuration.
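A sketch of the two pieces: the staleness check behind the workflow, and a Slack incoming-webhook notification. The webhook URL is a placeholder; the real one would live in a GitHub Actions secret:

```python
# Sketch of the staleness check behind the check_s3 workflow, plus a
# Slack webhook notification. The webhook URL would come from a GitHub
# Actions secret; it is a placeholder here and is never called below.
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def is_stale(last_modified: datetime, now: datetime, max_age_minutes: int = 5) -> bool:
    """True when the newest S3 object is older than the allowed age."""
    return now - last_modified > timedelta(minutes=max_age_minutes)


def notify_slack(webhook_url: str, message: str) -> None:
    """POST a simple text payload to a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        webhook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)


now = datetime(2022, 8, 26, 12, 0, tzinfo=timezone.utc)
assert is_stale(now - timedelta(minutes=10), now)
assert not is_stale(now - timedelta(minutes=2), now)
```

The workflow would call `notify_slack` (or an off-the-shelf Slack notification action) only when `is_stale` is True.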
The npm linter throws an error when it runs in the GitHub action at:
https://github.com/codeforpdx/opentransit-metrics/blob/master/.github/workflows/test.yml
I disabled the workflow for now, but it would be good to figure out the issue and fix it.
See this action as an example of the error:
https://github.com/codeforpdx/opentransit-metrics/runs/7928966223?check_suite_focus=true
It can be recreated locally:
docker run opentransit/react-dev:latest npm run lint:check
> [email protected] lint:check /app/frontend
> eslint src --ext .jsx,.js
/app/frontend/src/actions/index.js
347:1 error Delete `··` prettier/prettier
348:1 error Delete `··` prettier/prettier
349:3 error Delete `··` prettier/prettier
351:1 error Delete `··` prettier/prettier
352:35 error Insert `{` prettier/prettier
353:7 error Delete `{` prettier/prettier
355:30 error Insert `,` prettier/prettier
356:1 error Replace `······}·` with `····})` prettier/prettier
357:5 error Delete `)` prettier/prettier
360:7 error Unexpected var, use let or const instead no-var
360:58 error Insert `;` prettier/prettier
361:7 error Unexpected var, use let or const instead no-var
361:42 error Insert `;` prettier/prettier
362:19 error Insert `;` prettier/prettier
363:36 error Insert `;` prettier/prettier
364:16 error Insert `;` prettier/prettier
365:17 error Insert `;` prettier/prettier
366:61 error Insert `;` prettier/prettier
368:1 error Replace `····⏎··};` with `}` prettier/prettier
375:5 warning Unexpected console statement no-console
601:5 warning Unexpected console statement no-console
/app/frontend/src/components/RouteSummary.jsx
8:1 error `@material-ui/core/Button` import should occur before import of `./SummaryRow` import/order
50:1 error Delete `⏎` prettier/prettier
55:5 warning Unexpected console statement no-console
217:7 error Insert `··` prettier/prettier
218:7 error Insert `··` prettier/prettier
219:1 error Replace `······` with `········` prettier/prettier
220:1 error Insert `··` prettier/prettier
222:1 error Insert `··` prettier/prettier
✖ 29 problems (26 errors, 3 warnings)
26 errors and 0 warnings potentially fixable with the `--fix` option.
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] lint:check: `eslint src --ext .jsx,.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] lint:check script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2022-08-26T03_50_00_624Z-debug.log
Right now, the opentransit-collector is running on @sidetrackedmind's Raspberry Pi. Ideally, it would run in the cloud, where it can be monitored and easily supported by the whole team.
Is it possible to launch opentransit-collector in the background of the main Heroku app? If this were a commercial app with lots of traffic, it would be bad to have a separate process running in the background, but this app is not (at least at this point) getting a ton of traffic. I'm wondering if we can save some money by adding the opentransit-collector to the already-running Heroku app.
@youngj - am I totally crazy thinking this could work? We could launch a separate Heroku app for the opentransit-collector, but it feels like overkill at this stage.