codeforpdx / opentransit-metrics
Prototype of public transit data visualization system
Home Page: https://opentransit-pdx.herokuapp.com/
License: MIT License
The trip_id in the downloaded arrival data .csv file is the trip index for that particular route; it is not the scheduled trip_id from the GTFS.
Is the real trip_id stored in the arrival data?
If not:
(screenshot of downloaded data; I'm pretty sure it's not normal to have trip_ids in sequence like that)
(Original source: trynmaps/metrics-mvp#403)
Original Description:
Add tests to the "data science" functions to give newcomers a better grasp of what the functions do and the shape of their input/output data, and to make the functions easier to change and fix.
Functions to test are in:
backend/models/eclipses.py
backend/models/wait_times.py
backend/models/trip_times.py
Any questions about the functions or data should be directed to @youngj (Jesse Young on Slack).
For Our Purposes:
The above is pretty accurate for what would be useful to us. We can explore other places for unit tests if needed, but this is a good start.
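As a starting point, here is a minimal sketch of the kind of unit test described above. Note that `average_wait_time` is a stand-in function written for this example, not the actual function in backend/models/wait_times.py; the real signatures and data shapes may differ.

```python
# Sketch of a unit test for a hypothetical wait-time helper. This is a
# stand-in, NOT the actual function in backend/models/wait_times.py.
from datetime import datetime, timedelta


def average_wait_time(arrival_times):
    """Average wait for a rider arriving uniformly at random between the
    first and last arrival: sum(gap^2 / 2) over the total time observed."""
    gaps = [
        (b - a).total_seconds() / 60
        for a, b in zip(arrival_times, arrival_times[1:])
    ]
    total = sum(gaps)
    return sum(g * g / 2 for g in gaps) / total


def test_average_wait_time_uniform_headways():
    start = datetime(2022, 3, 1, 8, 0)
    arrivals = [start + timedelta(minutes=15 * i) for i in range(5)]
    # With constant 15-minute headways, the expected wait is 7.5 minutes.
    assert abs(average_wait_time(arrivals) - 7.5) < 1e-9


test_average_wait_time_uniform_headways()
```

Tests in this style pin down the expected input/output shapes, which is exactly what would help newcomers reading eclipses.py, wait_times.py, and trip_times.py.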
Right now the opentransit-collector is running on @sidetrackedmind's Raspberry Pi. It was silently broken for over a week; this is not great because we cannot get that data back. We need to come up with an alert mechanism in the interim, and ultimately move this process off the Raspberry Pi and into the cloud, where alerting / error catching is available to the whole team.
If you run save_routes.py locally and you already have GTFS files in your cache, the script will not push anything to S3.
We need some way to force the script to send new data to S3 even if it already exists locally.
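One option is a force flag on the script. The flag name and helpers below are assumptions for illustration, not the real save_routes.py CLI:

```python
# Sketch of a hypothetical --force flag for save_routes.py; the flag
# name and the cache/upload helpers are assumptions, not the real CLI.
import argparse


def should_upload(cached_locally: bool, force: bool) -> bool:
    """Upload to S3 when there is no local cache, or when forced."""
    return force or not cached_locally


def parse_args(argv):
    parser = argparse.ArgumentParser(description="save routes to S3")
    parser.add_argument(
        "--force",
        action="store_true",
        help="push to S3 even if GTFS files already exist in the local cache",
    )
    return parser.parse_args(argv)


args = parse_args(["--force"])
# With --force, we upload even though a cached copy exists.
assert should_upload(cached_locally=True, force=args.force)
```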
In order for the web app to show recent/current statistics, opentransit-metrics/backend/compute_new.py needs to run. This script pulls down the latest vehicle location data from S3 and runs compute_arrivals and compute_stats. The resulting data is put back in S3 for the web app to pick up and use to create data visualizations.
compute_new.py used to run on Google Cloud (I think) using this yml:
https://github.com/codeforpdx/opentransit-metrics/blob/aba8f77506d93d91f68121569be58974ecdd130a/kubernetes/compute-new-cronjob.yaml
Is it possible to run this yml, or run compute_new.py, via GitHub Actions? From that doc, it looks like the first 2,000 minutes per month are free (or 3,000 if Code for PDX has a Pro account):
If GitHub Actions are not possible, what's the cheapest way to run that command daily? Could it run late at night on the same Heroku app as the web app?
(Original issue pulled from here: trynmaps/metrics-mvp#571)
Original Description:
We have a few guides for our GraphQL API, and I think it's technically possible for the public to pull some of our raw data. But it's not very easy for 3rd party developers to access. We should make a public site (either part of this app, or on a Wiki on this repo) that gives data/API access instructions to developers.
For Our Purposes:
Looking at the repo's primary README, this issue made me think we could edit it to do a few things:
Take user parameters, query S3 with them, and generate a zipped .csv for the user.
(Original source: trynmaps/metrics-mvp#540)
At the moment, the front-end does not reflect the availability of time range data in the back-end. For example, if data is missing for a given day, the user can still select that day in the front-end, but the back-end query for it will silently fail.
There could be some technical issues here:
Assuming the above issues and any others are not concerns, it would be nice for the front-end to reflect the current state of back-end data.
Currently, it's impossible to tell from the web app what range of historical data exists in S3. A user may select dates outside the available range without knowing.
Create an indicator on the web app showing what data range is available; something as simple as "Data available from 2022-03-01 to 2022-03-17" somewhere on the app. I think this could be done by querying S3 for the min/max available dates that have "observed-stats".
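A minimal sketch of deriving that range from S3 key names. The key layout (keys containing "observed-stats" and a YYYY-MM-DD date) is an assumption; the regex would need to match the real bucket layout:

```python
# Sketch of deriving the available date range from S3 key names.
# The key layout is an assumption; adjust the regex to the real bucket.
# Listing the keys themselves would use boto3:
#   s3 = boto3.client("s3")
#   resp = s3.list_objects_v2(Bucket="opentransit-pdx", Prefix="observed-stats")
#   keys = [obj["Key"] for obj in resp.get("Contents", [])]
import re


def available_date_range(keys):
    """Return (earliest, latest) ISO dates found in observed-stats keys."""
    dates = sorted(
        m.group(1)
        for key in keys
        if "observed-stats" in key
        for m in [re.search(r"(\d{4}-\d{2}-\d{2})", key)]
        if m
    )
    if not dates:
        return None
    return dates[0], dates[-1]


keys = [
    "observed-stats/trimet/2022-03-01.json",
    "observed-stats/trimet/2022-03-17.json",
    "other/ignore-me.json",
]
assert available_date_range(keys) == ("2022-03-01", "2022-03-17")
```

Since ISO dates sort lexicographically in chronological order, the min/max of the sorted list gives the banner its two dates.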
I think it's possible to "watch" TriMet's published GTFS (i.e. pull down the latest https://developer.trimet.org/schedule/gtfs.zip), store it (unzipped?) in a repo, and then have a GitHub action check for changes.
This is not a fully formulated thought, but I'm putting it here for future reference.
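One simple way to detect a changed feed is to hash the downloaded zip and compare against the last hash we stored. The URL is from the note above; running this inside a GitHub Action is an assumption:

```python
# Sketch of a change check for TriMet's published GTFS zip. Running
# this on a schedule in a GitHub Action is an assumption.
import hashlib
import urllib.request


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def gtfs_changed(url: str, last_known_hash: str) -> bool:
    """Download the feed and compare its hash to the last one we saw."""
    with urllib.request.urlopen(url) as resp:
        return sha256_of(resp.read()) != last_known_hash


# Example (no network): two different feed payloads hash differently.
assert sha256_of(b"feed v1") != sha256_of(b"feed v2")
```

The action could run `gtfs_changed("https://developer.trimet.org/schedule/gtfs.zip", stored_hash)` and commit the new feed (and hash) only when it returns True.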
This workflow has been failing with every commit - https://github.com/codeforpdx/opentransit-metrics/blob/master/.github/workflows/deploy.yml. Can we disable it? Maybe create another issue for fixing it?
The route summary tab already has hover-over info.
^ the above shows the hover-over info for "Median Travel Time".
I'm wondering if we should add hover info for some of the items on "Service Frequency" tab. I'm thinking specifically about "Distribution of Bunches/Gaps". I'm not sure what that means.
I think it might even be handy for other "Service Frequency" tab items (even if they seem obvious to some of us).
The tryn-api repository was renamed to opentransit-state-api (https://github.com/codeforpdx/opentransit-state-api). We should update the module names, environment variables, and documentation referencing tryn-api in this repository.
I saw the following error
/home/runner/work/opentransit-metrics-fork/opentransit-metrics-fork/backend/models/gtfs.py:756: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
When running:
python save_routes.py --s3 --timetables --scheduled-stats --agency=trimet
There's a last_complete_date key in a JSON object in S3 - s3://opentransit-pdx/metrics-state/v1/metrics-state_v1_trimet.json.
We need to create a first_complete_date key and store it in the same JSON. The banner would then grab these dates and display them.
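A minimal sketch of the JSON update. Reading and writing the actual S3 object would go through boto3; the sample values here are made up:

```python
# Sketch of adding a first_complete_date key next to the existing
# last_complete_date in the metrics-state JSON. The values are made up;
# the real first_complete_date would come from scanning the earliest
# "observed-stats" object in the bucket (assumption).
import json

state_json = '{"last_complete_date": "2022-03-17"}'
state = json.loads(state_json)

state["first_complete_date"] = "2022-03-01"

updated = json.dumps(state, sort_keys=True)
assert json.loads(updated) == {
    "first_complete_date": "2022-03-01",
    "last_complete_date": "2022-03-17",
}
```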
The save_routes.py script has a pretty good description written into its comments (thanks to whoever wrote it!).
The script downloads and parses the GTFS specification and saves the configuration for all routes to S3.
The S3 object contains data merged from GTFS and the Nextbus API (for agencies using Nextbus).
The frontend can then request this S3 URL directly without hitting the Python backend.
For each direction, the JSON object contains a coords array defining the shape of the route,
where the values are objects containing lat/lon properties:
"coords":[
{"lat":37.80707,"lon":-122.41727}
{"lat":37.80727,"lon":-122.41562},
{"lat":37.80748,"lon":-122.41398},
{"lat":37.80768,"lon":-122.41234},
...
]
For each direction, the JSON object also contains a stop_geometry object where the keys are stop IDs and the values are objects with a distance property (cumulative distance in meters to that stop along the GTFS shape) and an after_index property (index into the coords array of the last coordinate before that stop).
"stop_geometry":{
"5184":{"distance":8,"after_index":0},
"3092":{"distance":279,"after_index":1},
"3095":{"distance":573,"after_index":3},
"4502":{"distance":1045,"after_index":8},
...
}
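To make the distance/after_index description concrete, here is a sketch of how those values could be derived from the coords array. This mirrors the description above but is not the actual backend implementation; it also simplifies after_index to the nearest coordinate rather than the last one strictly before the stop:

```python
# Sketch of deriving a stop's `distance` and `after_index` from the
# route's coords array. Simplified illustration, not the real backend
# code: after_index here is the nearest coordinate to the stop.
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two {lat, lon} points."""
    r = 6371000
    p1, p2 = math.radians(a["lat"]), math.radians(b["lat"])
    dp = p2 - p1
    dl = math.radians(b["lon"] - a["lon"])
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def stop_geometry(coords, stop_point):
    """Cumulative distance along the shape to the coord nearest the stop,
    plus that coord's index."""
    cumulative = [0.0]
    for i in range(1, len(coords)):
        cumulative.append(cumulative[-1] + haversine_m(coords[i - 1], coords[i]))
    best_index = min(
        range(len(coords)), key=lambda i: haversine_m(coords[i], stop_point)
    )
    return {"distance": round(cumulative[best_index]), "after_index": best_index}


coords = [{"lat": 0, "lon": 0}, {"lat": 0, "lon": 0.001}, {"lat": 0, "lon": 0.002}]
geom = stop_geometry(coords, {"lat": 0, "lon": 0.00101})
assert geom["after_index"] == 1
assert 100 < geom["distance"] < 125  # ~111 m per 0.001 deg lon at the equator
```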
The terminal output of save_routes.py looks something like this:
route 98 MAX Shuttle
default direction = 0
shape_id: 503530 (220x) stops:5 from 8196 Gateway Transit Center to 13504 Portland International Airport - Arrivals 8196,10856,13206,13208,13504
most common shape = 503530 (220 times)
title = To Portland International Airport - Arrivals
distance = 15649
default direction = 1
shape_id: 503532 (220x) stops:5 from 13504 Portland International Airport - Arrivals to 8196 Gateway Transit Center 13504,13207,13206,10856,8196
most common shape = 503532 (220 times)
title = To Gateway Transit Center
distance = 15126
Currently the script just overwrites the one S3 path, but this process could be extended in the future to
store different paths for different dates, to allow fetching historical data for route configurations.
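A date-partitioned key scheme would be one way to support that. The prefix and naming below are assumptions for illustration, not the current bucket layout:

```python
# Sketch of a date-partitioned S3 key scheme for historical route
# configurations; the prefix and naming here are assumptions, not the
# current layout used by save_routes.py.
from datetime import date


def route_config_key(agency: str, version: str, d: date) -> str:
    """Build a per-date S3 key so old route configs remain fetchable."""
    return f"routes/{version}/{agency}/{d.isoformat()}/routes_{version}_{agency}.json.gz"


key = route_config_key("trimet", "v2", date(2022, 3, 17))
assert key == "routes/v2/trimet/2022-03-17/routes_v2_trimet.json.gz"
```

The frontend could then request the key for a specific date, falling back to the latest one when no date is given.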
If you're not familiar with GTFS, it would be good to look at the reference material here - https://developers.google.com/transit/gtfs/ as a starting point.
On the front metrics page, there is a "Date-Time Range" date picker. When expanded, it gives you various options: change the start/end date manually, pick the days of the week, and a few presets such as "Yesterday" and "Last 30 Days".
For example, clicking "Yesterday" on March 11th, 2022 results in the following range:
(Note this is actually a bug in and of itself; it should probably pick the 10th, but that is not the main issue reported here.)
Next, change the start and end date to some other date and click Yesterday again. Only the start date is changed, not the end date:
Similar behavior can be observed for those other preset options.
I would expect the end date to be changed as well.
If you look at the time-distance chart for a particular route, you will see short or broken trips (see screenshot below).
I suspect that the issue is not with the time-distance plot but probably with the underlying data. This may ultimately tie back to an issue of creating tests for the arrivals data.
TriMet is having trouble staffing operators to meet its scheduled service. See this tweet - https://twitter.com/trimet/status/1524798763239235595?s=20&t=u9ryjiwXbl25n2cZYdGipQ.
Is it possible to find out which routes have lost the most service? Does the static GTFS reflect the reduction in service or are they running a lot fewer buses/trains than scheduled in the GTFS?
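A first step toward answering this would be counting scheduled trips per route from the static GTFS trips.txt, then comparing against observed trips. The column names follow the GTFS spec; the sample rows are made up:

```python
# Sketch of counting scheduled trips per route from a GTFS trips.txt,
# as a first step toward comparing scheduled vs. observed service.
# Column names follow the GTFS spec; the sample data is made up.
import csv
import io
from collections import Counter

trips_txt = """route_id,service_id,trip_id,shape_id
98,W,t1,503530
98,W,t2,503530
20,W,t3,901001
"""

trips_per_route = Counter(
    row["route_id"] for row in csv.DictReader(io.StringIO(trips_txt))
)
assert trips_per_route == {"98": 2, "20": 1}
```

The observed side of the comparison would come from the arrivals data already in S3; a large scheduled-minus-observed gap per route would flag where the most service has been lost.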
(Original source: trynmaps/metrics-mvp#569)
Liam Question: is this a useful feature? Just want to make sure.
There could be a use case where our data would transfer nicely over to exporting to a GIS standard such as GeoJSON: https://doc.arcgis.com/en/arcgis-online/reference/geojson.htm
We'd have to get familiar with the GeoJSON schema and do the necessary transformations. At a quick glance, it looks like there are technically multiple schemas based on various geometrical shapes (lines, polygons, etc.).
Imagine placing the pin everywhere in Portland and calculating the coverage area of a certain travel time band. For instance, in the picture below:
Imagine calculating the total area of the blob up to the pink 40-45 minute gradient. Then do this for a bunch of points around the city and come up with a table like this:
| place_id | lat   | lon   | coverage area (sq mi) |
|----------|-------|-------|-----------------------|
| A        | XX.XX | XX.XX | XXX.X                 |
| B        | XX.XX | XX.XX | XXX.X                 |
Then use the above table to visualize transit access "deserts" around the city.
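The area calculation itself could use the shoelace formula on the isochrone polygon. Real lat/lon polygons would first need projecting to a planar CRS (e.g. with pyproj or shapely, an assumption here); this sketch assumes coordinates already in meters:

```python
# Sketch of computing a coverage area from an isochrone polygon using
# the shoelace formula. Assumes the polygon has already been projected
# to planar coordinates in meters (the projection step is omitted).
def polygon_area_m2(points):
    """Shoelace formula for a simple polygon given as (x, y) in meters."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2


M2_PER_SQ_MI = 1609.344 ** 2

# A 1 mile x 1 mile square: coverage area should be 1.0 sq mi.
mile = 1609.344
square = [(0, 0), (mile, 0), (mile, mile), (0, mile)]
assert abs(polygon_area_m2(square) / M2_PER_SQ_MI - 1.0) < 1e-9
```

Running this for the 40-45 minute band at each pin location would fill the "coverage area (sq mi)" column above.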
There's a GitHub action running every 8 hours to check on the opentransit-collector process. The opentransit-collector collects data from the TriMet API every 15 seconds and puts it in s3 (opentransit-pdx bucket). The collection is happening on a local raspberry pi so it doesn't have any cloud reporting or notification mechanisms. We wanted to create a process to get notifications when the opentransit-collector was failing.
See - https://github.com/codeforpdx/opentransit-metrics/blob/master/.github/workflows/check_s3.yml. We created a GitHub action that looks at the last modified time of objects in the opentransit-pdx bucket. If the most recently modified object is more than 5 minutes old, the action fails. This is great, but we would like to be notified.
We would like to add a Slack notification to our opentransit-pdx channel when the collector is failing. This should be possible given the existing GitHub action setup, but someone will have to figure out the permissions and configuration.
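A sketch of the two pieces: the staleness check behind the workflow, and a Slack incoming-webhook notification. The webhook URL is a placeholder; the real one would live in a GitHub Actions secret:

```python
# Sketch of the staleness check behind the check_s3 workflow, plus a
# Slack webhook notification. The webhook URL would come from a GitHub
# Actions secret; it is a placeholder here and is never called below.
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def is_stale(last_modified: datetime, now: datetime, max_age_minutes: int = 5) -> bool:
    """True when the newest S3 object is older than the allowed age."""
    return now - last_modified > timedelta(minutes=max_age_minutes)


def notify_slack(webhook_url: str, message: str) -> None:
    """POST a simple text payload to a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        webhook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)


now = datetime(2022, 8, 26, 12, 0, tzinfo=timezone.utc)
assert is_stale(now - timedelta(minutes=10), now)
assert not is_stale(now - timedelta(minutes=2), now)
```

The workflow would call `notify_slack` (or an off-the-shelf Slack notification action) only when `is_stale` is True.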
The npm linter throws an error when it runs in the GitHub action at:
https://github.com/codeforpdx/opentransit-metrics/blob/master/.github/workflows/test.yml
I disabled the workflow for now, but it would be good to figure out the issue and fix it.
See this action as an example of the error:
https://github.com/codeforpdx/opentransit-metrics/runs/7928966223?check_suite_focus=true
It can be recreated locally:
docker run opentransit/react-dev:latest npm run lint:check
> [email protected] lint:check /app/frontend
> eslint src --ext .jsx,.js
/app/frontend/src/actions/index.js
347:1 error Delete `··` prettier/prettier
348:1 error Delete `··` prettier/prettier
349:3 error Delete `··` prettier/prettier
351:1 error Delete `··` prettier/prettier
352:35 error Insert `{` prettier/prettier
353:7 error Delete `{` prettier/prettier
355:30 error Insert `,` prettier/prettier
356:1 error Replace `······}·` with `····})` prettier/prettier
357:5 error Delete `)` prettier/prettier
360:7 error Unexpected var, use let or const instead no-var
360:58 error Insert `;` prettier/prettier
361:7 error Unexpected var, use let or const instead no-var
361:42 error Insert `;` prettier/prettier
362:19 error Insert `;` prettier/prettier
363:36 error Insert `;` prettier/prettier
364:16 error Insert `;` prettier/prettier
365:17 error Insert `;` prettier/prettier
366:61 error Insert `;` prettier/prettier
368:1 error Replace `····⏎··};` with `}` prettier/prettier
375:5 warning Unexpected console statement no-console
601:5 warning Unexpected console statement no-console
/app/frontend/src/components/RouteSummary.jsx
8:1 error `@material-ui/core/Button` import should occur before import of `./SummaryRow` import/order
50:1 error Delete `⏎` prettier/prettier
55:5 warning Unexpected console statement no-console
217:7 error Insert `··` prettier/prettier
218:7 error Insert `··` prettier/prettier
219:1 error Replace `······` with `········` prettier/prettier
220:1 error Insert `··` prettier/prettier
222:1 error Insert `··` prettier/prettier
✖ 29 problems (26 errors, 3 warnings)
26 errors and 0 warnings potentially fixable with the `--fix` option.
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] lint:check: `eslint src --ext .jsx,.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] lint:check script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2022-08-26T03_50_00_624Z-debug.log
Right now, the opentransit-collector is running on @sidetrackedmind's Raspberry Pi. Ideally, it would run in the cloud, where it can be monitored and easily supported by the whole team.
Is it possible to launch opentransit-collector in the background of the main Heroku app? If this were a commercial app with lots of traffic, it would be bad to have a separate process running in the background, but this app is not (at least at this point) getting a ton of traffic. I'm wondering if we can save some money by adding the opentransit-collector to the already-running Heroku app.
@youngj - am I totally crazy thinking this could work? We could launch a separate Heroku app for the opentransit-collector, but it feels like overkill at this stage.