Research Question: how far into the future do RT updates matter?

This came from Greg Newmark, who's working with Cal-ITP to analyze RT schedule adherence. He's seeing how well actual arrival times match scheduled arrival times, and how accurate those arrival estimates are for various periods of time leading up to the actual arrival. For example, how good are we at estimating arrival times when the vehicle is 5, 10, or 20 minutes away? Specifically, he's asked us how far in advance we might care to study. This is a good question for our GTFS Schedule dataset.

Answering this question means exploring both how long a rider has to wait between vehicles, as well as how long that vehicle has to travel between stops. If the route is hourly, but each stop is five minutes apart, then we're only looking at a max of five minutes between estimation and arrival. Conversely, on a route that arrives every five minutes but the stops are an hour apart, a rider would only care about the estimation for five minutes until the next bus arrives.

To answer this question, I propose we produce a plot of stop_times with the following axes:

On the X axis, the number of minutes since the previous trip of that same route arrived at that stop. This means we exclude the first trip for each route of the day. So for example, for an hourly route, it's 60 minutes between trips, even if another route also stops at that stop location.
On the Y axis, the number of minutes since the previous stop in that stop_times' trip's stop sequence. So for example, if the bus makes a stop at 1:05, 1:10, and 1:30, it would have entries for 5 and 20 minutes.

Note that there are a ton of stop_times in any one dataset, let alone in the state database, so this needs to be filtered.

Rather than running this for every valid calendar date in every feed, run for only the present day.
Exclude the first trip of each route in that day.
Exclude any data point (X/Y pair) that represents fewer than 0.5% of all stop_times.
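
A rough sketch of how the X/Y pairs and the 0.5% filter could be computed, to make the proposal concrete. It assumes stop_times rows have already been joined to trips so each row carries a route_id, and that they cover a single service day; the function names and the dict-based row layout are hypothetical, not the warehouse schema.

```python
from collections import Counter, defaultdict


def to_minutes(hhmmss):
    """Convert a GTFS time string (hours may exceed 24) to minutes."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 60 + m + s / 60


def xy_pairs(stop_times, min_share=0.005):
    """Return {(x, y): count} for pairs meeting the share threshold.

    stop_times: list of dicts with route_id, trip_id, stop_id,
    stop_sequence and arrival_time (assumed layout, one service day).
    """
    # Y: minutes since the previous stop within the same trip.
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)
    y_by_key = {}
    for trip_id, rows in by_trip.items():
        rows.sort(key=lambda r: r["stop_sequence"])
        for prev, cur in zip(rows, rows[1:]):
            gap = to_minutes(cur["arrival_time"]) - to_minutes(prev["arrival_time"])
            y_by_key[(trip_id, cur["stop_sequence"])] = round(gap)

    # X: minutes since the previous trip of the same route at this stop.
    # The first arrival per route and stop has no predecessor, so it is
    # excluded automatically.
    by_route_stop = defaultdict(list)
    for row in stop_times:
        by_route_stop[(row["route_id"], row["stop_id"])].append(row)
    pairs = Counter()
    for rows in by_route_stop.values():
        rows.sort(key=lambda r: to_minutes(r["arrival_time"]))
        for prev, cur in zip(rows, rows[1:]):
            key = (cur["trip_id"], cur["stop_sequence"])
            if key not in y_by_key:
                continue  # first stop of a trip has no Y value
            x = round(to_minutes(cur["arrival_time"]) - to_minutes(prev["arrival_time"]))
            pairs[(x, y_by_key[key])] += 1

    # Drop pairs that represent fewer than min_share of all stop_times.
    total = len(stop_times)
    return {pair: n for pair, n in pairs.items() if n / total >= min_share}
```

For an hourly route with stops five minutes apart, every surviving pair would be (60, 5), which matches the intuition in the write-up above.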

Research: GTFS-RT Presentation Demo for CARB

Question

Meeting scheduled for Feb 24. Can build off of district presentations and show a variety of existing data statewide.

Metrics

Presentation drafted
Presentation reviewed and complete

Data sources

GTFS-RT vehicle positions (existing raw table in warehouse)
GTFS Schedule data (in warehouse)

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Use intake for importing data in Jupyter Notebooks

Is your feature request related to a problem? Please describe.

As analytics requests increase, there's a need to store outside data sources, as well as significant overlap in datasets. Along with establishing GCS buckets (#526), we should be able to catalog our canonical data sources somewhere and everyone imports the same file for their analyses.

Describe the solution you'd like

Use intake to catalog our various data sources, determine canonical datasets for outside sources like Census, CA open data portal, etc.

Describe alternatives you've considered

Additional context
See data catalogs in City of LA repos planning-entitlements.

Research: Bus Service Opportunities by Census Tracts

Question

Level of bus service for census tracts by pop density/jobs and CalEnviroScreen.

For Chad Edison, CalSTA

Metrics

the # bus stop-visits per sq mi (each bus stop multiplied by # times bus visits the stop per day, normalized to per sq mi for census tract)
census tracts categorized as low to high job or population density vs census tracts categorized into CalEnviroScreen bins
break apart into weekday / weekend and peak / off-peak
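
A minimal sketch of the stop-visits-per-square-mile metric described above. It assumes daily visit counts per stop and a stop-to-tract assignment (from a spatial join) are already available; the function name and input layout are hypothetical.

```python
from collections import defaultdict


def stop_visits_per_sq_mi(stop_visits, stop_to_tract, tract_sq_mi):
    """Aggregate daily stop visits to tracts and normalize by tract area.

    stop_visits: {stop_id: number of times a bus visits the stop per day}
    stop_to_tract: {stop_id: tract_id}, assumed to come from a spatial join
    tract_sq_mi: {tract_id: area in square miles}
    """
    visits = defaultdict(int)
    for stop_id, n in stop_visits.items():
        tract = stop_to_tract.get(stop_id)
        if tract is not None:  # skip stops that fall outside every tract
            visits[tract] += n
    # Tracts without any stops get 0 so they still show up on the map.
    return {t: visits[t] / area for t, area in tract_sq_mi.items() if area > 0}
```

Running this separately on weekday / weekend and peak / off-peak visit counts would give the breakdowns listed in the metrics.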

Data sources

GTFS schedule: gtfs_schedule_dim_stop_times, gtfs_schedule_fact_daily_trips (let's filter to Wed or Thurs for weekday), gtfs_schedule_dim_stops for lat/lon
CalEnviroScreen 4.0 for census tract geometry, population, equity metrics
jobs by census tracts from LEHD origin-destination. Possibly use 1 or 2 from Urban Institute -- RAC All Jobs Excluding Federal Jobs and WAC All Jobs Excluding Federal Jobs?
Park-and-ride locations for intercity connectivity

Deliverable

Interactive map of CA census tracts: do we need a bivariate legend?
Add park-and-ride locations as a points layer

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: Level of Delay, GTFS-RT

Question

Given an arbitrary area and time (e.g., a particular section of the state highway network, or the city of San Mateo), compute how many minutes of delay buses and rail have.

Metrics

Will need to impute the time of entry, time of exit from schedule data and guess "how many minutes should this take" vs on average, how many minutes does this take
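
One possible way to frame the calculation, as a sketch only. It assumes entry and exit times are imputed by linear interpolation between the two stops on either side of the area boundary, which is an assumption and not a settled method; all names are hypothetical and times are minutes after midnight.

```python
def interpolate_time(t_before, t_after, frac):
    """Impute the boundary-crossing time between two stops.

    frac is the share of the distance between the stops that lies
    before the boundary (0 = at the first stop, 1 = at the second).
    """
    return t_before + frac * (t_after - t_before)


def delay_minutes(sched_entry, sched_exit, obs_entry, obs_exit):
    """Observed minus scheduled traversal time; positive means slower."""
    scheduled = sched_exit - sched_entry
    observed = obs_exit - obs_entry
    return observed - scheduled
```

Averaging delay_minutes over all trips that cross the area in the chosen time window would give the "how many minutes should this take" vs "how many minutes does this take" comparison.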

Data sources

GTFS-RT Parquet Timing Rectangles
GTFS Schedule Data

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: routes and stops shapefiles for public-facing portal

Question

Create 2 datasets to upload to ArcGIS for Traffic Ops, eventually move this to Airflow to be scheduled to overwrite AGOL dataset at some frequency:

every stop + what route at those stops
every route, with a line representing the route either from shapes.txt or creating one from stops.txt
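
A small sketch of the two dataset shapes, assuming plain Python structures in place of the warehouse tables. The fallback line is built by ordering one representative trip's stops by stop_sequence, which is only an approximation of the real route geometry; function names are hypothetical.

```python
def stops_with_routes(stop_route_rows):
    """Collapse (stop_id, route_id) rows to one row per stop.

    Returns {stop_id: sorted list of unique route_ids}, which removes
    the duplicates created by joining through stop_times and trips.
    """
    routes_at_stop = {}
    for stop_id, route_id in stop_route_rows:
        routes_at_stop.setdefault(stop_id, set()).add(route_id)
    return {stop_id: sorted(routes) for stop_id, routes in routes_at_stop.items()}


def route_line_from_stops(trip_stop_times, stop_coords):
    """Build an ordered list of (lon, lat) points for one trip.

    trip_stop_times: list of (stop_sequence, stop_id) for a single trip
    stop_coords: {stop_id: (lon, lat)} taken from stops.txt
    Intended only for routes that have no entry in shapes.txt.
    """
    ordered = sorted(trip_stop_times)
    return [stop_coords[stop_id] for _, stop_id in ordered if stop_id in stop_coords]
```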

Metrics

Make sure each row represents what is needed, get rid of "duplicates".

Don't use AGOL hosted feature service and use credits. Use some public-facing geoportal?

Data sources

Use gtfs_schedule tables: stops, stop_times, trips, routes, agencies

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Analytics Style Guide

Is your feature request related to a problem? Please describe.

Implement an analytics / visualization style guide based on Cal-ITP's branded resources for slides / docs.

Describe the solution you'd like
Build a package part of shared_utils that will handle most of the styling for visualizations made within Jupyter Notebooks. Make it available for all the major charting packages: altair, matplotlib, seaborn, plotnine.

Describe alternatives you've considered

Additional context
Add any other context or screenshots about the feature request here.

Research Question: how often do GTFS-RT feeds get updated?

The context is a query that came through the GTFS Helpdesk. A vendor provides 30 second updates, while the Guidelines specify 20 seconds or less. I want to help each transit provider understand where they fit in among other providers in California. What's "normal" or typical for an update frequency. What percentile would they be in with 20, 30, or 60 second updates? What's the shortest update frequency, and how common is it?
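
A sketch of how those questions could be answered once a typical update interval per feed is known. It assumes the interval is taken as the median gap between distinct feed header timestamps (POSIX seconds), which is one reasonable definition among several; the function names are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def typical_interval(header_timestamps):
    """Median gap in seconds between distinct feed header timestamps."""
    ordered = sorted(set(header_timestamps))  # repeated fetches collapse
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return median(gaps)


def percent_faster(intervals, candidate):
    """Percent of feeds whose interval is strictly shorter than candidate."""
    ordered = sorted(intervals)
    return 100 * bisect_left(ordered, candidate) / len(ordered)


def shortest_interval(intervals):
    """Return (shortest interval, number of feeds that share it)."""
    shortest = min(intervals)
    return shortest, intervals.count(shortest)
```

Calling percent_faster with 20, 30, and 60 on the statewide list of intervals would show where a provider with that update frequency sits relative to the others.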

Research: Consolidated Application

This issue looks at the organizations that applied for 5311/5311(f)/CMAQ, 5339(a), or LCTOP using the new Consolidated Application process. Applicants just need to complete one application for the funds above, once a year.

Data Questions:

Which organizations applied?
Which funds did they apply for?
What are people interested in?

Metrics:

TBD

Data sources:

Black Cat Consolidated App Data
More info on Consolidated Application.

Tasks and Goals:

Create a Tableau dashboard.

MST Payments Adoption slide deck

Question

Identify key data points in MST payments that detail current adoption of contactless payments.

Metrics

Contactless versus traditional fare
Was fare capping implemented?
Of contactless, how many paid less than nominal fare?
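
A rough sketch of the two share metrics, assuming each ride record carries a payment type, the amount actually paid, and the nominal fare. The field names are hypothetical and would need to be mapped to the MST trip data.

```python
def adoption_summary(rides):
    """Return (share of rides that are contactless,
    share of contactless rides that paid less than the nominal fare).

    rides: list of dicts with payment_type ('contactless' or
    'traditional'), paid, and nominal_fare (assumed field names).
    """
    contactless = [r for r in rides if r["payment_type"] == "contactless"]
    share_contactless = len(contactless) / len(rides) if rides else 0.0
    discounted = [r for r in contactless if r["paid"] < r["nominal_fare"]]
    share_discounted = len(discounted) / len(contactless) if contactless else 0.0
    return share_contactless, share_discounted
```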

Data sources

MST trip data

Research: How many operators is Cal-ITP assessing?

User Story

As a Cal-ITP program manager or more senior Caltrans executive,
I want to know how many transit agencies are being assessed in California
so that I can have a baseline for calculating other metrics like the one described in cal-itp/data-infra#984.

Additional Context

The gist of this question lays the foundation for answering a variety of questions that high-level executives ask, such as "what percent of transit agencies have GTFS Schedule data?" or "what percent of transit agencies have Fares v2?" or "what percent of transit agencies are GTFS-compliant?"

Research should be performed with various stakeholders to determine how to define and filter the data we have in airtable about organizations, services and potentially other items. Part of this task should include a document detailing how to filter the data in airtable in order to provide this baseline for measurement. If none of the stakeholders can give a clear answer about how to calculate this baseline, a deliverable of this report should propose at least one recommended option for calculating this baseline.

Acceptance Criteria

Given the data Cal-ITP has collected about transit agencies with respect to how they are funded, what kind of service they operate, and any other relevant criteria
When applying all relevant criteria about what qualifies as a transit agency for reporting purposes
Then a number should be calculated.

The deliverable of this should include:

A memo containing the precise, quantifiable and measurable definition of what qualifies as a transit agency for answering the above-mentioned high-level questions
A metabase question that simply shows the resulting number of transit agencies when applying the criteria to the data in airtable

Sprint Ready Checklist

Acceptance criteria defined
Team understands acceptance criteria
Team has defined solution / steps to satisfy acceptance criteria
Acceptance criteria is verifiable / testable
Dependencies identified

Appendix

The document Cal-ITP Transit Provider Categorization + Activities is a detailed document about the various ways that transit agencies could be categorized, but it does not include a recommendation for how to establish a baseline for reporting.

There is already a filter within airtable that appears to filter for assessed operators. Research should be done to determine if this is relevant. Screenshots of this filter are shown below:

Overall airtable filter

Reporting Category

Currently Operating

Service Type

Additional service type filter

Research: GTFS-RT Speedmaps and Presentation Ready for D11

Question

Meeting scheduled for March 1. Before then, need to have speedmaps and metrics generated for District 11 transit operators, as well as a polished presentation.

Metrics

Speedmaps and other metrics run for each D11 operator with available RT data
Maps and metrics highlight trolley/fixed rail vs. bus to the extent available, per D11 director ask
Presentation drafted
Presentation reviewed and complete

With respect to avg speed for buses, how does it compare with avg speed on trolley or fixed rail services? When I was involved with deploying the first two BRT services in SD, I was surprised at how low the avg speed is on trolley due to lack of grade separations and number of stops. The avg bus speeds seemed low but in fact were higher than avg speeds of trolley or coaster. This data highlights the importance of transit priority projects: managed lanes conversions, signal priority, bus on shoulders, etc. and will help support purpose and need for these kinds of projects.

Data sources

GTFS-RT vehicle positions (existing raw table in warehouse)
GTFS Schedule data (in warehouse)

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: Funding Application Frequencies

Question

Currently, agencies are required to apply for funding various times throughout the year. Due to this existing structure, there is a large administrative burden put on agencies, to get all these applications in and reviewed. This research will identify the frequency of the application cycles across divisions to answer the following:

How often do agencies apply for funding across the transportation-related grant programs?

Metrics

For each agency, how many times do they apply?
By month
By Quarter
By year
Average days between applications by agency
Where are the grant applications concentrated between the grant groupings?
For each quarter in a FY, how many applications fall into the different grant groupings?
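
A sketch of two of the metrics above: average days between applications by agency, and the fiscal quarter an application falls into. It assumes a fiscal year starting in July, which should be adjusted if the grant programs follow a different fiscal calendar; the function names and the (agency, date) input layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def avg_days_between(applications):
    """Return {agency: mean days between consecutive applications}.

    applications: list of (agency, date). Agencies with a single
    application have no gap and are left out.
    """
    by_agency = defaultdict(list)
    for agency, applied_on in applications:
        by_agency[agency].append(applied_on)
    result = {}
    for agency, dates in by_agency.items():
        dates.sort()
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        if gaps:
            result[agency] = sum(gaps) / len(gaps)
    return result


def fiscal_quarter(applied_on, fy_start_month=7):
    """Quarter (1-4) within a fiscal year; July start is an assumption."""
    return ((applied_on.month - fy_start_month) % 12) // 3 + 1
```

Counting applications per (grant grouping, fiscal_quarter) would then show where the applications are concentrated.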

Data sources

Deliverables:

Google Slides

Acronyms/IDs for agencies in DLA data

Question

How can we connect the agencies in this dataset to different databases in the Cal-ITP warehouse
What unique identifiers exist, and which need to be created?

Metrics

Research existing datasets in the Cal-ITP warehouse and in DLA warehouse
If needed, create a new unique agency identifier that can be linked between the datasets

Data sources

Docs: add tutorials and centralized knowledge for analyst reference

Is your feature request related to a problem? Please describe.

The analytics tools section now serves 2 distinct audiences:

new analysts - in need of self-serving materials to go from zero to hero
current analysts - in need of references

Describe the solution you'd like

Work in tutorials into a different "chapter" of analytics tools docs for new analysts. -- Amanda (cal-itp/data-infra#1351)
Add some reference section for analysts to find reference materials for common packages used, interesting code tidbits to lift (sorting order on charts to not be alphabetical, getting labels to display $ and millions / thousands, adjust for inflation, etc). Live in docs section, but need to flesh out section on contributing to docs so analysts can continue to add to this body of knowledge -- all analysts contribute Tiffany, Natalie, compiled resources
Add in new section that reflects new things in analyst workflow (csvkit and writing from GCS to Big Query via command line) -- Charlie (cal-itp/calitp-py#54)

Describe alternatives you've considered

Used City of LA best-practices repo, which is now private and can't be accessed.

Additional context
Add any other context or screenshots about the feature request here.

Research: Transit (Bus) Service Increase

Question

How many service hours and trips/runs need to be added by operator-agency-service type to reach desired levels of service?
Which census tracts, grouped by CalEnvironScreen categories, have no service?

For Caltrans and CalSTA exec board. For Gillian.

Deliverables:
Drive > Team Workspaces > data services

methodology (slides)
Google Slides

Metrics

Service hrs and trips/runs

LOS for urban (once every 15 min) / suburban (30 min) / rural (60 min) -- use census tract criteria to categorize
If route runs through urban/suburban/rural, use a simple cut-off for categorizing
Aggregate to the operator-route-type of service, possibly to county later on
Pick a couple of representative dates for weekday, Sat, Sun service

Desired output:

Operator Type	Weekday	Sat	Sun
Urban	list of x agencies and `service hrs` and `trips/runs`
Suburban	list of y agencies ...
Rural	list of z agencies...

CalEnviroScreen

Subset by Pollution Burden and Population Characteristic scores??

Data sources

GTFS schedule: gtfs_schedule_dim_stop_times, gtfs_schedule_fact_daily_trips (let's filter to Wed or Thurs for weekday), gtfs_schedule_dim_stops for lat/lon
Census tract: check CA open data portal, need geometry and population, use pop density to do urban/suburban/rural cut-off
CalEnviroScreen: 3.0 available on open data portal and also has population OR go with 4.0

Methodology

Find geographic extent of route
Classify route as urban/rural/suburban based on census tract category containing the most route length
Pick some stops along each route: stop_sequence (min, max, midpoint)
Filter by time range in the day, skip overnight hours of service
Observe for that stop, how many trips it makes per hour (once every ___ min).
Calculate the average service hours it takes per trip, and scale up to see how many more trips (and therefore hours) operator would need to add along that route to bring to desired frequency.
Clarify: along a corridor, do we want all the routes to run at 15 min intervals overall, or each individual route to run at 15 min intervals? (@hunterowens)
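
The scaling step in the methodology can be sketched as below. The target headways come from the LOS levels listed under Metrics; treating each route on its own (not the corridor as a whole) and using a fixed daily span of service are assumptions, and the function name is hypothetical.

```python
# Desired headway in minutes by route category, from the LOS levels above.
TARGET_HEADWAY_MIN = {"urban": 15, "suburban": 30, "rural": 60}


def added_service(current_trips_per_hour, category, avg_hours_per_trip, span_hours):
    """Return (extra trips, extra service hours) needed per day.

    current_trips_per_hour: trips observed per hour at the sampled stop
    avg_hours_per_trip: average service hours one trip takes
    span_hours: hours of the day being evaluated (overnight excluded)
    """
    target_trips_per_hour = 60 / TARGET_HEADWAY_MIN[category]
    extra_per_hour = max(0.0, target_trips_per_hour - current_trips_per_hour)
    extra_trips = extra_per_hour * span_hours
    return extra_trips, extra_trips * avg_hours_per_trip
```

For example, an urban route running 2 trips per hour over a 12-hour span, with trips averaging 0.75 service hours, would need 24 more trips and 18 more service hours to reach 15-minute frequency.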

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: How many validators for Thruway Buses

Question

Similar to the LOSSAN question @edasmalchi answered, Gillian would like to know how many validators would be needed to outfit the Thruway buses that connect to CCJPA, LOSSAN and SJRRA. Can he find those buses?

Metrics

How many validators would be needed for the Thruway bus services

Data sources

NTD
GTFS schedule

Some nuance here is needed, since Amtrak isn't the operator / owner of many of its buses. Gillian is looking up ways to get a diff estimate of # of buses in the fleet, but we can guesstimate for now

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

MSD Dashboard Metric: Number of feeds with data about physical accessibility

Question

How many feeds have data about physical accessibility?

This question can be answered by resolving cal-itp/data-infra#561 and cal-itp/data-infra#562, therefore it is blocked for now.

Metrics

The exact criteria deciding whether a feed "has data about physical accessibility" could be determined in various ways. See proposed idea for an MVP or a more rigorous analysis.

MVP:

Presence of stops#wheelchair_boarding field
Presence of trips#wheelchair_accessible field
Presence of non-empty pathways table when at least one child_stop within parent_stop exists in stops

More rigorous analysis:

Require minimum percent of:
- Rows in stops table with wheelchair_boarding field set to "not unknown" value
- Rows in trips table with wheelchair_accessible field set to "not unknown" value
- Child_stops within a parent_stop that can reach every other child_stop within parent_stop when simulating travel as an able-bodied person
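
A sketch covering the field-presence check from the MVP and the "minimum percent" idea from the more rigorous analysis, for the two wheelchair fields only. In GTFS, a value of 0 or empty means no accessibility information, so those are treated as unknown here. The sketch ignores the rule that a child stop with an empty wheelchair_boarding inherits from its parent station, and the function name is hypothetical.

```python
import csv
import io


def share_known(csv_text, field):
    """Share of rows where field holds a known value.

    Returns None when the column is absent (fails the MVP check),
    otherwise the fraction of rows whose value is neither empty nor "0".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if field not in (reader.fieldnames or []):
        return None
    rows = list(reader)
    if not rows:
        return 0.0
    known = sum(1 for r in rows if (r[field] or "").strip() not in ("", "0"))
    return known / len(rows)
```

A feed would pass the rigorous version if share_known for stops.txt and trips.txt both meet whatever minimum percent gets agreed on.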

Data sources

Static GTFS
- stops#wheelchair_boarding
- trips#wheelchair_accessible
- Pathways data as it relates to child and parent stops

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: downloading PEMS data

Question

Get PEMS data, see which dataset should be downloaded, maybe pushed into BigQuery? PEMS data may help us answer how fast are cars traveling along the streets bus routes are also traveling along. This helps us better understand traffic speeds by various times of day to help calculate car travel times when traveling along parallel-to-SHN bus route.

Metrics

Is data available only along detectors on the freeway / SHN?
Is data available for local streets and roads?
How often should we download? What tools to download gzipped files, unzip, basic data cleaning, then push to BQ?

Data sources

PEMS

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

User Story: parameterized reports with papermill

User stories

DLA notebooks for districts is a good prototype for papermill and parameterizing reports.

A prototype notebook and this script here is a good place to start.

Summary

Figure out if GitHub pages or something similar is where that notebook converted to html lives.
See what limited interactivity / user interactivity the HTML page can handle - tooltips, hovering, selecting lines, highlighting, etc
Long-term: consider accessibility needs, including browser vs mobile devices, do we get rid of interactive elements, etc

Metric: number of feeds with no critical validation

Question

Number of feeds with no critical validation errors (schedule or realtime)?

Metrics

TODO

Data sources

transit database (TODO: link)
a spreadsheet or something else, describing what errors are critical (TODO: michael to link)
- Note: there's a list on this website

MSD Dashboard Metric: Number of feeds with Fares v2 data

Question

How many feeds that we track have Fares v2 data?

Metrics

The total number of feeds that have a non-empty fare_leg_rules.txt file.

Data sources

A number of new files are being proposed to be added to the static GTFS specification together called "Fares v2". See reference doc. The MVP for checking this however is to simply assert that the fare_leg_rules.txt file is present and it contains at least one potentially empty row.
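
A minimal sketch of that MVP check against a raw feed zip. It assumes fare_leg_rules.txt sits at the top level of the archive and counts any row after the header as data; the function name is hypothetical.

```python
import csv
import io
import zipfile


def has_fares_v2(feed_zip_bytes):
    """True if the feed has fare_leg_rules.txt with at least one data row."""
    with zipfile.ZipFile(io.BytesIO(feed_zip_bytes)) as zf:
        if "fare_leg_rules.txt" not in zf.namelist():
            return False
        with zf.open("fare_leg_rules.txt") as raw:
            # utf-8-sig strips a byte order mark if the file has one.
            reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
            next(reader, None)  # skip the header row
            # Blank lines parse as empty lists and are not counted.
            return any(row for row in reader)
```

The metric would then be the number of tracked feeds for which has_fares_v2 returns True.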

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

MSD Dashboard Metric: Number of feeds with GTFS-Realtime

Question

How many assessed operators have a complete set of associated GTFS-Realtime feeds?

Metrics

An assessed operator is considered to have a complete set of associated GTFS-Realtime feeds if the existence of all three types of GTFS-Realtime feeds (Trip Updates, Vehicle Positions, Service Alerts) can be confirmed.
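
The completeness rule can be sketched as a simple set check. The feed type labels and the (operator, feed_type) input layout are hypothetical and would need to match however the Airtable data is modeled in the warehouse.

```python
# Labels are placeholders for the three GTFS-Realtime feed types.
REQUIRED_RT_TYPES = {"trip_updates", "vehicle_positions", "service_alerts"}


def operators_with_complete_rt(feeds):
    """Return the sorted operators that have all three feed types.

    feeds: iterable of (operator, feed_type) pairs, one per known feed.
    """
    seen = {}
    for operator, feed_type in feeds:
        seen.setdefault(operator, set()).add(feed_type)
    return sorted(op for op, types in seen.items() if REQUIRED_RT_TYPES <= types)
```

The dashboard number would be the length of the returned list.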

Data sources

Airtable via data warehouse

Depends on

cal-itp/data-infra#979

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

New Team Member - [test]

Name:
Role:
Reports to:

Google Workspace Email Address:
GitHub Username:
Slack Username:

Set-up:

User Story: shared utility functions for analysis

User stories

A user story is implemented as well as it is communicated.
If the context and the goals are made clear, it will be easier for everyone to implement it, test it, refer to it.

Summary

A user story should typically have a summary structured this way:

For analysis, there's probably a set of steps in data cleaning or data visualization that we do repeatedly. We encounter them both within a research question and across research questions. Why reinvent the wheel?

I would want to document some of these shared utility functions to standardize and make it easier for analysts to do these steps. The utility functions would be importable across all directories in the data-analyses repo and can be called within a Jupyter notebook or Python script.

See 2 examples from City of LA work: covid19-indicators and planning-entitlements

Ex 1: exporting a geodataframe to geoparquet and save to GCS bucket. cal-itp/data-infra#698. Solution was to create a function that would write a geoparquet locally, upload to GCS bucket, then erase the local file. This is a repeated step that many analysts would come across using the JupyterHub + GCS, and the typical way to export doesn't work.

Ex 2: aggregating by geography (census tract, Caltrans district, zip code, etc). There's a pandas function that helps us aggregate and take the sum, count, count unique values, etc. Currently, to do a mix of these, you could merge your dataframes back together using df.groupby().agg() or get wonky column names using df.pivot_table(). A common function would wrap all this aggregation up and be paired with attaching geometry back, as the geometry column throws errors when you aggregate.

Other examples would be common charts or maps. We'll add more as we come across more use cases of generalizable functions!

Acceptance Criteria

We can start with Tiffany / Eric's transit service research, and expand as more analyses are done by others.

Also, here are a few points that need to be addressed:

Need to test that shared_utils is importable across all directories, since analysts are making their own folders within data-analyses to store their work.
Would like the python files to be editable. This was in a previous docker-compose.yml: pip install -e...where should this go?

Notes

Initial work here: https://github.com/cal-itp/data-analyses/tree/shared-utils

Tester [Stakeholder]

@tiffanychu90

Sprint Ready Checklist

- Acceptance criteria defined
- Team understands acceptance criteria
- Team has defined solution / steps to satisfy acceptance criteria
- Acceptance criteria is verifiable / testable
- External / 3rd Party dependencies identified

Research: Contactless Payments Demonstration Review

Contactless Payments Demonstration Review

Over the past 6 months, the Cal-ITP payments team has been conducting a contactless payments demonstration with select agency partners across the State of California. Now that data and experience have been gathered, we are ready to begin synthesizing quantitative and qualitative learnings and communicating them out to stakeholders across the State.

Metrics

The document Potential Questions for Payments Data generated by the payments team has been used to scope out questions to be answered during the process of the demonstration, and the team has been iterating the document against their experiences. The areas covered by this document should be used to inform this review and include, but are not limited to:

Rate and speed of conversion to contactless payments
Revenue generated by contactless payments, relative to traditional payments
Behavior compared between full fare and discount fare customers
Influence on new transit ridership
Potential areas of focus: marketing effects, distribution of fare-capping, mobile-preference, rewards, eligibility, cash deposits, safety

Data sources

Payments Dashboard - Metabase
views in the warehouse
- payments_rides
payments dataset
- stg_cleaned_customers
- stg_cleaned_micropayment_adjustments
- stg_enriched_micropayments
Payments 101 Content - Lilly & Jenny
Anecdotal experience - Lilly, Jenny, Mjumbe, Ben

(Data Services Team to Copy and Fill Out Below)

This will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: (Refine as needed) How has the introduction of contactless payments through the Contactless Payments Demonstration impacted transit agencies and their ridership from both a quantitative and anecdotal perspective?
View:
- Payments Dashboard - Metabase
- views related to payments in the warehouse
- payments dataset
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: Cost to provide GTFS-RT for small/rural operators

Question

About what would it cost to provide GTFS-RT for small/rural operators in CA? (to help Gillian with the budget process)

Metrics

number of (small/rural) operators
fleet sizes, both for current service and an estimate based on transit service increase work
total cost estimates based on GRaaS costs per vehicle and per operator
~~percentage of routes likely to not have cell signal~~ (not in this analysis per Gillian's request)
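
The cost metric above reduces to simple arithmetic once fleet sizes and unit costs are known. Below is a minimal sketch, assuming a cost model with one fixed cost per operator plus one cost per vehicle. The `Operator` class, the `estimate_rt_cost` function, and every number are hypothetical placeholders; real unit costs would come from the GRaaS team and fleet sizes from NTD or the warehouse.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    fleet_size: int  # vehicles in current (or projected) service


def estimate_rt_cost(operators, cost_per_vehicle, cost_per_operator):
    """Total cost = fixed cost for each operator + per-vehicle cost for each vehicle."""
    total_vehicles = sum(op.fleet_size for op in operators)
    return len(operators) * cost_per_operator + total_vehicles * cost_per_vehicle


# Placeholder inputs only (not real GRaaS prices or real agencies).
operators = [Operator("Agency A", 4), Operator("Agency B", 10)]
total = estimate_rt_cost(operators, cost_per_vehicle=100.0, cost_per_operator=1000.0)
print(total)
```

Running the same function twice, once with current fleet sizes and once with the fleet sizes implied by the transit service increase work, would give the two estimates listed in the metrics.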

Data sources

Data Warehouse/GTFS
Transit service increase estimates
Transit stacks/NTD
GRaaS costs from GRaaS team

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research Question: Map of cell downzones by GTFS-RT

Ask Lauren for an agency to start with, or use something in Humboldt County as a default.

NTD Process Mapping and Reporting Modernization

Question

Identify the existing relationships among NTD data products, and how Caltrans can improve its reporting processes.

Metrics

Number of issues coming back from NTD
Data sources of these issues - can GTFS or other sources improve this?
Are there other examples of NTD modernization?

Data sources

NTD data products
Caltrans data reports and communications with NTD
Internal process documents to identify reporting procedures

MSD Dashboard Metric: general population public transit gtfs coverage

Question

What % of California (and Californians) has (open to the public) transit coverage in GTFS?

Metrics

By area:

The % of the non-water area of California that is within 1/4 mi of a bus stop or 1 mi of a ferry/rail stop that is served by a publicly funded transit service that is open to the general public and has GTFS Schedule data

By Population:

The % of Californians who are within 1/4 mi of a bus stop or 1 mi of a ferry/rail stop that is served by a publicly funded transit service that is open to the general public and has GTFS Schedule data

By Employment (optional):

The % of jobs that are within 1/4 mi of a bus stop or 1 mi of a ferry/rail stop that is served by a publicly funded transit service that is open to the general public and has GTFS Schedule data
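
The population version of this metric can be sketched as follows. This is a rough illustration, assuming census blocks are represented by a centroid and a population count, and using straight-line (haversine) distance rather than a true buffer intersection. The dictionary keys and function names are hypothetical and are not taken from the warehouse or Airtable schema.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

# Catchment radius by mode, taken from the metric definition above.
RADIUS_MI = {"bus": 0.25, "rail": 1.0, "ferry": 1.0}


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def is_covered(block, stops):
    """True if the block centroid falls inside any stop's catchment radius."""
    return any(
        haversine_mi(block["lat"], block["lon"], s["lat"], s["lon"]) <= RADIUS_MI[s["mode"]]
        for s in stops
    )


def pct_population_covered(blocks, stops):
    """Share of total population living in covered blocks, as a percentage."""
    total = sum(b["population"] for b in blocks)
    covered = sum(b["population"] for b in blocks if is_covered(b, stops))
    return 100.0 * covered / total if total else 0.0
```

The stops passed in would first be filtered to services that are publicly funded and open to the general public. The area and employment variants would swap population for block area or job counts.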

Data sources

Census data
Airtable database listing transit services, eligibilities, and GTFS data
stops.txt for GTFS datasets and shapes.txt for continuous stops and locations.geojson for flexible services.

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Open loop payment demos

Question

What can we learn from the four demos re: open loop payments?

Metrics

Differences in contactless versus nominal fare

Data sources

Payments data

Research: Update routes / stops shapefile scripts

Question

With the new views.gtfs_schedule_dim_shapes_geo table now available, update the traffic_ops scripts related to creating routes and stops data.

previous GH issue

Metrics

Data sources

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Geographies of Interest: Federal Funding

Question

Where are projects located, and what types of funding are concentrated there?
Are the projects concentrated?
Where are funds not going?

Metrics

geoparsing, where the data allows, to pinpoint project locations
Using various geographies (District, city, county, MPO)
clustering analysis

Data sources

Research: Parallel transit corridors to State Highway Network

Question

Which transit routes are considered parallel to the State Highway Network (SHN)?

Metrics

Find transit routes within 1 mile of SHN
Define parallel vs. intersecting routes (based on the % of the transit route that falls within 1 mile of the SHN and the % of the highway that the overlap covers)
What improvements in service do these routes need to be competitive with cars?
Use Google Directions API to constrain car to travel along bus route, see which routes are viable competitive routes that can be targeted for improved service
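
The parallel vs. intersecting distinction can be illustrated with a small sketch. It assumes route and highway geometries have already been projected to planar coordinates in miles, and it measures only the share of route sample points inside the 1-mile buffer (not the share of the highway covered). The 0.5 cutoff and all function names are hypothetical choices for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all in planar (x, y) miles."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_within_buffer(route_points, highway, buffer_mi=1.0):
    """Fraction of route sample points within buffer_mi of the highway polyline."""
    segments = list(zip(highway, highway[1:]))
    near = sum(
        any(point_segment_distance(p, a, b) <= buffer_mi for a, b in segments)
        for p in route_points
    )
    return near / len(route_points)


def classify(share, parallel_threshold=0.5):
    """Label a route from its in-buffer share; the threshold is an assumed cutoff."""
    if share == 0:
        return "not near SHN"
    return "parallel" if share >= parallel_threshold else "intersecting"
```

A route running alongside a highway would score a share near 1.0, while a route that only crosses it would score a small share and be labeled intersecting.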

Data sources

State Highway Network
GTFS schedule data to assemble routes (use work from #160)

Outputs / Presentations

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Initial GTFS-RT self-serve speed/delay tools via Jupyter/Papermill

Is your feature request related to a problem? Please describe.
Caltrans staff and other stakeholders need self-serve access to our GTFS-RT speed and delay data in the coming weeks to enable work on the pending innovation challenge. This will likely occur before the RT pipeline is fully built out.

Describe the solution you'd like
An interactive webpage based on existing GTFS-RT speed/delay work, with functionality such as:

ability to select a Caltrans district and view maps+charts for operators in that district
interactive maps and charts allowing filtering by time of day, route, etc within a single, pre-processed day of analysis
a basic geospatial data export capability

Describe alternatives you've considered

Additional context
Eric to meet with @atvaccaro tomorrow about suitability of jupyter/papermill as a tool for this

Research: 5311 Agencies

Data Questions:

How many 5311 agencies are there in California? By district? By county?
What is the average fleet size and number of doors?
How old is the fleet? When are their 10 or 12 year cycles up for their existing fleet? With this information, we can estimate which agencies will potentially apply for 5339 and/or TIRCP.
How many 5311 agencies have a GTFS Status?
How many 5311 agencies have an existing CAD/AVL Vendor? If so, when are the contract dates up?

Metrics:

Grouping vehicle types and age into bins.
Calculating how many agencies overlap in the data sets we use.
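
Both metrics can be sketched in a few lines. The bin edges below are assumed for illustration only, and the agency identifiers are placeholders; the real grouping would depend on how Black Cat and NTD report vehicle age.

```python
from collections import Counter

# Hypothetical age bins in whole years: (low, high, label), inclusive on both ends.
AGE_BINS = [(0, 5, "0-5"), (6, 10, "6-10"), (11, 15, "11-15")]


def age_bin(age):
    """Return the bin label for a vehicle age in whole years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    return "16+"


def bin_counts(ages):
    """Count vehicles per age bin."""
    return Counter(age_bin(a) for a in ages)


def overlapping_agencies(*datasets):
    """Agencies that appear in every dataset (each dataset is an iterable of IDs)."""
    sets = [set(d) for d in datasets]
    return set.intersection(*sets) if sets else set()
```

For example, `overlapping_agencies(black_cat_ids, gtfs_status_ids, ntd_ids)` would return the agencies present in all three sources, which depends on having a shared identifier such as the crosswalk described in the tasks.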

Data sources:

5311 Black Cat
Cal ITP GTFS Status
NTD Data on Fleet Age

Tasks and Goals:

Create a crosswalk for the Rural Reporters in California containing the NTD and Cal-ITP IDs
Determine aggregation method for the data
Create functions for cleaning, analysis, and visualizations

Research: TIRCP grants

Questions

What is the health of the TIRCP program, looking at the cycle in which award recipients received their money & their expenditure percentage?
How can Caltrans streamline the presenting and reporting of this data?
What other data sources can be merged in to look at other funding the agencies have received and to track environmental goals?
Automate two reports (Semi Annual and Program Allocation Plan) that are created manually using Python, incorporate ADA standards as well.

Metrics

Signaling progress of award recipients through categories: behind, ahead, no expenditure recorded, on track.
Looking at percentage of expended, allocated, and TIRCP amounts through tables and charts.
Grouping project details to track the types of projects TIRCP is funding.
Looking at GHG reduction/emissions.
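
The progress categories could be assigned with a rule like the one below. This is only a sketch: the source does not define how "behind" or "ahead" is determined, so the comparison against an expected expenditure share and the 10-point tolerance are assumptions, as are the function and parameter names.

```python
def progress_category(expended, allocated, expected_share, tolerance=0.10):
    """Classify a recipient by comparing its expended share to an expected share.

    expected_share is an assumed input, e.g. the fraction of the project
    timeline that has elapsed. tolerance is an assumed band around it.
    """
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    if expended == 0:
        return "no expenditure recorded"
    share = expended / allocated
    if share > expected_share + tolerance:
        return "ahead"
    if share < expected_share - tolerance:
        return "behind"
    return "on track"
```

Applied row by row to the project tracking workbook, this would produce the category column that the tables and charts summarize.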

Data sources

TIRCP project tracking Excel workbook.

MSD Dashboard Metric: Paratransit-using Californian GTFS coverage

Question

What % of Californians have access to GTFS-Flex data?

Metrics

The % of Californians that are within areas defined in the locations.geojson file of feeds with GTFS-Flex data.

Data sources

Census data
Feeds of providers

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: MSD Dashboard, model which agencies would most improve RT and accessibility coverage

Question

If we were able to support a select number of agencies in providing GTFS accessibility data and GTFS-RT, which would make the biggest impact in overall coverage statewide (as measured in #169 and #170)?

Metrics

estimated coverage increase by agency
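
One way to estimate which agencies would add the most coverage is a greedy selection over census blocks. The sketch below assumes each agency can be mapped to the set of blocks its stops would cover, and that coverage is measured by population. The function name and data shapes are hypothetical, and greedy selection is a swapped-in heuristic, not a method stated in the request.

```python
def rank_agencies_by_gain(agency_blocks, block_population, already_covered, n_select):
    """Greedily pick agencies that add the most newly covered population.

    agency_blocks: dict of agency -> set of block IDs its service would cover.
    block_population: dict of block ID -> population.
    already_covered: block IDs that already have coverage.
    Returns a list of (agency, marginal population gain) in pick order.
    """
    covered = set(already_covered)
    remaining = dict(agency_blocks)
    picks = []
    for _ in range(min(n_select, len(remaining))):
        gains = {
            agency: sum(block_population[b] for b in blocks - covered)
            for agency, blocks in remaining.items()
        }
        # Sort names first so ties are broken alphabetically.
        best = max(sorted(gains), key=gains.get)
        picks.append((best, gains[best]))
        covered |= remaining.pop(best)
    return picks
```

Recomputing the gains after each pick avoids double counting blocks that two agencies both serve.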

Data sources

Data Warehouse (GTFS/RT)
Census (block level from 2020?)

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Analysis of which validation codes, notices, etc are most frequent

Currently, we have all of the GTFS validator results stored in validation_notices, broken down by type, number of errors, and if they are resolved month to month.

@mcplanner has put together a helpful list of the "severity" of GTFS notices. We have metrics stored in this spreadsheet

Let's set up a notebook or similar to track these over time
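
A notebook for this could start from a simple frequency count. The sketch below assumes each validation notice row carries a month, a notice code, and a count. These field names and the example codes are placeholders, not the actual columns of validation_notices.

```python
from collections import Counter, defaultdict


def notice_counts_by_month(notices):
    """Map each month to a Counter of notice codes seen in that month."""
    by_month = defaultdict(Counter)
    for n in notices:
        by_month[n["month"]][n["code"]] += n.get("count", 1)
    return dict(by_month)


def most_frequent(notices, top_n=3):
    """Return the top_n (code, total count) pairs across all months."""
    totals = Counter()
    for n in notices:
        totals[n["code"]] += n.get("count", 1)
    return totals.most_common(top_n)


# Placeholder rows; the codes are illustrative, not real validator codes.
notices = [
    {"month": "2022-01", "code": "example_code_a", "count": 5},
    {"month": "2022-01", "code": "example_code_b", "count": 2},
    {"month": "2022-02", "code": "example_code_a", "count": 3},
]
print(most_frequent(notices, top_n=1))
```

Joining the totals to the severity list would then let the most frequent notices be ranked within each severity level.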

Research: Prototype functions, scripts for analytics portfolio

Question

Use bus_service_increase project as a prototype of how to set up scripts to run data cleaning, data assembly, visualization (charts + maps), save in sub-directories for various districts, counties, etc.

Currently, work is done in notebooks, and while it calls functions, the entire workflow to produce visualizations can benefit from more automation.

Long-term goal: create analytics portfolio for districts to access similar set of metrics and visualizations related to bus_service_increase, dla grants, drmt grants, parallel corridors, rt delay.

Metrics

Take existing work that lives in notebooks and generalize to accommodate various subsets of data (by district, by MPO, etc) and produce similar set of outputs.
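
The "generalize to subsets and save in sub-directories" idea can be sketched with the standard library alone. The function below is hypothetical (it is not part of bus_service_increase): it groups rows by a key such as district and writes one CSV per group into its own folder.

```python
import csv
from pathlib import Path


def write_subset_outputs(rows, group_key, out_root):
    """Write one summary.csv per group into out_root/<group_key>_<value>/.

    rows: list of dicts sharing the same keys.
    group_key: the column to split on, e.g. "district" or "mpo".
    Returns the list of files written, sorted by group name.
    """
    out_root = Path(out_root)
    groups = {}
    for row in rows:
        groups.setdefault(str(row[group_key]), []).append(row)

    written = []
    for name, subset in sorted(groups.items()):
        subdir = out_root / f"{group_key}_{name}"
        subdir.mkdir(parents=True, exist_ok=True)
        path = subdir / "summary.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(subset[0].keys()))
            writer.writeheader()
            writer.writerows(subset)
        written.append(path)
    return written
```

Chart and map steps could follow the same pattern, each taking a subset and an output folder, so that a single script call regenerates everything for every district or MPO.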

GitHub Action for nbviewer link generation

Similar to the one we have in the notebooks repo.

Research: All GTFS-RT Speedmaps and Tools available for Better Bus Challenge

Question

How can ongoing RT speed/delay work best support Caltrans districts and other jurisdictions in improving the bus experience, statewide?

Metrics

Speedmaps and other metrics run for each operator with available RT data
Results segmented by district, reproducible, and accessible alongside other ongoing analysis work (989, etc)
Instructions/documentation drafted
Instructions/documentation reviewed and complete

Data sources

GTFS-RT vehicle positions (existing raw table in warehouse)
GTFS Schedule data (in warehouse)

Research: GTFS-RT Speedmaps and Presentation Ready for D10

Question

Meeting scheduled for Feb 22. Before then, need to have speedmaps and metrics generated for District 10 transit operators, as well as a polished presentation.

Metrics

Speedmaps and other metrics run for each D10 operator with available RT data
Presentation drafted
Presentation reviewed and complete

Data sources

GTFS-RT vehicle positions (existing raw table in warehouse)
GTFS Schedule data (in warehouse)

MSD Dashboard Metric: GTFS coverage for a wheelchair-user

Question

What % of California has (open to the public) transit coverage in GTFS which is explicitly wheelchair accessible?

Metrics

By area:
The % of the non-water area of California that is within 1/4 mi of a bus stop or 1 mi of a ferry/rail stop that is explicitly wheelchair accessible (and, if in a station, that station has explicit pathways coding) and that is served by a publicly funded, explicitly wheelchair-accessible transit service that is open to the general public and has GTFS Schedule data

By Population:
The % of Californians who are within 1/4 mi of a bus stop or 1 mi of a ferry/rail stop that is explicitly wheelchair accessible (and, if in a station, that station has explicit pathways coding) and that is served by a publicly funded, explicitly wheelchair-accessible transit service that is open to the general public and has GTFS Schedule data

By Employment (optional):
The % of jobs that are within 1/4 mi of a bus stop or 1 mi of a ferry/rail stop that is explicitly wheelchair accessible (and, if in a station, that station has explicit pathways coding) and that is served by a publicly funded, explicitly wheelchair-accessible transit service that is open to the general public and has GTFS Schedule data

NOTE: - I don't think we need to or should interface this with any census data about having a disability.

Thoughts on this:

we should assume that anybody could be disabled at any time (especially as people age in place).
people using wheelchairs shouldn't be limited in where they go to where they have already self-selected to go. Per various ADA, Unruh, etc, we should be evaluating based on what access the general public has.
The data isn't really there in a great way to support it anyway.

Data sources

Census data
Airtable database listing transit services, eligibilities, and GTFS data.
stops.txt for GTFS datasets and shapes.txt for continuous stops and locations.geojson for flexible services.

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Research: Consolidated Application Applicants

Data Questions:

This issue looks at the organizations that applied for 5311/5311(f)/CMAQ, 5339(a), and/or LCTOP using the new Consolidated Application process. Applicants only need to complete one application for the funds above, once a year. The deadlines are in late March for LCTOP and late April for the other programs.

Metrics:

Which organizations applied?
Which funds are the most vied for?
How much funding did the organizations receive?
What are they planning to use the funds for?
What mix of funds did organizations apply for?

Data sources:

Black Cat Data (Caltrans' grant management tool)
More info on Consolidated Application.

Research: Planning and Modal Advisory Committee prep

Question

As part of the Caltrans Strategic Plan, Performance Plan, one of the big strategic goals related to multimodal transportation is P-01, to increase the total amount of service on the SHN and the reliability of that service by 2024.

Meeting on 3/9/22, @edasmalchi and @tiffanychu90 to present.

Metrics

Take expansive definition of "service on the SHN" to include all transit routes within 1 mile buffer, use parallel-corridors analysis
Count number of service hours and routes provided on typical weekday total, on SHN, for other intersecting routes that aren't parallel but somewhat touch SHN, and those that are not at all parallel
Show district breakdown of parallel routes and service
Show RT maps for subset of parallel routes to measure reliability of service

presentation sent to Abby Jackson by 3/3/22

Data sources

Build on existing work in bus-service-increase and parallel-corridors
GTFS scheduled
GTFS RT

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

Question: Question written as a single sentence.
View: E.g. views.gtfs_schedule_fact_daily_feed_files.
Research:
- How should the results be presented?
- When are they needed by?

After reviewing research with the asker:

Metric: what specific calculations are needed?
Dashboard: where should we put the result?

Division of Local Assistance (DLA) Data Driven Grant Management

Questions

Where can data help to standardize grant management?
Which grants are the most important in the eyes of the customer? If any, where is the overlap?
Where in the application process is the customer at any given moment?

Metrics

Number of grants by grant type, location and awardee, amount awarded

Data Sources:

Division of Local Assistance (DLA)
E-76 Obligated Data

Goals/Tasks:

Develop a unified schema for grant tracking and write automated scripts for programs already using databases
Determine the data structure of the database
Create a cohesive list of active DLA grants issued by State and Federal programs
Produce geographies of interest for existing and potential grant applicants
Center the grant process around the customer
