uw-msds-trac-capstone

Modeling King County Bus Ridership

Background

Bus route planning and optimization is an integral part of city planning. Understanding bus ridership enables transit agencies to make planning decisions and better understand the impact of station closures, headway changes, and weather events. However, no single model or data source measures total ridership across the King County Metro system. ORCA payment card transactions are available, but they capture only riders who pay with ORCA. Automatic person counters (APC) capture all riders, but they are present on only ~60% of trips, and only during 2-month survey periods held twice a year. The goal of this project is to predict total ridership (as measured by APC) by route, direction, and time of day using ORCA transactions and route metadata.
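As a rough illustration of the prediction target, the sketch below bins ORCA taps by route, direction, and 15-minute window, then keeps only the bins that also have an APC count to serve as a label. The record layout, route numbers, and function names are hypothetical and are not taken from the repository's pipeline scripts.

```python
from collections import Counter
from datetime import datetime

BIN_MINUTES = 15  # the repository also lists 30min and hr aggregates


def time_bin(ts, minutes=BIN_MINUTES):
    """Floor a timestamp to the start of its aggregation window."""
    floored_minute = (ts.minute // minutes) * minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def count_orca_boardings(transactions, minutes=BIN_MINUTES):
    """Count ORCA taps per (route, direction, time bin)."""
    counts = Counter()
    for route, direction, ts in transactions:
        counts[(route, direction, time_bin(ts, minutes))] += 1
    return counts


def build_training_rows(orca_counts, apc_counts):
    """Pair each APC-labelled bin with its ORCA count (0 if no taps)."""
    rows = []
    for key, apc_total in sorted(apc_counts.items()):
        rows.append({"key": key, "orca": orca_counts.get(key, 0), "apc": apc_total})
    return rows


# Hypothetical ORCA records: (route, direction, tap time)
orca = [
    ("40", "N", datetime(2019, 7, 1, 8, 3)),
    ("40", "N", datetime(2019, 7, 1, 8, 11)),
    ("40", "N", datetime(2019, 7, 1, 8, 17)),
    ("7", "S", datetime(2019, 7, 1, 8, 5)),
]
# Hypothetical APC totals, present only for instrumented trips
apc = {("40", "N", datetime(2019, 7, 1, 8, 0)): 5}

orca_counts = count_orca_boardings(orca)
training_rows = build_training_rows(orca_counts, apc)
```

Bins without an APC count (route 7 in this toy data) cannot be used for training, but they are exactly the bins a fitted model would later be asked to estimate.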

We use APC data from the King County Metro system to train a machine learning model that estimates actual ridership from the incomplete ORCA transaction data. The model can estimate total ridership across all routes, directions, and times of day, regardless of whether a bus was outfitted with automated person counting technology.

To do this we:

Created a data processing pipeline that merges APC data, ORCA data, and route metadata to create training sets.
Trained a shallow, wide neural network that uses a sigmoid activation in the first layer and a single linear output neuron.
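The network shape described above (one wide sigmoid layer feeding a single linear output neuron) can be sketched as a forward pass in plain Python. This is only an illustration of the architecture: the layer width, the random initial weights, and the function names are assumptions, training is not shown, and the project's actual model is the one stored under models/.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights for one hidden layer and one output neuron."""
    rng = random.Random(seed)
    hidden_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    hidden_b = [0.0] * n_hidden
    out_w = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    out_b = 0.0
    return hidden_w, hidden_b, out_w, out_b


def predict(network, features):
    """Forward pass: sigmoid hidden layer, then an unbounded linear output."""
    hidden_w, hidden_b, out_w, out_b = network
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    # No activation on the output, so the estimate is not squashed into (0, 1).
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


# Hypothetical sizes: 4 input features, 256 hidden units ("shallow, wide")
network = init_network(n_inputs=4, n_hidden=256)
estimate = predict(network, [0.5, -1.2, 0.0, 2.0])
```

A linear output suits a count-like regression target because the prediction is not capped, whereas a sigmoid output would limit every estimate to the range 0 to 1.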

For more information on the methodology and results, see here.

Installation

Clone the repository
Create a new Python environment using the command:
conda create --name UWTRAC python
Activate UWTRAC by using the command:
conda activate UWTRAC
Install the required Python packages using:
pip install -r requirements.txt

How to Use/Examples

Use the User Guide to get started.

Directory Structure

uw-msds-trac-capstone/
  |- data/
     |- predictions/
	**Predictions from final neural network for
	  time aggregates[15min, 30min, hr] on the 
	  combined (summer + winter) dataset.**
	|- model_final_nn/
	   |- final_nn_[15min]_test.txt
	   |- final_nn_[15min]_xval.txt
	**Predictions from models[nn, clustered_svm, xgb]
	  on different time aggregates[15min, 30min] and
          seasons[combined, summer, winter].**
	|- model_[nn]/
	   |- [nn]_[15min]_[combined]_test.txt
	   |- [nn]_[15min]_[combined]_xval.txt
     |- training_data/
	**Training splits for survey seasons
          [combined, summer, winter].**
	|- [combined]_data/
	   **Training datasets for different time aggregations
	     [15min, 30min, ampm, day, hr].**
	    |- [15min]/
	       |- test.tsv.gz
               |- train.tsv.gz
               |- val.tsv.gz	
     |- boeing_field_2019.csv
     |- rte_clean.csv
  |- docs/
     |- Abstract [NEED]
     |- Paper [NEED]
     |- Performance.PNG
     |- Poster
  |- eda/
     |- reports_apc/
	|- correlates/
	   |- stop_id-VS-stop_name.tsv
	|- numerics.tsv
	|- unique_counts.tsv
	|- unique_vals.tsv
     |- reports_orca/
	|- correlates/
	   |- device_location_id-VS-device_location_descr.tsv
	   |- direction_id-VS-direction_descr.tsv
	   |- mode_id-VS-mode_descr.tsv
	   |- origin_location_id-VS-origin_location_descr.tsv
	   |- product_id-VS-product_descr.tsv
	   |- service_agency_id-VS-service_agency_name.tsv
	   |- source_agency_id-VS-source_agency_name.tsv
	   |- txn_passenger_type_id-VS-txn_passenger_type_descr.tsv
	   |- txn_type_id-VS-txn_type_descr.tsv
	   |- viaserviceareaid-VS-viaserviceareaname.tsv
	|- numerics.tsv
	|- unique_counts.tsv
	|- unique_vals.tsv
     |- src/
	|- agg_apc.py
	|- agg_orca.py
	|- figure_out_direction.py
	|- filter_apc.py
	|- filter_orca.py
	|- generic_inspec.py
	|- inspect_apc.py
	|- inspect_orca.py
	|- merge_apc_orca.py
     |- winter_summer_EDA.ipynb
  |- evaluation/
     |- cluster_rte_frequency/
	|- 00_gather_route_info.ipynb
	|- rte_clusters.tsv
     |- model_bias/
	|- plots/
	   **Plots created by python notebook.**
	|- bias.ipynb
     |- model_feature_explain/
	|- contributions_by_feature.svg
	|- describe_nn.ipynb
     |- model_preformance/
	|- model_final_nn/
           **Final neural network performance for 
             different time aggregations[15min, 30min, hr]
             and with/without RapidRide.**
	   |- final_nn_[15min]/
	      **Plots created by python notebook.**
	   |- final_nn_[15min]_no_rr/...
	**Various Models [nn, svm, xgb] for different
          time aggregation [15min, 30min] and survey
          periods [combined, summer, winter].**
	|- model_[nn]/
	   |- [nn]_[15min]_[combined]/
	      **Plots created by python notebook.**
	|- plots/
	   **Plots created by python notebook.**
	|- 01_reports_performance.py
	|- 15m_test_group_perg.csv
	|- batch_compare.tsv
	|- compare_perf.ipynb
	|- evaluate.sh
	|- genBarPlots.py
     |- ridership_by_day/
	|- plots/ 
	   **Plots created by python notebook.**
	|- ridership_by_day_evaluation.ipynb
  |- examples/
     |- User_Guide.pdf
  |- models/
     |- final_nn/
	**Model components per time aggregation modeled [15m, 30m, hr].**
	|- [15m]_column_labels.pkl
	|- [15m]_one_hot_encoder.pkl
	|- [15m]_standard_scaler.pkl
	|- model_[15m].json
	|- model_[15m]_weights_train.h5
	|- model_15m_weights_train_and_xval.h5
     |- model_iterations/
	|- clustered_linear.ipynb
        |- clustered_nn.ipynb
	|- linear.ipynb
	|- XGB [NEED]
     |- final_nn.ipynb
  |- pipeline/
     |- validate_pipeline/
	|- analyze_stopfile.py
        |- sample_files.py
        |- pipeline.ipynb
     |- 01_filter_apc.py
     |- 02_filter_apc.py
     |- 03_agg_orca.py
     |- 04_merge.py
     |- 05_create_training.py
     |- constants.py
     |- functions.py
  |- LICENSE
  |- README.md
  |- requirements.txt
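
The training splits listed above are gzip-compressed TSV files. A minimal sketch for peeking at one with only the standard library follows; it assumes the first row is a header, which is not confirmed here, and the helper name read_tsv_gz is hypothetical.

```python
import csv
import gzip


def read_tsv_gz(path, limit=5):
    """Return the header and the first `limit` rows of a gzipped TSV file."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # assumed header row; empty list if file is empty
        rows = []
        for row in reader:
            if len(rows) >= limit:
                break
            rows.append(row)
    return header, rows


# Example call (the path is an assumption built from the bracketed
# placeholders in the tree above):
# header, rows = read_tsv_gz("data/training_data/combined_data/15min/train.tsv.gz")
```

Reading only the first few rows keeps the check quick even when a split is large.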

Licensing

The code in this repository is licensed under the MIT License.

Acknowledgements

Mark Hallenbeck at TRAC for his time and subject expertise.
Dmitri Zyuzin for his help gathering the data for this project.
Megan Hazen for her guidance in this capstone project and expertise on neural networks.

This analysis was completed as a capstone project for the University of Washington's Master of Science in Data Science (MSDS) program. More information about the class can be found here.
