
The technology that powers the City of Austin's Vision Zero program

Home Page: https://visionzero.austin.gov/viewer/


atd-vz-data's Introduction

Vision Zero Crash Data System

This repository is home base for a suite of applications that help centralize and streamline the management of ATD's Vision Zero data. As a result of this project, staff will have a standardized interface for reviewing crash data, prioritizing intersection safety improvements, and communicating efforts to the public. Additionally, high quality VZ data will be publicly accessible online.

atd-cr3-api

This folder hosts our API that securely downloads a private file from S3. It is written in Python with Flask, and it is deployed as a Lambda function with Zappa.

more info

atd-etl (Extract-Transform-Load)

Our current method for extracting data from the TxDOT C.R.I.S. data system uses a Python library called Splinter to request, download, and process data. It is deployed as a Docker container.

For step-by-step details on how to prepare your environment and how to execute this process, please refer to the documentation in the atd-etl folder.

more info

atd-vzd (Vision Zero Database)

VZD is our name for our Hasura GraphQL API server that connects to our Postgres RDS database instances.

more info

Production site: http://vzd.austinmobility.io/
Staging site: https://vzd-staging.austinmobility.io/
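
For local development, you can smoke-test a running Hasura instance with a plain HTTP request. This is only a sketch: the local port and the table/field names are assumptions, and your instance may also require an x-hasura-admin-secret header.

# Sketch: POST a GraphQL query to a locally running Hasura instance.
# The port and the table/field names below are assumptions, not the confirmed schema.
curl -s http://localhost:8080/v1/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "query { atd_txdot_crashes(limit: 5) { crash_id } }"}'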

atd-vze (Vision Zero Editor)

VZE is our front-end application, built in React.js with CoreUI, that allows a trusted group of internal users to edit and improve the quality of our Vision Zero data. It consumes data from Hasura/VZD.

more info

Production site: https://visionzero.austin.gov/editor/
Staging site: https://visionzero-staging.austinmobility.io/editor/

atd-vzv (Vision Zero Viewer)

VZV is our public-facing home for visualizations, maps, and dashboards that aggregate and make sense of trends in our Vision Zero Database.

more info

Production site: https://visionzero.austin.gov/viewer/
Staging site: https://visionzero-staging.austinmobility.io/viewer/

atd-toolbox

A collection of utilities for maintaining data and other resources related to the Vision Zero Data projects.

Local Development

The suite has a Python script which can be used to run and populate a local development instance of the stack. The script is found in the root of the repository and is named vision-zero. It's recommended to create a virtual environment in the root of the repo; if you name it venv, it will be ignored by the .gitignore file in place. If you start a terminal from within VS Code to interface with the stack, it will automatically source the activation script.
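
For example, a first run might look like the following sketch; the requirements.txt path and the --help flag are assumptions about the script's layout and argument parsing:

# Sketch of first-time setup (the requirements.txt path is an assumption):
python3 -m venv venv              # "venv" is already covered by .gitignore
source ./venv/bin/activate
pip install -r requirements.txt   # install the orchestration script's dependencies
./vision-zero --help              # list the available subcommands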

The vision-zero program is a light wrapper around the functionality provided by docker compose. By inspecting the docker-compose.yml file, you can find the definitions of the services in the stack, and you can use the docker compose command to bring up, stop, and attach terminals to the running containers and execute one-off commands. This can give you access to containers to install Node.js libraries, use Postgres' supporting programs (psql, pg_dump), and other lower-level utilities.
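
For instance, a few common docker compose invocations might look like this sketch; the service names ("db", "graphql-engine") are assumptions about what docker-compose.yml defines:

# Sketch: working with the stack directly via docker compose.
# Service names below are assumptions drawn from a typical docker-compose.yml.
docker compose ps                         # list the services and their states
docker compose up -d db                   # bring up a single service in the background
docker compose exec db psql -U postgres   # attach psql inside the database container
docker compose logs -f graphql-engine     # follow one service's logs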

Ideally, you should be able to operate the entire Vision Zero suite and access all needed supporting tooling from any host that provides a working Docker service and a Python interpreter for the orchestration script.

vision-zero command auto-completion

The vision-zero application can generate auto-completion scripts via the shtab Python library. For example, zsh users may use the following to enable this feature; bash and csh users will have similar steps particular to their shell of choice.

mkdir ~/.zsh_completion_functions;
chmod g-w,o-w ~/.zsh_completion_functions;
cd $WHEREVER_YOU_HAVE_VZ_CHECKED_OUT;
source ./venv/bin/activate;
./vision-zero -s zsh | tee ~/.zsh_completion_functions/_vision-zero
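
For the completions to actually load, zsh needs the directory on its fpath before compinit runs; a typical ~/.zshrc addition looks like:

# Add the completion directory to fpath and (re)initialize completions:
fpath=(~/.zsh_completion_functions $fpath)
autoload -Uz compinit && compinit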

Examples of vision-zero commands

Note: the following flag is honored by any of the commands below that start the Postgres database:

-r / --ram-disk will cause the database to back its "storage" with a RAM disk instead of non-volatile storage. This has the upside of being much faster, as there is essentially no limit to the IOPS available to the database, but the data won't survive a restart and will need to be replicate-db'd back into place.

The default is to use the host's disk to back the database, which is the mode of operation our team is most familiar with, so if you don't need or want the RAM disk configuration, you can ignore this option.
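
As a sketch, a RAM-disk-backed session might look like the following; placing the flag before the subcommand is an assumption about the CLI's argument parsing:

# Start the database backed by a RAM disk, then repopulate it:
./vision-zero --ram-disk db-up
./vision-zero replicate-db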

vision-zero build

Rebuild the stack's images based on the Dockerfiles found in the repository. They are built with the --no-cache flag, which makes the build process slower but avoids stale image layers that have inadvertently cached out-of-date apt resource lists.

vision-zero db-up & vision-zero db-down

Start and stop the Postgres database

vision-zero graphql-engine-up & vision-zero graphql-engine-down

Start and stop the Hasura graphql-engine software

vision-zero vze-up & vision-zero vze-down

Start and stop the Vision Zero Editor

vision-zero vzv-up & vision-zero vzv-down

Start and stop the Vision Zero Viewer

vision-zero psql

Start a psql client connected to your local PostgreSQL database

vision-zero tools-shell

Start a bash shell on a machine with supporting tooling

vision-zero stop

Stop the stack

vision-zero replicate-db

  • Download a snapshot of the production database
  • Store the file in `./atd-vzd/snapshots/visionzero-{date}-{with|without}-change-log.sql`
  • Drop local atd_vz_data database
  • Create and repopulate the database from the snapshot

Note: the -c / --include-change-log-data flag can be used to opt in to including the data of past change log events; the schema is created either way.

Note: the -f / --filename flag can optionally be used to point to a specific data dump .sql file to use for the restore.

Because of the way the snapshots are dated, only one copy of the data will be downloaded per day, for each of the with- and without-change-log variants.
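
For example (the dump filename below is purely illustrative):

# Include past change log data in the restore:
./vision-zero replicate-db -c

# Restore from a specific dump file instead of the day's snapshot:
./vision-zero replicate-db -f ./atd-vzd/dumps/visionzero-2024-01-01-1200.sql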

vision-zero dump-local-db

  • Run pg_dump against the current local database
  • Store the file in `./atd-vzd/dumps/visionzero-{date}-{time}.sql`

vision-zero remove-snapshots

Remove snapshot files. This can be done to save space and clean up old snapshots, but it's also useful for forcing a fresh copy of the day's data to be downloaded after an upstream change has been made.
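
Putting it together, a typical local session might look like this sketch, using only the commands documented above:

./vision-zero build              # rebuild images from the repo's Dockerfiles
./vision-zero db-up              # start the Postgres database
./vision-zero replicate-db       # download a production snapshot and restore it
./vision-zero graphql-engine-up  # start Hasura
./vision-zero vze-up             # start the Vision Zero Editor
./vision-zero psql               # inspect the data directly
./vision-zero stop               # tear the stack down when finished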

Technology Stack

Technologies, libraries, and languages used for this project include:

  • Docker
  • Hasura
  • PostgreSQL
  • React (JavaScript)
  • CoreUI (HTML & CSS)
  • Python

License

As a work of the City of Austin, this project is in the public domain within the United States.

Additionally, we waive copyright and related rights of the work worldwide through the CC0 1.0 Universal public domain dedication.

atd-vz-data's People

Contributors

charlie-henry, chiaberry, dependabot[bot], frankhereford, jgabitto, johnclary, mddilley, patrickm02l, rgreinho, roseeichelmann, rr216, sergiogcx, tillyw, xavierapostol


atd-vz-data's Issues

VZE: Modify records (CRIS+)

A user should be able to modify, add, and delete records to augment data imported from CRIS or other sources.

In the VZE workflow, Program Managers should be able to act as "editors" and update existing crash records.

Knack: Create API Views for VZV

We need more definition around which views will be required, but we can assume that we'll want to expose crash data (along with other related records) via the Knack view-based request API. If so, we need to create pages/views with this API data.

ETL: Get Locations from ArcGIS (When ready)

Write a script that can:

  • Query the ArcGIS API to load a list of items (polygons, intersections), and
  • Iterate through each element and display the item ID and polygon coordinates.

ETL: CRIS: Lint & Document Extraction Code

Lint and document the Capybara scripts:

  • Be sure all Ruby files in the cityofaustin/atd-vz-data/atd-cris-capybara folder are commented and linted.
  • Lint and document the sh files in the same directory.
  • Add a README describing the purpose of each file.

Learn more about Geocoding options and lat/lon errors from CRIS

  • ~10% of lat/lons come back blank from CRIS. We need to learn more about the geocoding service TxDOT uses so we can see whether other services would give us fewer blank results.
  • We also learned that while coordinates for arterials/local streets are generally good, freeway coordinates aren't (e.g., I-35 & Cesar Chavez). Oftentimes, the CR3 narratives have better details.

A meeting with TxDOT is to be coordinated by Lewis.

ETL: CR3

Automate the attachment of CR3 PDFs to Crash records.

In the future we may want to extract out fields from the CR3, specifically Narrative and crash diagram fields.

  • Explore options for automation of Brazos (Capybara or API)
  • else: Automating retrieval of PDFs from G: network drive
  • upload PDFs to S3 bucket and make database association on crash record ID
  • OCR the PDF and crop the narrative & crash diagram sections - hoping that we'll be able to work through Brazos instead of doing this... 🤞

ETL: CRIS Deploy container to atd-data02 server

We will need to deploy the container to the atd-data01 server:

  • Will run every hour and check for S3 files (emails) in the atd-visionzero-data bucket
  • Will run every 24 hours and request new reports.

ETL: CRIS implement email processing in capybara

Implement a Capybara script that:

  • Can log in to CRIS' website if needed.
  • Can download emails from S3.
  • Can download the reports within those emails.
  • Can delete emails from S3.

This script needs to live within the same container created in https://github.com/cityofaustin/atd-vz-data/blob/master/atd-cris-capybara, because we can reuse the same code and the deployment will be significantly simpler.

ETL: Associate crashes with locations

From comment below:

Although we have the relationships set up in Knack for this, the ETL need here is to associate a crash record (a point) with a discrete, pre-defined location (a polygon) using spatial processing. This process would locate the record ID of any overlapping location and set the Knack record link accordingly.

We can use the atd-agol-util for this.

We first need to define those location records/geometries. Will open a separate issue.
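
For reference, the core operation is a point-in-polygon join. The issue proposes doing this with atd-agol-util against AGOL, but a hedged sketch of the equivalent in PostGIS terms looks like the following (every table and column name is hypothetical):

psql <<'SQL'
-- Hypothetical point-in-polygon update: stamp each crash with the id of the
-- location polygon that contains it. Names are illustrative, not the real schema.
UPDATE crashes c
SET location_id = l.location_id
FROM locations l
WHERE ST_Contains(l.geometry, c.geometry);
SQL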

Knack: Configure ADFS Login

Set up ADFS login for this app. I gave John D a heads-up that this was coming. We've done this four times now, so it should be straightforward.

ETL: MoPED 🛵

🛵🛵🛵

In a future world where comprehensive data on mobility projects is centralized, we'd pull in mobility project data from the MoPED (Mobility Project Enterprise Database).

😉 @amenity

Migrated to atd-data-tech #1754

Map Requested Field Names to CRIS Extract

Email to Lewis on 6/4:

You’ve provided us with a list of fields (at the bottom of this email) that you’d like to include in the crash database. We need to map these fields to the field names that are in the CRIS extract. I’ve attached a complete listing of the fields in the public CRIS extract.

Is this something Frank could work on? He can just add a column on the attached XLSX and put the humanized field name from your list next to the corresponding field in the CRIS extract. There may be a few fields that don't map 1-to-1, or aren't present. That's fine. Just note them and we'll review.

  • Case ID
  • Date
  • Time
  • Roadway system
  • Roadway part
  • Block number and all street info, including speed limit
  • Lat/Long
  • Unit description (this identifies the motor vehicle, pedestrian, cyclist, etc.)
  • Hit and Run
  • DL class
  • DL restriction
  • All Units - address (person)
  • All Units - Vehicle year, make, model
  • All Units - body style
  • All Units - injury severity
  • All Units - age, ethnicity, sex, restraint, alcohol/drug specimen, drug test result
  • Proof of liability
  • Contributing Factors – all which are captured in “Contributing” and “May have”
  • Crash severity
  • Injury severity
  • Intersection-relation type
  • Light condition
  • Railroad related flag
  • Road construction flag
  • Crash speed limit
  • Active School Zone
  • Road type
  • Highway system, number, and suffix for both primary and secondary
  • On system
  • School bus flag
  • Toll Road flag
  • Traffic control
  • Weather condition
  • Surface condition
  • DWI charge
  • Object struck
  • Helmet
  • Day of week
  • Month
  • Year
  • Collision
  • Unit group involved
  • Crash type
  • Fatality
  • PRSN_TYPE_ID

VZE: Study Locations

Within the VZE interface, Transportation Engineering users should be able to create study locations from an existing table of locations.

They should be able to sort, filter, and rank.

Requirements around how ranking works may need further definition.

VZE: Data QA Page

As new data is imported from CRIS, there may be data quality issues that need manual intervention before they are treated as valid data.

Records could have missing fields, they could be duplicates, or there might be other issues arising from manual data entry that require quality control.

How do we manage version control of data in pending states?

ETL: Create Location Records Polygon Endpoint

We need a polygon layer of locations (aka intersections, though we're trying to get away from that terminology since we need mid-block geometries as well) which will serve as the immutable geometry for aggregating crash data. The data should ideally be hosted on AGOL so that we can integrate with REST services and not insane proprietary Esri bs.

Migrated to atd-data-tech #1996

VZV: inventory NYC pages/components

If we are able to fork the VZV.NYC code, we need to evaluate which parts of the code base are relevant to us given our available data.

For example, they include data about things like signal interventions which we aren't yet tracking.

Go through the NYC VZV and identify its individual components. This sets the groundwork for a discussion about the utility and priority of each one for our implementation.

ETL: CRIS data

Automate the import process for CRIS data extracts. This isn't necessary to have a working system, but we know APD has scripts that do this on a cron. This would replace the need to manually load data like we plan to at first in #17.

ETL: Set up Lambda Function

Once the AWS email is set up, we will also need to set up a Lambda function, preferably in Python. This should take 1 day, in my opinion.

ETL: CRIS: Implement Database Constraints

When working with the data provided by CRIS, I found a bunch of duplicate records in the charges table (about 300,000 records were duplicates of some kind). We will need constraints on some tables (not all) in order to maintain data integrity.
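
As a sketch of the cleanup-then-constrain pattern in plain Postgres (the charges columns below are hypothetical):

psql <<'SQL'
-- Hypothetical: delete duplicate charge rows, keeping the lowest id, then
-- enforce uniqueness going forward. Column names are illustrative.
DELETE FROM charges a
USING charges b
WHERE a.id > b.id
  AND a.crash_id = b.crash_id
  AND a.charge = b.charge;

ALTER TABLE charges
  ADD CONSTRAINT charges_crash_charge_unique UNIQUE (crash_id, charge);
SQL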

Set up account with TXDOT for CRIS extracts

From last email with Lewis:

For next steps, I wonder if there is value in setting up the account with TxDOT through their website and doing a sample extract for all the fields that we want, to see what we're working with. (The combined list of fields requested from ATD staff is below.) Doing this test with some of the 2016 bond top “location” data/info from Boni and starting to relate the data would be interesting as well.

Website: https://www.txdot.gov/government/enforcement/data-access.html

Guide docs: http://ftp.dot.state.tx.us/pub/txdot-info/trf/crash_statistics/automated/cris-guide.pdf

Fields:

  • Case ID
  • Date
  • Time
  • Roadway system
  • Roadway part
  • Block number and all street info, including speed limit
  • Lat/Long
  • Unit description (this identifies the motor vehicle, pedestrian, cyclist, etc.)
  • Hit and Run
  • DL class
  • DL restriction
  • All Units - address (person)
  • All Units - Vehicle year, make, model
  • All Units - body style
  • All Units - injury severity
  • All Units - age, ethnicity, sex, restraint, alcohol/drug specimen, drug test result
  • Proof of liability
  • Contributing Factors – all which are captured in “Contributing” and “May have”
  • Crash severity
  • Injury severity
  • Intersection-relation type
  • Light condition
  • Railroad related flag
  • Road construction flag
  • Crash speed limit
  • Active School Zone
  • Road type
  • Highway system, number, and suffix for both primary and secondary
  • On system
  • School bus flag
  • Toll Road flag
  • Traffic control
  • Weather condition
  • Surface condition
  • DWI charge
  • Object struck
  • Helmet
  • Day of week
  • Month
  • Year
  • Collision
  • Unit group involved
  • Crash type
  • Fatality
  • PRSN_TYPE_ID

ETL: Microstrategy integration for fallback lat/lon fields

We've learned what it looks like when APD adds CAD lat/lons to CRIS data:
https://drive.google.com/file/d/1lJTbM2ArfEUjPp-o5qsKwIcG-G5Tx6XY/view?usp=sharing

There may be a possibility of APD adding these CAD Long and CAD Lat fields to Microstrategy, which might be a good enough reason to set up an integration that provides fallback lat/lon fields when geocoding fails. Geocoding can be improved by using a different service from CRIS, but if there is a text input error or incomplete address data, using a completely separate lat/lon source might be necessary.

ETL: CRIS Process Extracted Data

We need to process the CSV files into a database and Knack. The general task is to parse each CSV file in a way that does not consume all of the server's resources, and to load each CSV row into Knack and Postgres.

  • Spin up an RDS Postgres DB
  • Work on a Ruby script that will parse the CSV and UPSERT into Postgres (see the sketch below)
  • When done, upload the CSV files to S3 for archiving or further analysis.
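
The UPSERT half of that pipeline maps naturally onto Postgres' ON CONFLICT clause. A hedged sketch (the staging table, target table, and columns are all hypothetical):

psql <<'SQL'
-- Hypothetical: bulk-load a CSV into a staging table, then upsert into the
-- target table so that re-running an import is idempotent.
\copy crashes_staging FROM 'crashes.csv' WITH (FORMAT csv, HEADER true)

INSERT INTO crashes (crash_id, crash_date)
SELECT crash_id, crash_date
FROM crashes_staging
ON CONFLICT (crash_id) DO UPDATE
  SET crash_date = EXCLUDED.crash_date;
SQL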
