
The technology that powers the City of Austin's Vision Zero program

Home Page: https://visionzero.austin.gov/viewer/


atd-vz-data's Introduction

Vision Zero Crash Data System

This repository is home base for a suite of applications that help centralize and streamline the management of ATD's Vision Zero data. As a result of this project, staff will have a standardized interface for reviewing crash data, prioritizing intersection safety improvements, and communicating efforts to the public. Additionally, high quality VZ data will be publicly accessible online.

atd-cr3-api

This folder hosts our API that securely downloads a private file from S3. It is written in Python with Flask, and it is deployed as a Lambda function with Zappa.

more info

atd-etl (Extract-Transform-Load)

Our current method for extracting data from the TxDOT C.R.I.S. data system uses a Python library called Splinter to request, download, and process data. It is deployed as a Docker container.

For step-by-step details on how to prepare your environment and how to execute this process, please refer to the documentation in the atd-etl folder.

more info

atd-vzd (Vision Zero Database)

VZD is our name for our Hasura GraphQL API server that connects to our Postgres RDS database instances.

more info

Production site: http://vzd.austinmobility.io/
Staging site: https://vzd-staging.austinmobility.io/
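
For local development, you can smoke-test a running Hasura instance with a plain HTTP request. This is only a sketch: the local port and the table/field names are assumptions, and your instance may also require an x-hasura-admin-secret header.

# Sketch: POST a GraphQL query to a locally running Hasura instance.
# The port and the table/field names below are assumptions, not the confirmed schema.
curl -s http://localhost:8080/v1/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "query { atd_txdot_crashes(limit: 5) { crash_id } }"}'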

atd-vze (Vision Zero Editor)

VZE is our front-end application, built in React.js with CoreUI, that allows a trusted group of internal users to edit and improve the quality of our Vision Zero data. It consumes data from Hasura/VZD.

more info

Production site: https://visionzero.austin.gov/editor/
Staging site: https://visionzero-staging.austinmobility.io/editor/

atd-vzv (Vision Zero Viewer)

VZV is our public-facing home for visualizations, maps, and dashboards that aggregate and make sense of trends in our Vision Zero Database.

more info

Production site: https://visionzero.austin.gov/viewer/
Staging site: https://visionzero-staging.austinmobility.io/viewer/

atd-toolbox

A collection of utilities for maintaining data and other resources related to the Vision Zero Data projects.

Local Development

The suite has a Python script which can be used to run and populate a local development instance of the stack. The script is found in the root of the repository and is named vision-zero. It's recommended to create a virtual environment in the root of the repo; if you name it venv, it will be ignored by the .gitignore file in place. If you start a terminal from within VS Code to interface with the stack, it will automatically source the activation script.
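
For example, a first run might look like the following sketch; the requirements.txt path and the --help flag are assumptions about the script's layout and argument parsing:

# Sketch of first-time setup (the requirements.txt path is an assumption):
python3 -m venv venv              # "venv" is already covered by .gitignore
source ./venv/bin/activate
pip install -r requirements.txt   # install the orchestration script's dependencies
./vision-zero --help              # list the available subcommands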

The vision-zero program is a light wrapper around the functionality provided by docker compose. By inspecting the docker-compose.yml file, you can find the definitions of the services in the stack, and you can use the docker compose command to bring up, stop, and attach terminals to the running containers and execute one-off commands. This can give you access to containers to install Node.js libraries, use Postgres' supporting programs (psql, pg_dump), and other lower-level utilities.
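
For instance, a few common docker compose invocations might look like this sketch; the service names ("db", "graphql-engine") are assumptions about what docker-compose.yml defines:

# Sketch: working with the stack directly via docker compose.
# Service names below are assumptions drawn from a typical docker-compose.yml.
docker compose ps                         # list the services and their states
docker compose up -d db                   # bring up a single service in the background
docker compose exec db psql -U postgres   # attach psql inside the database container
docker compose logs -f graphql-engine     # follow one service's logs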

Ideally, you should be able to operate the entire Vision Zero suite and access all needed supporting tooling from any host that provides a working Docker service and a Python interpreter for the orchestration script.

vision-zero command auto-completion

The vision-zero application can generate auto-completion scripts via the shtab Python library. For example, zsh users may use the following to enable this feature; bash and csh users will have similar steps particular to their shell of choice.

mkdir ~/.zsh_completion_functions;
chmod g-w,o-w ~/.zsh_completion_functions;
cd $WHEREVER_YOU_HAVE_VZ_CHECKED_OUT;
source ./venv/bin/activate;
./vision-zero -s zsh | tee ~/.zsh_completion_functions/_vision-zero
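
For the completions to actually load, zsh needs the directory on its fpath before compinit runs; a typical ~/.zshrc addition looks like:

# Add the completion directory to fpath and (re)initialize completions:
fpath=(~/.zsh_completion_functions $fpath)
autoload -Uz compinit && compinit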

Examples of vision-zero commands

Note: the following flag is honored by any of the commands below that start the Postgres database:

-r / --ram-disk will cause the database to back its "storage" with a RAM disk instead of non-volatile storage. This has the upside of being much faster, as there is essentially no limit to the IOPS available to the database, but the data won't survive a restart and will need to be replicate-db'd back into place.

The default is to use the host's disk to back the database, which is the mode of operation our team is most familiar with, so if you don't need or want the RAM disk configuration, you can ignore this option.
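
As a sketch, a RAM-disk-backed session might look like the following; placing the flag before the subcommand is an assumption about the CLI's argument parsing:

# Start the database backed by a RAM disk, then repopulate it:
./vision-zero --ram-disk db-up
./vision-zero replicate-db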

vision-zero build

Rebuild the stack's images based on the Dockerfiles found in the repository. They are built with the --no-cache flag, which makes the build process slower but avoids stale image layers that have inadvertently cached out-of-date apt resource lists.

vision-zero db-up & vision-zero db-down

Start and stop the Postgres database

vision-zero graphql-engine-up & vision-zero graphql-engine-down

Start and stop the Hasura graphql-engine software

vision-zero vze-up & vision-zero vze-down

Start and stop the Vision Zero Editor

vision-zero vzv-up & vision-zero vzv-down

Start and stop the Vision Zero Viewer

vision-zero psql

Start a psql client connected to your local PostgreSQL database

vision-zero tools-shell

Start a bash shell on a machine with supporting tooling

vision-zero stop

Stop the stack

vision-zero replicate-db

  • Download a snapshot of the production database
  • Store the file in `./atd-vzd/snapshots/visionzero-{date}-{with|without}-change-log.sql`
  • Drop local atd_vz_data database
  • Create and repopulate the database from the snapshot

Note: the -c / --include-change-log-data flag can be used to opt in to including the data of past change log events; the schema is created either way.

Note: the -f / --filename flag can optionally be used to point to a specific data dump .sql file to use for the restore.

Because of the way the snapshots are dated, only one copy of the data will be downloaded per day, for each of the with- and without-change-log variants.
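
For example (the dump filename below is purely illustrative):

# Include past change log data in the restore:
./vision-zero replicate-db -c

# Restore from a specific dump file instead of the day's snapshot:
./vision-zero replicate-db -f ./atd-vzd/dumps/visionzero-2024-01-01-1200.sql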

vision-zero dump-local-db

  • Run pg_dump against the current local database
  • Store the file in `./atd-vzd/dumps/visionzero-{date}-{time}.sql`

vision-zero remove-snapshots

Remove snapshot files. This can be done to save space and clean up old snapshots, but it's also useful for forcing a fresh copy of the day's data to be downloaded after an upstream change has been made.
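
Putting it together, a typical local session might look like this sketch, using only the commands documented above:

./vision-zero build              # rebuild images from the repo's Dockerfiles
./vision-zero db-up              # start the Postgres database
./vision-zero replicate-db       # download a production snapshot and restore it
./vision-zero graphql-engine-up  # start Hasura
./vision-zero vze-up             # start the Vision Zero Editor
./vision-zero psql               # inspect the data directly
./vision-zero stop               # tear the stack down when finished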

Technology Stack

Technologies, libraries, and languages used for this project include:

  • Docker
  • Hasura
  • PostgreSQL
  • React (JavaScript)
  • CoreUI (HTML & CSS)
  • Python

License

As a work of the City of Austin, this project is in the public domain within the United States.

Additionally, we waive copyright and related rights of the work worldwide through the CC0 1.0 Universal public domain dedication.

atd-vz-data's People

Contributors

charlie-henry, chiaberry, dependabot[bot], frankhereford, jgabitto, johnclary, mddilley, patrickm02l, rgreinho, roseeichelmann, rr216, sergiogcx, tillyw, xavierapostol


atd-vz-data's Issues

VZE: Modify records (CRIS+)

A user should be able to modify, add, and delete records to augment data imported from CRIS or other sources.

In the VZE workflow, Program Managers should be able to act as "editors" and update existing crash records.

Knack: Create API Views for VZV

We need more definition around which views will be required, but we can assume that we'll want to expose crash data (along with other related records) via the Knack view-based request API. If so, we need to create pages/views with this API data.

ETL: Get Locations from ArcGIS (When ready)

Write a script that can:

  • Query the ArcGIS API to load a list of items (polygons, intersections), and
  • Iterate through each element and display the item ID and polygon coordinates.

ETL: CRIS: Lint & Document Extraction Code

Lint and document the Capybara scripts:

  • Be sure all Ruby files in the cityofaustin/atd-vz-data/atd-cris-capybara folder are commented and linted.
  • Lint and document the sh files in the same directory.
  • Add a README describing the purpose of each file.

Learn more about Geocoding options and lat/lon errors from CRIS

  • ~10% of lat/lons come back blank from CRIS. We need to learn more about the geocoding service TxDOT uses so we can see whether other services would give us fewer blank results.
  • We also learned that while coordinates for arterials/local streets are generally good, freeway coordinates aren't (e.g., I-35 & Cesar Chavez). Oftentimes, the CR3 narratives have better details.

A meeting with TxDOT is to be coordinated by Lewis.

ETL: CR3

Automate the attachment of CR3 PDFs to Crash records.

In the future we may want to extract out fields from the CR3, specifically Narrative and crash diagram fields.

  • Explore options for automation of Brazos (Capybara or API)
  • else: Automating retrieval of PDFs from G: network drive
  • upload PDFs to S3 bucket and make database association on crash record ID
  • OCR the PDF and crop the narrative & crash diagram sections - hoping that we'll be able to work through Brazos instead of doing this... 🤞

ETL: CRIS Deploy container to atd-data02 server

We will need to deploy the container to the atd-data01 server:

  • Will run every hour and check for S3 files (emails) in the atd-visionzero-data bucket
  • Will run every 24 hours and request new reports.

ETL: CRIS implement email processing in capybara

Implement a Capybara script that:

  • Can log in to CRIS' website if needed.
  • Can download emails from S3.
  • Can download the reports within those emails.
  • Can delete emails from S3.

This script needs to live within the same container created in https://github.com/cityofaustin/atd-vz-data/blob/master/atd-cris-capybara, because we can reuse the same code and the deployment will be significantly simpler.

ETL: Associate crashes with locations

From comment below:

Although we have the relationships set up in Knack for this, the ETL need here is to associate a crash record (a point) with a discrete, pre-defined location (a polygon) using spatial processing. This process would locate the record ID of any overlapping location and set the Knack record link accordingly.

We can use the atd-agol-util for this.

We first need to define those location records/geometries. Will open a separate issue.
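
For reference, the core operation is a point-in-polygon join. The issue proposes doing this with atd-agol-util against AGOL, but a hedged sketch of the equivalent in PostGIS terms looks like the following (every table and column name is hypothetical):

psql <<'SQL'
-- Hypothetical point-in-polygon update: stamp each crash with the id of the
-- location polygon that contains it. Names are illustrative, not the real schema.
UPDATE crashes c
SET location_id = l.location_id
FROM locations l
WHERE ST_Contains(l.geometry, c.geometry);
SQL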

Knack: Configure ADFS Login

Set up ADFS login for this app. I gave John D a heads-up that this was coming. We've done this four times now, so it should be straightforward.

ETL: MoPED 🛵

🛵🛵🛵

In a future world where comprehensive data on mobility projects is centralized, we'd pull in mobility project data from the MoPED (Mobility Project Enterprise Database).

😉 @amenity

Migrated to atd-data-tech #1754

Map Requested Field Names to CRIS Extract

Email to Lewis on 6/4:

You’ve provided us with a list of fields (at the bottom of this email) that you’d like to include in the crash database. We need to map these fields to the field names that are in the CRIS extract. I’ve attached a complete listing of the fields in the public CRIS extract.

Is this something Frank could work on? He can just add a column on the attached XLSX and put the humanized field name from your list next to the corresponding field in the CRIS extract. There may be a few fields that don't map 1-to-1, or aren't present. That's fine. Just note them and we'll review.

  • Case ID
  • Date
  • Time
  • Roadway system
  • Roadway part
  • Block number and all street info, including speed limit
  • Lat/Long
  • Unit description (this identifies the motor vehicle, pedestrian, cyclist, etc.)
  • Hit and Run
  • DL class
  • DL restriction
  • All Units - address (person)
  • All Units - Vehicle year, make, model
  • All Units - body style
  • All Units - injury severity
  • All Units - age, ethnicity, sex, restraint, alcohol/drug specimen, drug test result
  • Proof of liability
  • Contributing Factors – all which are captured in “Contributing” and “May have”
  • Crash severity
  • Injury severity
  • Intersection-relation type
  • Light condition
  • Railroad related flag
  • Road construction flag
  • Crash speed limit
  • Active School Zone
  • Road type
  • Highway system, number, and suffix for both primary and secondary
  • On system
  • School bus flag
  • Toll Road flag
  • Traffic control
  • Weather condition
  • Surface condition
  • DWI charge
  • Object struck
  • Helmet
  • Day of week
  • Month
  • Year
  • Collision
  • Unit group involved
  • Crash type
  • Fatality
  • PRSN_TYPE_ID

VZE: Study Locations

Within the VZE interface, Transportation Engineering users should be able to create study locations from an existing table of locations.

They should be able to sort, filter, and rank.

Requirements around how ranking works may need further definition.

VZE: Data QA Page

As new data is imported from CRIS, there may be data quality issues that need manual intervention before they are treated as valid data.

Records could have missing fields, they could be duplicates, or there might be other issues arising from manual data entry that require quality control.

How do we manage version control of data in pending states?

ETL: Create Location Records Polygon Endpoint

We need a polygon layer of locations (aka intersections, though we're trying to get away from that terminology since we need mid-block geometries as well) which will serve as the immutable geometry for aggregating crash data. The data should ideally be hosted on AGOL so that we can integrate with REST services and not insane proprietary Esri bs.

Migrated to atd-data-tech #1996

VZV: inventory NYC pages/components

If we are able to fork the VZV.NYC code, we need to evaluate which parts of the code base are relevant to us given our available data.

For example, they include data about things like signal interventions which we aren't yet tracking.

Go through the NYC VZV and identify its individual components. This sets the groundwork for a discussion about the utility and priority of each one for our implementation.

ETL: CRIS data

Automate the import process for CRIS data extracts. This isn't necessary to have a working system, but we know APD has scripts that do this on a cron. This would replace the need to manually load data like we plan to at first in #17.

ETL: Set up Lambda Function

Once the AWS email is set up, we will also need to set up a Lambda function, preferably in Python. This should take 1 day, in my opinion.

ETL: CRIS: Implement Database Constraints

When working with the data provided by CRIS, I found a bunch of duplicate records in the charges table (about 300,000 records were duplicates of some kind). We will need constraints on some tables (not all) in order to maintain data integrity.
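
As a sketch of the cleanup-then-constrain pattern in plain Postgres (the charges columns below are hypothetical):

psql <<'SQL'
-- Hypothetical: delete duplicate charge rows, keeping the lowest id, then
-- enforce uniqueness going forward. Column names are illustrative.
DELETE FROM charges a
USING charges b
WHERE a.id > b.id
  AND a.crash_id = b.crash_id
  AND a.charge = b.charge;

ALTER TABLE charges
  ADD CONSTRAINT charges_crash_charge_unique UNIQUE (crash_id, charge);
SQL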

Set up account with TXDOT for CRIS extracts

From last email with Lewis:

For next steps, I wonder if there is value in setting up the account with TxDOT through their website and doing a sample extract for all the fields that we want, to see what we're working with. (The combined list of fields requested from ATD staff is below.) Doing this test with some of the 2016 bond top “location” data/info from Boni and starting to relate the data would be interesting as well.

Website: https://www.txdot.gov/government/enforcement/data-access.html

Guide docs: http://ftp.dot.state.tx.us/pub/txdot-info/trf/crash_statistics/automated/cris-guide.pdf

Fields:

  • Case ID
  • Date
  • Time
  • Roadway system
  • Roadway part
  • Block number and all street info, including speed limit
  • Lat/Long
  • Unit description (this identifies the motor vehicle, pedestrian, cyclist, etc.)
  • Hit and Run
  • DL class
  • DL restriction
  • All Units - address (person)
  • All Units - Vehicle year, make, model
  • All Units - body style
  • All Units - injury severity
  • All Units - age, ethnicity, sex, restraint, alcohol/drug specimen, drug test result
  • Proof of liability
  • Contributing Factors – all which are captured in “Contributing” and “May have”
  • Crash severity
  • Injury severity
  • Intersection-relation type
  • Light condition
  • Railroad related flag
  • Road construction flag
  • Crash speed limit
  • Active School Zone
  • Road type
  • Highway system, number, and suffix for both primary and secondary
  • On system
  • School bus flag
  • Toll Road flag
  • Traffic control
  • Weather condition
  • Surface condition
  • DWI charge
  • Object struck
  • Helmet
  • Day of week
  • Month
  • Year
  • Collision
  • Unit group involved
  • Crash type
  • Fatality
  • PRSN_TYPE_ID

ETL: Microstrategy integration for fallback lat/lon fields

We've learned what it looks like when APD adds CAD lat/lons to CRIS data:
https://drive.google.com/file/d/1lJTbM2ArfEUjPp-o5qsKwIcG-G5Tx6XY/view?usp=sharing

There may be a possibility of APD adding these CAD Long and CAD Lat fields to Microstrategy, which might be a good enough reason to set up an integration that provides fallback lat/lon fields when geocoding fails. Geocoding can be improved by using a different service from CRIS, but if there is a text input error or incomplete address data, using a completely separate lat/lon source might be necessary.

ETL: CRIS Process Extracted Data

We need to process the CSV files into a database and Knack. The general task is to parse each CSV file in a way that does not consume all of the server's resources, and to load each CSV row into Knack and Postgres.

  • Spin up an RDS Postgres DB
  • Work on a Ruby script that will parse the CSV and UPSERT into Postgres (see the sketch below)
  • When done, upload the CSV files to S3 for archiving or further analysis.
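
The UPSERT half of that pipeline maps naturally onto Postgres' ON CONFLICT clause. A hedged sketch (the staging table, target table, and columns are all hypothetical):

psql <<'SQL'
-- Hypothetical: bulk-load a CSV into a staging table, then upsert into the
-- target table so that re-running an import is idempotent.
\copy crashes_staging FROM 'crashes.csv' WITH (FORMAT csv, HEADER true)

INSERT INTO crashes (crash_id, crash_date)
SELECT crash_id, crash_date
FROM crashes_staging
ON CONFLICT (crash_id) DO UPDATE
  SET crash_date = EXCLUDED.crash_date;
SQL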
