sfbrigade / datasci-firerisk

Home Page: http://codeforsanfrancisco.org/projects/SF-Fire-Risk-Project

JavaScript 0.41% Python 1.75% HTML 29.11% R 0.05% Jupyter Notebook 68.67%
data-science predictive-modeling machine-learning civic-tech civic-hacking

datasci-firerisk's Introduction

San Francisco Fire Risk Project

This project is a part of the Data Science Working Group at Code for San Francisco. Other DSWG projects can be found at the main GitHub repo.

Project Status: Active

Current Model's Average F1 Score: 0.67

Project Intro/Objective

This project acquires and models data from SF OpenData and other sources to predict the relative risk of fire in San Francisco’s buildings and public spaces.

Methods Used

  • Data Science
  • Machine Learning
  • Data Visualization
  • Predictive Modeling
  • Data Analysis

Technologies

  • Python
    • NumPy / pandas
    • Scikit-learn
    • matplotlib
  • R
    • dplyr / tidyr
    • ggplot2
  • Jupyter Notebook

Project Description

The mapping software will allow the user to type in an address and see fire-related risks and incidents in their area. It will also provide recommendations from fire safety experts when a risk score is high enough to warrant preventive action. This project is modeled after Data Science for Social Good's (DSSG) Firebird Project in Atlanta, GA. Consultation is occasionally provided by members of the DSSG and former members of the Atlanta project.

Needs of this project

  1. data scientists
  2. data analysts
  3. data visualizers
  4. researchers and journalists (find trends and data related to fire incidents in the city)
  5. product and project managers

Getting Started

  1. Clone this repo. For help, see tutorial
  2. Download data stored in the project Google Drive.
  3. Review Project Wiki.
  4. Check Issue Tracker and discuss with team members to understand current project needs.
  5. Hack away! :)

Additional Documentation

Our workflow chart is here: SF Fire Risk Workflow

SF Fire Risk Attribute Sheet SF Fire Risk Predictive Model Attribute Sheet (Always adding more!)

Featured Notebooks/Analysis

May 2017 Presentation

CartoDB Visual Mock-Up

Contributing DSWG Members

Name           | Slack Handle   | Role                           | Website
Ryan Tanaka    | @ryangtanaka   | Product Manager, Team Lead     | http://product.ryangtanka.com
Kel Yip        | @yamariva2000  | Data Scientist, Technical Lead |
Seward Lee     | @sewardlee337  | Data Scientist                 | http://www.sewardlee.com
Kenny Durell   | @kennyd        | Data Scientist                 |
Sam Williams   | @swilliams2099 | Research, Outreach             |
Kevin Stahler  | @stahlerk      | Data Scientist                 |
Yuzhe Chen     | @yzxchn        | Data Scientist                 |
Chris Quiambao | @ccquiambao    | Data Scientist                 | https://github.com/ccquiambao
Hannah Gorman  | @hannahrosey   | Data Scientist, Visualization  |

Contact

  • If you haven't joined the SF Brigade Slack, you can do that here.
  • Our Slack channel is #datasci-firerisk
  • Feel free to contact team leads with any questions or if you are interested in contributing! Most of our activities are done in our Slack channel.

datasci-firerisk's People

Contributors

andirs, ccquiambao, hannahrosey, kennydurell, rscummings, ryangtanaka-org, sewardlee337, smoningi, stahlerk, yamariva2000, yzxchn

datasci-firerisk's Issues

Generate ETL script to populate the database

  • Generate dummy CSV files (EAS, Incident, Property, Prediction) for processing
  • Define process of data population
  • Adapt sba script for our use case
  • Propose updating scheme (schedule) depending on time of operations and frequency of data source updates
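The first bullet above (dummy CSV files) could be approached with a short script along these lines. This is only a sketch: the column names, file names, and value ranges are placeholders invented for illustration, not the project's actual schema.

```python
import csv
import random
from pathlib import Path

# Hypothetical columns for each dummy table; the real schema is not given here.
DUMMY_SCHEMAS = {
    "eas": ["eas_id", "address"],
    "incident": ["incident_id", "eas_id", "incident_date"],
    "property": ["eas_id", "square_footage", "property_value"],
    "prediction": ["eas_id", "risk_score"],
}


def make_row(table, i, rng):
    """Build one fake row for the given table (all values are made up)."""
    eas_id = 1000 + i
    if table == "eas":
        return [eas_id, f"{100 + i} Example St"]
    if table == "incident":
        return [i, eas_id, f"2016-01-{1 + i % 28:02d}"]
    if table == "property":
        return [eas_id, rng.randint(500, 5000), rng.randint(200_000, 2_000_000)]
    return [eas_id, round(rng.random(), 3)]  # prediction


def write_dummy_csvs(out_dir, n_rows=5, seed=0):
    """Write one small CSV per table and return the paths that were created."""
    rng = random.Random(seed)  # fixed seed keeps the dummy files reproducible
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for table, columns in DUMMY_SCHEMAS.items():
        path = out_dir / f"dummy_{table}.csv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(columns)
            writer.writerows(make_row(table, i, rng) for i in range(n_rows))
        paths.append(path)
    return paths
```

Calling `write_dummy_csvs("dummy_data")` would create four small files that share `eas_id` values, which is enough to exercise joins in an ETL script before real data is loaded.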

Census Data

  • Number of housing units per block group
  • Clarify blocks definition/population per block
  • Income level

Longitude/Latitude?

Check in with James on this.

Connecting EAS_IDs to tax blocks

The EAS_IDs represent street addresses at the building level, whereas a tax block parcel may contain multiple EAS_IDs and, conversely, a single EAS_ID may span multiple tax block parcels.

Current data pipeline describes only most recent fire

The tax lot and EAS_IDs have a many-to-many relationship.

Required features

  • square footage
  • property value
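
One common way to handle a many-to-many relationship like the one between tax lots and EAS_IDs is an explicit link table, with lot-level values aggregated up to the EAS level. The pandas sketch below illustrates the idea. The column names (`eas_id`, `lot_id`, `square_footage`, `property_value`), the sample values, and the choice to sum are all assumptions made for illustration, not the project's actual approach.

```python
import pandas as pd

# Hypothetical link table: one row per (EAS_ID, tax lot) pair.
links = pd.DataFrame({
    "eas_id": [1, 1, 2, 3],
    "lot_id": ["A", "B", "B", "C"],
})

# Hypothetical tax-roll values, one row per tax lot.
lots = pd.DataFrame({
    "lot_id": ["A", "B", "C"],
    "square_footage": [1000, 2000, 1500],
    "property_value": [500_000, 900_000, 700_000],
})


def features_per_eas(links, lots):
    """Attach lot values to each link row, then sum them per EAS_ID."""
    # validate= raises if a lot_id appears more than once in `lots`.
    joined = links.merge(lots, on="lot_id", how="left", validate="many_to_one")
    return (
        joined.groupby("eas_id")[["square_footage", "property_value"]]
        .sum()
        .reset_index()
    )


eas_features = features_per_eas(links, lots)
```

Note that lot "B" is linked to two EAS_IDs here, so its values are counted once for each. Whether to sum, average, or apportion shared lots is a modeling decision this sketch does not settle.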

Update fire_incident_master_20170823

  1. Merge each specific fire incident to EAS
  2. Merge all relevant features from tax roll data
  3. Create basic transformations/combinations of relevant features (Ex. "land value per square foot")
  4. Remove any outliers
  5. Aggregate building type into broader categories
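
The five steps above could be sketched in pandas roughly as follows. All column names, sample values, the outlier threshold, and the building-type grouping are assumptions chosen for illustration. The sketch also assumes each incident already carries an `eas_id`, which is itself the outcome of step 1.

```python
import pandas as pd

# Hypothetical inputs; real column names in the project data may differ.
incidents = pd.DataFrame({
    "incident_id": [10, 11, 12, 13],
    "eas_id": [1, 2, 3, 4],
})
tax_roll = pd.DataFrame({
    "eas_id": [1, 2, 3, 4],
    "land_value": [400_000, 900_000, 300_000, 50_000_000],
    "square_footage": [1000, 3000, 1500, 2000],
    "building_type": ["Dwelling", "Apartment", "Office", "Dwelling"],
})

# Assumed grouping of building types into broader categories.
BROAD_TYPE = {
    "Dwelling": "residential",
    "Apartment": "residential",
    "Office": "commercial",
}


def build_master(incidents, tax_roll, max_value_per_sqft=10_000):
    """Join incidents to tax-roll features and derive a few columns."""
    # Steps 1-2: attach tax-roll features to each incident via its EAS ID.
    master = incidents.merge(tax_roll, on="eas_id", how="left")
    # Step 3: derived feature, land value per square foot.
    master["land_value_per_sqft"] = master["land_value"] / master["square_footage"]
    # Step 4: keep rows at or below an arbitrary outlier threshold
    # (rows with missing or infinite ratios are dropped as well).
    master = master[master["land_value_per_sqft"] <= max_value_per_sqft].copy()
    # Step 5: collapse building types; unmapped types become "other".
    master["building_category"] = (
        master["building_type"].map(BROAD_TYPE).fillna("other")
    )
    return master.reset_index(drop=True)


master = build_master(incidents, tax_roll)
```

With this sample data, incident 13 is removed because its land value per square foot exceeds the threshold. A fixed cutoff is the simplest option; a percentile- or IQR-based rule may suit the real data better.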

Model Runs

This is a temporary place to record results of model runs.

Please record the following:

  • Version of model you are using. (Which commit is it stored in? Link to commit or note SHA-1 hash)
  • Changes/modifications you made to model. (Changes to features used, changes to other model parameters, etc.)
  • Results
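
If these records are ever moved out of the issue thread, one lightweight option is an append-only log file with one entry per run. The sketch below is a suggestion rather than an existing project tool. The `ModelRun` class, its field names, and the JSON Lines file format are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ModelRun:
    """One model run, mirroring the three items requested above."""
    commit: str    # SHA-1 hash (or link) of the model version
    changes: str   # what was modified for this run
    results: dict = field(default_factory=dict)  # metric name -> value


def append_run(log_path, run):
    """Append a run as one JSON line so earlier entries are never rewritten."""
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(run)) + "\n")


def load_runs(log_path):
    """Read every recorded run back as a list of ModelRun objects."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [ModelRun(**json.loads(line)) for line in fh if line.strip()]
```

A run would then be recorded with something like `append_run("model_runs.jsonl", ModelRun(commit="<sha>", changes="<what changed>", results={"f1": ...}))`, where the file name and placeholder values are illustrative.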

Prepare feature data from 'matched_Fire_Safety_Complaints.csv'

  1. Subset data to potentially useful features
  2. Detect and remove outliers
  3. Consider dropping complaints with 'No merit' in the 'Disposition' column
  4. Consider organizing "Complaint Item Type Description" column into more generalized groups (if appropriate)
  5. Collapse data at EAS level
  6. Create any potentially relevant features (for example, total number of complaints between 2005 and 2016 associated with each EAS, etc.)
  7. Any other data cleaning and standardization operations
  8. Output as .csv (indexed at EAS)
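
Steps 3, 5, 6, and 8 above might look roughly like this in pandas. The sample rows are made up, and `EAS` and `Complaint ID` are assumed column names. Only `Disposition`, `No merit`, and `Complaint Item Type Description` come from the steps themselves.

```python
import pandas as pd

# Tiny made-up sample; "EAS" and "Complaint ID" are assumed column names.
complaints = pd.DataFrame({
    "EAS": [1, 1, 2, 2, 3],
    "Complaint ID": [100, 101, 102, 103, 104],
    "Disposition": ["Abated", "No merit", "Abated", "Abated", "No merit"],
    "Complaint Item Type Description": ["alarm", "exit", "alarm", "alarm", "exit"],
})


def complaints_per_eas(df, drop_no_merit=True):
    """Collapse complaint rows to one row per EAS with simple count features."""
    if drop_no_merit:
        df = df[df["Disposition"] != "No merit"]
    grouped = df.groupby("EAS")
    return pd.DataFrame({
        "total_complaints": grouped["Complaint ID"].count(),
        "distinct_complaint_types":
            grouped["Complaint Item Type Description"].nunique(),
    })


features = complaints_per_eas(complaints)
# Step 8: writing the frame also writes its index, which is the EAS column.
# features.to_csv("complaint_features.csv")
```

One consequence worth noting: an EAS whose complaints are all marked 'No merit' disappears from the output entirely, so a later join to the full EAS list would need to fill its counts with zero.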

Create feature data set from 'matched_Fire_Inspections.csv'

  1. Subset data to potentially useful features
  2. Detect and remove outliers
  3. Consider organizing "Inspection Type Description" column into more generalized groups (if appropriate)
  4. Collapse data at EAS level
  5. Create any potentially relevant features (for example, total number of fire inspections between 2005 and 2016 associated with each EAS, etc.)
  6. Any other data cleaning and standardization operations
  7. Output as .csv (indexed at EAS)
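
For step 3 (generalizing inspection types) combined with step 4 (collapsing to EAS level), one possible approach is to map detailed types to broader groups and then count inspections per group for each EAS. In the sketch below the `EAS` column name, the inspection type labels, and the grouping itself are placeholders, not values taken from the real file.

```python
import pandas as pd

# Made-up sample rows; the inspection type labels are placeholders.
inspections = pd.DataFrame({
    "EAS": [1, 1, 1, 2, 3],
    "Inspection Type Description": [
        "routine annual", "complaint follow-up", "routine annual",
        "permit final", "something unexpected",
    ],
})

# Hypothetical mapping from detailed types to broader groups.
TYPE_GROUPS = {
    "routine annual": "routine",
    "complaint follow-up": "complaint",
    "permit final": "permit",
}


def inspection_counts_per_eas(df, groups):
    """Return one row per EAS with a count column for each inspection group."""
    # Types missing from the mapping fall into a catch-all "other" group.
    grouped_type = df["Inspection Type Description"].map(groups).fillna("other")
    counts = pd.crosstab(df["EAS"], grouped_type)
    counts.columns = [f"n_{name}_inspections" for name in counts.columns]
    counts["n_inspections_total"] = counts.sum(axis=1)
    return counts


inspection_features = inspection_counts_per_eas(inspections, TYPE_GROUPS)
```

Keeping an explicit "other" column makes it easy to spot inspection types that the mapping has not yet covered.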

Acquire data from CoreLogic

We are currently seeking high-quality data on fire and other natural hazards from CoreLogic.

This dataset would contain more detail than what is publicly available, including:

  • Fire hydrant location
  • Distance from fire station in relation to coverage area
  • Building construction materials (??)

Proposed (max) budget for proprietary datasets: $2,000
