disaster_response_app's Introduction

Table of Contents

  1. Introduction
  2. File Description
  3. Heroku Web App
  4. Licensing, Authors, and Acknowledgements
  5. Other resources

1. Introduction

This project is an analysis of disaster data from Figure Eight to build a model for an API that classifies disaster messages.

The data set contains real messages that were sent during disaster events. A machine learning pipeline is created to classify these messages into 36 categories, so that each message can be routed to an appropriate disaster relief agency.

The machine learning pipeline includes natural language processing (text processing), feature extraction, modeling, and Flask web app development. In the web app, one can input a new message and get classification results. The web app also displays visualizations of the training dataset.

2. File Description

Raw datasets and data processing

There are two raw datasets used for model training and testing: "./data/disaster_messages.csv" contains the raw text messages, and "./data/disaster_categories.csv" contains the response categories for each message in the first .csv file.

"./data/process_data.py" is the text data processing pipeline used in this step. The pipeline includes:

  • Loading and merging the two original .csv datasets
  • Cleaning the merged dataset
  • Saving the processed dataset into a database named "DisasterResponse.db" for later use in the machine learning pipeline
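
The three steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the contents of process_data.py: the column names ("id", "message", "categories"), the "name-flag;name-flag" category encoding, the table name "messages", and the in-memory stand-in data are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Tiny stand-ins for the two raw files. The column names and the
# "name-flag;name-flag" category encoding are assumptions for illustration.
MESSAGES_CSV = """id,message
1,We need water and food
2,Roads are blocked near the bridge
2,Roads are blocked near the bridge
"""
CATEGORIES_CSV = """id,categories
1,water-1;food-1;transport-0
2,water-0;food-0;transport-1
2,water-0;food-0;transport-1
"""


def load_and_merge(messages_file, categories_file):
    """Join the two CSV sources on their shared id column."""
    categories_by_id = {row["id"]: row["categories"]
                        for row in csv.DictReader(categories_file)}
    return [{"id": row["id"], "message": row["message"],
             "categories": categories_by_id[row["id"]]}
            for row in csv.DictReader(messages_file)]


def clean(rows):
    """Expand the category string into 0/1 columns and drop duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["id"], row["message"])
        if key in seen:
            continue  # skip exact repeats of a message
        seen.add(key)
        record = {"id": int(row["id"]), "message": row["message"]}
        for item in row["categories"].split(";"):
            name, flag = item.rsplit("-", 1)
            record[name] = int(flag)
        cleaned.append(record)
    return cleaned


def save(rows, connection, table="messages"):
    """Write the cleaned rows to a SQLite table (hypothetical table name)."""
    columns = list(rows[0])
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'DROP TABLE IF EXISTS "{table}"')
    connection.execute(f'CREATE TABLE "{table}" ({column_sql})')
    connection.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows])
    connection.commit()


def run_etl(connection):
    """Run load, clean, and save against the stand-in data."""
    merged = load_and_merge(io.StringIO(MESSAGES_CSV),
                            io.StringIO(CATEGORIES_CSV))
    rows = clean(merged)
    save(rows, connection)
    return rows
```

To write a file on disk instead of an in-memory database, one would pass sqlite3.connect("DisasterResponse.db") to run_etl and replace the io.StringIO objects with the opened .csv files.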

"./data/DisasterResponse.db" is the processed database saved after the data processing ETL pipeline.

Machine learning pipeline

With the cleaned dataset, a machine learning pipeline is created to train a classification model, such that future text input could be processed and classified.

"./models/train_classifier.py" is the machine learning pipeline used in this step. The pipeline includes:

  • Loading data from the database generated in the previous data processing step, and splitting the data into training and testing sets
  • Building a Gradient Boosting classification model with the sklearn package, and tuning the model with GridSearchCV
  • Evaluating the model based on prediction precision, recall, and f1-score
  • Saving the trained and tuned model into a pickle file named "classfier.pkl"
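
A compact sketch of these steps with scikit-learn follows. It is an illustration under stated assumptions rather than the code in train_classifier.py: the two category names, the toy messages, the TfidfVectorizer features, the MultiOutputClassifier wrapper, and the parameter grid are all hypothetical, and a fixed slice stands in for a random train/test split so that the tiny example stays deterministic.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical category names; the project itself uses 36 categories.
CATEGORY_NAMES = ["water", "medical"]

# Toy messages repeated four times so every split contains both classes.
_PATTERNS = [
    ("we need drinking water", [1, 0]),
    ("please send a doctor", [0, 1]),
    ("no water and many injured need a doctor", [1, 1]),
    ("the weather is calm today", [0, 0]),
]
MESSAGES = [text for _ in range(4) for text, _labels in _PATTERNS]
LABELS = np.array([labels for _ in range(4) for _text, labels in _PATTERNS])


def build_model():
    """Text features plus one gradient boosting model per category."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(
            GradientBoostingClassifier(random_state=0))),
    ])
    # A deliberately small grid; real tuning would search more values.
    param_grid = {"clf__estimator__n_estimators": [10, 20]}
    return GridSearchCV(pipeline, param_grid, cv=2)


def evaluate(model, messages, labels):
    """Print precision, recall, and f1-score for each category."""
    predictions = model.predict(messages)
    for column, name in enumerate(CATEGORY_NAMES):
        print(name)
        print(classification_report(labels[:, column],
                                    predictions[:, column],
                                    zero_division=0))
    return predictions


def train_and_evaluate():
    """Fit on the first 12 toy messages and evaluate on the last 4."""
    model = build_model()
    model.fit(MESSAGES[:12], LABELS[:12])
    predictions = evaluate(model, MESSAGES[12:], LABELS[12:])
    return model, predictions


def save_model(model, path):
    """Serialize the tuned model to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
```

Wrapping the classifier in MultiOutputClassifier is one common way to predict several binary categories at once; whether the project does exactly this is not stated in this README.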

The tokenize function is kept in a separate file, "utils.py", in the "utility" folder, and the pipeline imports it from there.
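
For readers who want a feel for what such a function does, here is a minimal standard-library sketch. It is not the function from utils.py, whose body is not shown in this README; the URL pattern and the short stop-word list are assumptions, and a production tokenizer would typically add steps such as lemmatization.

```python
import re

# Hypothetical helpers for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")
STOP_WORDS = frozenset(
    {"a", "an", "and", "are", "in", "is", "of", "the", "to", "we"})


def tokenize(text):
    """Lowercase the text, drop URLs, split into words, remove stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    return [token for token in tokens if token not in STOP_WORDS]
```

Keeping the tokenizer in its own importable module matters for pickled models: if a pickled pipeline references the function, Python needs to be able to import it from the same module path when the model is loaded in the web app.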

Deployment

To run the Flask app locally on your computer, first install the required packages:

pip install -r requirements.txt

Then run the following command in the app's directory, "./application", to start the web app:

python run.py

3. Heroku Web App

One may access the web app here. It might take a while to load the page.

To use the app, enter a message (in English) in the text box and click the "Classify Message" button. The app will process the text and return the relevant categories.
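
Conceptually, returning the relevant categories means turning the model's 0/1 output for a message into category names. The helper below is a hypothetical sketch of that last step, not code taken from the app; the function name and the example categories are assumptions.

```python
def relevant_categories(category_names, predicted_flags):
    """Return the names of the categories whose predicted flag is 1."""
    if len(category_names) != len(predicted_flags):
        raise ValueError("expected one predicted flag per category name")
    return [name
            for name, flag in zip(category_names, predicted_flags)
            if flag == 1]


# Example with three hypothetical categories.
print(relevant_categories(["water", "food", "shelter"], [1, 0, 1]))
# prints ['water', 'shelter']
```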

4. Licensing, Authors, and Acknowledgements

Credit is due to:

  • Figure Eight, which kindly provided the raw datasets, and
  • Udacity, which provided guidance throughout this natural language processing project

Credit also goes to Rajat S., a mentor in the Udacity Data Science Nanodegree program, who helped solve issues in the deployment process.

5. Other resources

More details on how the two pipelines (data processing and machine learning) were created can be found in two Jupyter notebooks, "ETL Pipeline Preparation.ipynb" and "ML Pipeline Preparation.ipynb." This GitHub repository includes the files needed to build the same disaster response app locally.

disaster_response_app's People

Contributors

sheilaxz
