
Disaster Response Pipelines - Data Engineering Project

Udacity Data Scientist Nanodegree

Table of Contents

  1. Description
  2. Getting Started
    1. Dependencies
    2. Installing
    3. Executing Program
    4. Jupyter Notebooks
  3. Licensing, Authors, Acknowledgements
  4. Files
  5. Screenshots

Description

This app analyzes messages sent during natural disasters, either via social media or directly to disaster response organizations. It uses disaster data from Figure Eight to build a model for an API that classifies disaster messages.

It contains three modules:

  1. An ETL pipeline that processes message and category data from CSV files and loads them into a SQL database;
  2. An ML pipeline that reads from the database to create and save a multi-output supervised machine learning model;
  3. A web app that extracts data from the database to provide data visualisations and uses the model to classify new messages into 36 categories.
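The first module's cleaning step can be sketched as follows. This is a minimal illustration, not the project's actual `process_data.py`: it uses tiny in-memory DataFrames as stand-ins for the two CSV files (the real script reads them with `pd.read_csv`), and the table name `messages` is assumed.

```python
import pandas as pd
from sqlalchemy import create_engine

# Tiny in-memory stand-ins for disaster_messages.csv / disaster_categories.csv.
messages = pd.DataFrame({"id": [1, 2],
                         "message": ["we need water", "storm coming"]})
categories = pd.DataFrame({"id": [1, 2],
                           "categories": ["related-1;water-1", "related-1;water-0"]})

# Merge on the shared id, then expand the "name-0;name-1;..." string
# into one binary column per category (36 columns in the real data).
df = messages.merge(categories, on="id")
cats = df["categories"].str.split(";", expand=True)
cats.columns = [c.rsplit("-", 1)[0] for c in cats.iloc[0]]
cats = cats.apply(lambda col: col.str.rsplit("-", n=1).str[1].astype(int))

df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

# Store the cleaned table in SQLite (in-memory here; the script writes a .db file).
engine = create_engine("sqlite:///:memory:")
df.to_sql("messages", engine, index=False, if_exists="replace")
```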

Getting Started

Dependencies

Check out requirements.txt

  • Python 3.5+
  • Data Processing Libraries: NumPy, Pandas
  • Machine Learning Library: Scikit-Learn (version 0.23.0), XGBoost
  • Natural Language Processing Library: NLTK
  • SQLite Database Library: SQLAlchemy
  • Model Loading and Saving Library: Pickle
  • Web App and Data Visualization: Flask, Plotly (version 2.7.0)

Installing

To clone the git repository:

git clone https://github.com/AnaHristian/disaster-response-pipeline-dsnd.git

Executing Program:

  1. Run the following commands in the project's directory to set up the database and to train and save the model.

    • To run the ETL pipeline that cleans the data and stores the processed data in the database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    • To run the ML pipeline that loads data from the database, trains the classifier and saves it as a pickle file:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pk
  2. Run the following command in the app's directory to start the web app:
      python run.py

  3. Go to http://0.0.0.0:3001/

Jupyter Notebooks

To show how this app was built, I have attached the two notebooks in which I created the ETL pipeline and the machine learning pipeline.

  1. ETL notebook:

    • read the dataset;
    • clean the data;
    • store it in a SQLite database.
  2. ML notebook:

    • split the data into a training set and a test set;
    • create a machine learning pipeline that uses NLTK;
    • use scikit-learn's Pipeline and GridSearchCV to output a final model that uses the message column to predict classifications for 36 categories (multi-output classification);
    • export the model to a pickle file.
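The ML notebook's core can be sketched as a scikit-learn Pipeline wrapped in GridSearchCV. This is a minimal, hypothetical example on toy data with only 3 category columns and default tokenization: the real notebook loads the 36-category table from the database and uses an NLTK-based tokenizer, and the parameter grid here is deliberately tiny.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy stand-ins for the message column and three category columns
# (the real data has 36 categories loaded from the SQLite database).
X = ["we need water", "send food please", "storm is coming", "water supply low"]
y = [[1, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]]

pipeline = Pipeline([
    ("vect", CountVectorizer()),            # NLTK tokenizer plugs in here via tokenizer=
    ("tfidf", TfidfTransformer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
])

# Grid-search one hyperparameter to keep the example fast.
params = {"clf__estimator__n_estimators": [10, 20]}
cv = GridSearchCV(pipeline, param_grid=params, cv=2)
cv.fit(X, y)

# The notebook would then export cv.best_estimator_ with pickle.dump(...).
prediction = cv.predict(["please send drinking water"])
```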

Licensing, Authors, Acknowledgements

Must give credit to Figure Eight for the data. Feel free to use the code here as you would like!

License: MIT

Files

app/templates/*: templates/html files for web app

data/process_data.py: script that performs etl pipeline

models/train_classifier.py: script that performs ml pipeline

run.py: main script to run the web app
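For orientation, the classification endpoint in run.py can be sketched roughly as below. This is a hypothetical minimal version: the route name `/go`, the `DummyModel` stand-in (the real app unpickles the trained classifier instead), and the two-category list are all illustrative, and the actual run.py additionally loads the database for the Plotly visualisations.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in for the pickled classifier; the real app would load it with
# pickle.load(open("models/classifier.pk", "rb")).
class DummyModel:
    def predict(self, texts):
        # Pretend everything is "related", plus "water" when mentioned.
        return [[1, int("water" in t.lower())] for t in texts]

model = DummyModel()
CATEGORIES = ["related", "water"]  # the real app has 36 category columns

@app.route("/go")
def classify():
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    return jsonify(dict(zip(CATEGORIES, labels)))

# To serve: app.run(host="0.0.0.0", port=3001), then go to http://0.0.0.0:3001/
```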

Screenshots

Here are some screenshots of the web-app:

The Distribution of Message Genres and Top Ten Message Types bar charts are created from the database and provide visualisations of the messages the model was trained with.

[Screenshot: Distribution of message genres]
[Screenshot: Top ten message types]

This is an example of how the model classifies a message in real time: the model receives a message regarding a disaster and tells the user what type of message it could be, helping the user identify it quickly.

[Screenshot: Message classification result]
[Screenshot: ETL script run example]
[Screenshot: ML pipeline output]
