mkasigwa / udacity-disaster-response-pipeline

This project forked from canaveensetia/udacity-disaster-response-pipeline

License: MIT License

Python 12.16% HTML 4.66% Jupyter Notebook 83.18%

udacity-disaster-response-pipeline's Introduction

Disaster Response Pipeline Project (Udacity - Data Science Nanodegree)

Intro Pic

Table of Contents

  1. Description
  2. Getting Started
    1. Dependencies
    2. Installing
    3. Executing Program
    4. Additional Material
  3. Authors
  4. License
  5. Acknowledgements
  6. Screenshots

Description

This project is part of the Data Science Nanodegree Program by Udacity in collaboration with Figure Eight. The dataset contains pre-labelled tweets and messages from real-life disaster events. The aim of the project is to build a Natural Language Processing (NLP) model that categorizes messages in real time.

This project is divided into the following key sections:

  1. Process the data: build an ETL pipeline that extracts the data from the source, cleans it, and saves it in a SQLite database
  2. Build a machine learning pipeline to train a classifier that can sort text messages into various categories
  3. Run a web app that shows model results in real time
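As a rough illustration of step 1, the sketch below merges the two CSV inputs, expands a label string into one 0/1 column per category, and writes the result to SQLite. The column names (`id`, `message`, `categories`), the `related-1;water-0` label format, and the table name `messages` are assumptions made for this example, not details confirmed by this README. It also uses the standard-library `sqlite3` module instead of SQLAlchemy to stay self-contained.

```python
import sqlite3

import pandas as pd


def clean_data(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge messages with their labels and expand the labels into 0/1 columns."""
    df = messages.merge(categories, on="id")
    # Assumed label format: "related-1;water-0" -> one cell per "name-value" pair.
    expanded = df["categories"].str.split(";", expand=True)
    # Take the column names from the text before the dash in the first row.
    expanded.columns = [cell.split("-")[0] for cell in expanded.iloc[0]]
    for column in expanded.columns:
        # Keep only the value after the dash and store it as an integer.
        expanded[column] = expanded[column].str.split("-").str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), expanded], axis=1)
    return df.drop_duplicates()


def save_data(df: pd.DataFrame, database_path: str, table_name: str = "messages") -> None:
    """Write the cleaned frame to a SQLite file, replacing any existing table."""
    connection = sqlite3.connect(database_path)
    try:
        df.to_sql(table_name, connection, index=False, if_exists="replace")
        connection.commit()
    finally:
        connection.close()
```

A caller would read the two CSV files with `pd.read_csv`, pass the frames to `clean_data`, and hand the result to `save_data` together with the database path.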

Getting Started

Dependencies

  • Python 3.5+
  • Machine Learning Libraries: NumPy, SciPy, Pandas, Scikit-Learn
  • Natural Language Processing Libraries: NLTK
  • SQLite Database Libraries: SQLAlchemy
  • Model Loading and Saving Library: Pickle
  • Web App and Data Visualization: Flask, Plotly

Installing

To clone the git repository:

git clone https://github.com/canaveensetia/udacity-disaster-response-pipeline.git

Executing Program

  1. Run the following commands in the project's directory to set up the database, train the model, and save it.

    • To run the ETL pipeline, which cleans the data and stores the processed data in the database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/disaster_response_db.db
    • To run the ML pipeline, which loads data from the database, trains the classifier, and saves it as a pickle file:
      python models/train_classifier.py data/disaster_response_db.db models/classifier.pkl
  2. Run the following command in the app's directory to start the web app:
      python run.py

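  3. Go to http://0.0.0.0:3001/ in your browser. If that address does not load, http://localhost:3001/ usually reaches the same server.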
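The training command in step 1 loads data from the database, trains a classifier, and pickles it. A minimal sketch of that flow follows. It is not the repository's `train_classifier.py`: the table name `messages`, the `id` and `message` columns, and the choice of a random forest are assumptions, and scikit-learn's built-in tokenizer stands in for the NLTK-based text processing the project uses.

```python
import pickle
import sqlite3

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def load_data(database_path, table_name="messages"):
    """Return the message texts, the 0/1 label frame, and the category names."""
    connection = sqlite3.connect(database_path)
    try:
        df = pd.read_sql_query(f'SELECT * FROM "{table_name}"', connection)
    finally:
        connection.close()
    # Assumption: every column other than these two holds a 0/1 category label.
    category_names = [c for c in df.columns if c not in ("id", "message")]
    return df["message"], df[category_names], category_names


def build_model():
    """TF-IDF features feeding one binary classifier per category."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=0))),
    ])


def save_model(model, model_path):
    """Serialize the fitted pipeline so the web app can load it later."""
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
```

Wrapping the estimator in `MultiOutputClassifier` is one straightforward way to let a single message receive several category labels at once.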

Additional Material

In the data and models folders you can find two Jupyter notebooks that will help you understand how the model works step by step:

  1. ETL Preparation Notebook: learn everything about the implemented ETL pipeline
  2. ML Pipeline Preparation Notebook: look at the Machine Learning Pipeline developed with NLTK and Scikit-Learn

You can use the ML Pipeline Preparation Notebook to re-train the model or tune it through its dedicated Grid Search section.
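For readers who have not used grid search before, the sketch below shows the general shape of such a tuning step with scikit-learn's `GridSearchCV`. The pipeline steps, the parameter grid, and the helper name `tune` are illustrative assumptions; the notebook's own grid may differ.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Hypothetical grid: keys follow the "<step>__<parameter>" naming convention.
PARAM_GRID = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__estimator__n_estimators": [10, 25],
}


def tune(messages, labels, cv=3):
    """Fit every parameter combination with cross-validation and return the search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
    ])
    search = GridSearchCV(pipeline, PARAM_GRID, cv=cv)
    search.fit(messages, labels)
    return search
```

After fitting, `search.best_params_` holds the winning combination and `search.best_estimator_` is a pipeline refit on all the data with those settings.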

Important Files

app/templates/*: HTML templates for the web app

data/process_data.py: Extract, Transform, Load (ETL) pipeline used for data cleaning, feature extraction, and storing data in a SQLite database

models/train_classifier.py: A machine learning pipeline that loads data, trains a model, and saves the trained model as a .pkl file for later use

run.py: This file can be used to launch the Flask web app used to classify disaster messages
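To make the role of run.py more concrete, here is a minimal Flask sketch of a classification endpoint. It is not the repository's run.py: the `/classify` route, the `query` parameter, the category names, and the keyword-matching `predict_labels` stub (which stands in for a call to the pickled model) are all hypothetical. Only the host and port are taken from the instructions above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical category names; the real ones come from the training data.
CATEGORY_NAMES = ["related", "water", "shelter"]


def predict_labels(message):
    """Stand-in for the trained model: flag a category if its name appears in the text."""
    words = set(message.lower().split())
    return [int(name in words) for name in CATEGORY_NAMES]


@app.route("/classify")
def classify():
    """Return a JSON object mapping each category name to its 0/1 label."""
    query = request.args.get("query", "")
    labels = predict_labels(query)
    return jsonify(dict(zip(CATEGORY_NAMES, labels)))


def main():
    # Same host and port as in the "Executing Program" instructions.
    app.run(host="0.0.0.0", port=3001)
```

Calling `main()` starts the development server; in a real app `predict_labels` would be replaced by loading the pickle file and calling its `predict` method on the incoming message.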

Authors

License

License: MIT

Acknowledgements

  • Udacity for providing an amazing Data Science Nanodegree Program
  • Figure Eight for providing the relevant dataset to train the model

Screenshots

  1. This is an example of a message we can type to test the performance of the model

Sample Input

  2. After clicking Classify Message, the categories to which the message belongs are highlighted in green

Sample Output

  3. The main page shows some graphs about the training dataset, provided by Figure Eight

Main Page

  4. Sample run of process_data.py

Process Data

  5. Sample run of train_classifier.py

Train Classifier without Category Level Precision Recall

  6. Sample run of train_classifier.py with precision, recall etc. for each category

Train Classifier with Category Level Precision Recall
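The category-level precision and recall mentioned in the last screenshot can be computed in several ways. The sketch below is one possible approach using scikit-learn's metric functions, with a hypothetical helper name `per_category_scores`; it is not necessarily how train_classifier.py produces its report.

```python
from sklearn.metrics import precision_score, recall_score


def per_category_scores(y_true, y_pred, category_names):
    """Compute precision and recall separately for each category column."""
    scores = {}
    for index, name in enumerate(category_names):
        # Pull out the true and predicted 0/1 labels for this one category.
        true_column = [row[index] for row in y_true]
        pred_column = [row[index] for row in y_pred]
        scores[name] = {
            # zero_division=0 avoids warnings when a category is never predicted.
            "precision": precision_score(true_column, pred_column, zero_division=0),
            "recall": recall_score(true_column, pred_column, zero_division=0),
        }
    return scores
```

Reporting per category matters for this kind of data because a model can score well overall while missing most messages in a rare category.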

udacity-disaster-response-pipeline's People

Contributors

naveensetia2019, canaveensetia
