
drisskhattabi6 / real-time-twitter-sentiment-analysis


This repo contains a Big Data project: "Real-Time Twitter Sentiment Analysis via Kafka, Spark Streaming, MongoDB and a Django Dashboard".

Python 23.88% Jupyter Notebook 59.54% CSS 7.65% HTML 8.93%
big-data big-data-projects django django-dashboard docker kafka kafka-producer mongodb pyspark real-time-processing


Big Data Project: Real-Time Twitter Sentiment Analysis Using Kafka, Spark (MLlib & Streaming), MongoDB and Django.

Overview

This repository contains a Big Data project focused on real-time sentiment analysis of Twitter data (classification of tweets). The project leverages various technologies to collect, process, analyze, and visualize sentiment data from tweets in real-time.

Project Architecture

The project is built using the following components:

  • Apache Kafka: Used for real-time ingestion of tweets from the Twitter dataset.

  • Spark Streaming: Processes the streaming data from Kafka to perform sentiment analysis.

  • MongoDB: Stores the processed sentiment data.

  • Django: Serves as the web framework for building a real-time dashboard to visualize the sentiment analysis results.

  • Chart.js & Matplotlib: Used for plotting.

  • This is the project plan : project img

Features

  • Real-time Data Ingestion: Streams tweets from the Twitter dataset through Kafka to simulate a live feed.
  • Stream Processing: Utilizes Spark Streaming to process and analyze the data in real-time.
  • Sentiment Analysis: Classifies tweets into different sentiment categories (positive, negative, neutral) using natural language processing (NLP) techniques.
  • Data Storage: Stores the sentiment analysis results in MongoDB for persistence.
  • Visualization: Provides a real-time dashboard built with Django to visualize the sentiment trends and insights.

Data description:

This project uses a dataset (twitter_training.csv and twitter_validation.csv) both to train the PySpark model and to simulate live tweets through Kafka. Each line of the training set, twitter_training.csv, represents one tweet; the file contains 74,682 lines.

The data types of Features are:

  • Tweet ID: int
  • Entity: string
  • Sentiment: string (Target)
  • Tweet content: string

The validation set, twitter_validation.csv, contains 998 lines (tweets) with the same features as twitter_training.csv.
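
As a rough illustration of the four features listed above, the following sketch reads rows into dictionaries with the standard csv module. The column keys, the assumption that the files have no header row, and the sample row are hypothetical and should be checked against the actual CSV files.

```python
import csv
from typing import Dict, Iterable, Iterator

# Keys mirror the feature list above; the names themselves are illustrative.
COLUMNS = ["tweet_id", "entity", "sentiment", "tweet_content"]


def read_tweets(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per CSV row, converting the tweet ID to int.

    Assumes there is no header row; rows with an unexpected number of
    fields are skipped.
    """
    for row in csv.reader(lines):
        if len(row) != len(COLUMNS):
            continue  # malformed row
        record: Dict[str, object] = dict(zip(COLUMNS, row))
        record["tweet_id"] = int(row[0])
        yield record


# Made-up sample row, only to show the expected shape of a record.
sample_lines = ['1,ExampleEntity,Positive,"great game, love it"\n']
records = list(read_tweets(sample_lines))
```

To read a real file, pass an open file object instead of the sample list, for example open("twitter_validation.csv", newline="", encoding="utf-8").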

This is the Data Source: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis

Repository Structure

  • Django-Dashboard: contains the Django dashboard application.
  • Kafka-PySpark: contains the Kafka producer and the PySpark streaming application (the Kafka consumer).
  • ML PySpark Model: contains the trained model, the Jupyter notebook, and the datasets.
  • zk-single-kafka-single.yml: Docker Compose file used to run Apache Kafka in Docker.
  • bigdataproject rapport: a brief report about the project (in French).

Getting Started

Prerequisites

To run this project, you will need the following installed on your system:

  • Docker (for running Kafka)
  • Python 3.x
  • Apache Kafka
  • Apache Spark (PySpark for python)
  • MongoDB
  • Django

Installation

  1. Clone the repository:

    git clone https://github.com/drisskhattabi6/Real-Time-Twitter-Sentiment-Analysis.git
    cd Real-Time-Twitter-Sentiment-Analysis
  2. Install Docker Desktop.

  3. Set up Kafka:

    • Start Apache Kafka in Docker using:
    docker-compose -f zk-single-kafka-single.yml up -d
  4. Set up MongoDB:

    • Download and install MongoDB.
      • Installing MongoDB Compass is also recommended: it lets you browse the data and makes working with MongoDB easier.
  5. Install Python dependencies:

    • This installs PySpark, PyMongo, Django, and the other required packages:
    pip install -r requirements.txt

Running the Project

Note: MongoDB must be running both for the Kafka and Spark Streaming application and for the Django dashboard application.

  • Start MongoDB:
    • from the command line:
    sudo systemctl start mongod
    • then open MongoDB Compass to inspect the data (recommended).
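
Before launching the applications, it can help to confirm that the services are actually listening. The sketch below is a generic TCP check, not part of the repository; it assumes MongoDB on its default port 27017 and the Kafka broker on localhost:9092, the address used in the commands of this README.

```python
import socket
from typing import Dict


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services() -> Dict[str, bool]:
    """Check the assumed default addresses of MongoDB and the Kafka broker."""
    return {
        "mongodb": is_port_open("localhost", 27017),  # MongoDB default port
        "kafka": is_port_open("localhost", 9092),     # broker address from this README
    }
```

Calling check_services() returns a dictionary of booleans; a False value only means the port is not reachable, not why.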

Running the Kafka and Spark Streaming application :

  1. Change the directory to the application:

    cd Kafka-PySpark
  2. Open a shell in the Kafka container:

    • from the command line:
    docker exec -it <kafka-container-id> /bin/bash
    • or using Docker Desktop:

       docker desktop img

  3. Create the "twitter" topic on the broker and verify it:

    kafka-topics --create --topic twitter --bootstrap-server localhost:9092
    kafka-topics --describe --topic twitter --bootstrap-server localhost:9092
  4. Run the Kafka producer app (py is the Windows Python launcher; on Linux or macOS use python instead):

    py producer-validation-tweets.py
  5. Run the PySpark streaming app (the Kafka consumer):

    py consumer-pyspark.py
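
The producer side of these steps can be sketched roughly as follows. This is not the code of producer-validation-tweets.py: the function names, the JSON message format, and the sample record are hypothetical, and the Kafka wiring in main() assumes the kafka-python package and a broker at localhost:9092.

```python
import json
import time
from typing import Callable, Dict, Iterable

TOPIC = "twitter"              # topic created in step 3
BOOTSTRAP = "localhost:9092"   # broker address used in step 3


def encode(record: Dict[str, object]) -> bytes:
    """Serialize one tweet record as UTF-8 JSON (an assumed message format)."""
    return json.dumps(record).encode("utf-8")


def replay(records: Iterable[Dict[str, object]],
           send: Callable[[bytes], object],
           delay_s: float = 0.0) -> int:
    """Pass each encoded record to send, pausing delay_s seconds between
    messages to imitate a live feed. Returns the number of records sent."""
    count = 0
    for record in records:
        send(encode(record))
        count += 1
        if delay_s:
            time.sleep(delay_s)
    return count


def main() -> None:
    """Wire replay() to a real broker. Not called automatically."""
    from kafka import KafkaProducer  # assumption: kafka-python is installed

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
    tweets = [{"entity": "ExampleEntity", "tweet_content": "hypothetical tweet"}]
    replay(tweets, lambda value: producer.send(TOPIC, value), delay_s=1.0)
    producer.flush()  # make sure buffered messages are delivered
```

Keeping the send callable separate from the replay loop means the loop can be tried without a broker, for instance by passing a list's append method.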

Running the Kafka and Spark Streaming application img

This is a screenshot of MongoDB Compass after running the Kafka and Spark Streaming application:

MongoDBCompass img

Running Django Dashboard application :

  1. Change the directory to the application:

    cd Django-Dashboard
  2. Collect the static files:

    python manage.py collectstatic
  3. Run the Django server:

    python manage.py runserver
  4. Access the Dashboard: Open your web browser and go to http://127.0.0.1:8000 to view the real-time sentiment analysis dashboard.

the Dashboard

Running the Dashboard

More information:

  • The Django dashboard reads its data from the MongoDB database.
  • Users can classify their own text at http://127.0.0.1:8000/classify.
  • The dashboard includes a table of tweets with their labels.
  • The dashboard shows three statistics or plots: label rates, a pie chart, and a bar chart.
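
The label rates mentioned above could be computed along these lines. This is a minimal sketch, not the dashboard's actual code: it assumes each stored document is a mapping with a label under a field named "prediction", which is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def label_rates(docs: Iterable[Mapping[str, object]],
                field: str = "prediction") -> Dict[str, float]:
    """Return the share of documents per label, as fractions that sum to 1.

    Documents without the given field are ignored; an empty input
    yields an empty dict.
    """
    counts = Counter(str(doc[field]) for doc in docs if field in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up documents, only to show the input shape.
example_docs = [
    {"prediction": "Positive"},
    {"prediction": "Negative"},
    {"prediction": "Positive"},
    {"prediction": "Positive"},
]
rates = label_rates(example_docs)
```

The same dictionary can feed both a pie chart and a bar chart, since each only needs the labels and their shares.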

Team :

Supervised By :

  • Prof. Yasyn El Yusufi

Abdelmalek Essaadi University - Faculty of Sciences and Technology of Tangier

  • Master: Artificial Intelligence and Data Science
  • Module: Big Data

  • By following the above instructions, you should be able to set up and run the real-time Twitter sentiment analysis project on your local machine. Happy coding!

  • Feel free to explore the project and customize it according to your requirements. If you encounter any issues or have any questions, don't hesitate to reach out!

Contributors

aymanboufarhi, drisskhattabi6
