thedatafox / live_twitter_sentiment_analysis

This project forked from emumba-com/live_twitter_sentiment_analysis

Live Twitter sentiment analysis using Python, Apache Spark Streaming, Kafka, NLTK, SocketIO

Java 2.12% Python 4.87% HTML 0.15% JavaScript 92.85%

live_twitter_sentiment_analysis's Introduction

Scalable architecture for real-time Twitter sentiment analysis

This project implements a scalable architecture to monitor and visualize sentiment for a Twitter hashtag in real time. It streams live tweets matching a hashtag, performs sentiment analysis on each tweet, and calculates the rolling mean of the sentiment scores. This mean is continuously sent to connected browser clients and displayed in a sparkline graph.

System design

The diagram below illustrates the components and the information flow (from right to left).

[Figure: system design]

Project breakdown

The project has three parts:

1. Web server

The web server is a Python Flask server. It fetches tweets from Twitter using Tweepy and pushes them into Kafka. A sentiment analyzer picks the tweets up from Kafka, performs sentiment analysis using NLTK, and pushes the results back into Kafka. The Spark Streaming server (part 3) reads the sentiments, calculates the rolling average, and writes it back to Kafka. In the final step, the web server reads the rolling mean from Kafka and sends it to connected clients via SocketIO. An HTML/JS client displays the live sentiment in a sparkline graph using Google Annotation Charts.
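The scoring step could look roughly like the sketch below. The source only says that NLTK is used, so the choice of NLTK's VADER analyzer is an assumption, and `build_analyzer` and `score_tweet` are hypothetical helper names rather than functions from the project.

```python
def build_analyzer():
    """Create a VADER analyzer; needs NLTK and its 'vader_lexicon' data."""
    # Imported lazily so this module loads even where NLTK is not installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def score_tweet(text, analyzer):
    """Return VADER's compound score, which ranges from -1 to 1.

    Negative values indicate negative sentiment, positive values positive.
    `analyzer` is any object with a polarity_scores(text) method (assumption:
    a SentimentIntensityAnalyzer in real use).
    """
    return analyzer.polarity_scores(text)["compound"]
```

Build the analyzer once with `build_analyzer()` and pass it to `score_tweet` for every tweet, since loading the lexicon on each call would be wasteful.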

The web server runs each independent task in a separate thread:
Thread 1: fetches data from Twitter
Thread 2: performs sentiment analysis on each tweet
Thread 3: reads the rolling mean produced by Spark Streaming

Each of these threads can also run as an independent service, which makes the system scalable and fault tolerant.
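The thread-per-task layout can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: in-memory queues stand in for Kafka topics, a toy word list stands in for NLTK, and thread 3 reads the raw scores directly because the Spark Streaming step is left out. The sketch targets Python 3 (the `queue` module is named `Queue` on Python 2.7).

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker to shut down

# Hypothetical word lists; the project scores tweets with NLTK instead.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def fetch_tweets(source, tweets_out):
    """Thread 1: push raw tweets onto the 'tweets' queue."""
    for text in source:
        tweets_out.put(text)
    tweets_out.put(STOP)


def analyze(tweets_in, sentiments_out):
    """Thread 2: score each tweet and push the score onward."""
    while True:
        text = tweets_in.get()
        if text is STOP:
            sentiments_out.put(STOP)  # propagate shutdown downstream
            return
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        sentiments_out.put(score)


def deliver(values_in, emit):
    """Thread 3: forward each incoming value to connected clients."""
    while True:
        value = values_in.get()
        if value is STOP:
            return
        emit(value)


def run_pipeline(source):
    """Wire the three workers together and return what was delivered."""
    tweets, sentiments = queue.Queue(), queue.Queue()
    delivered = []
    workers = [
        threading.Thread(target=fetch_tweets, args=(source, tweets)),
        threading.Thread(target=analyze, args=(tweets, sentiments)),
        threading.Thread(target=deliver, args=(sentiments, delivered.append)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return delivered
```

Because each worker only talks to its neighbours through a queue, any of them could be moved into its own process and connected through a real broker without changing the others.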

2. Kafka

Kafka acts as a message broker between the modules running within the web server, as well as between the web server and the Spark Streaming server. It provides a scalable and fault-tolerant communication mechanism between independently running services.
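As a rough illustration of how one module might hand a result to another through Kafka, the sketch below assumes the kafka-python client, JSON-encoded messages, a broker on `localhost:9092`, and a topic named `sentiments`. None of these are confirmed by the source; the project's actual client library, message format, and topic names may differ.

```python
import json


def encode(payload):
    """Serialize a message to UTF-8 JSON bytes (assumed wire format)."""
    return json.dumps(payload).encode("utf-8")


def decode(raw):
    """Inverse of encode()."""
    return json.loads(raw.decode("utf-8"))


def make_producer(bootstrap_servers="localhost:9092"):
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=bootstrap_servers,
                         value_serializer=encode)


def make_consumer(topic, bootstrap_servers="localhost:9092"):
    from kafka import KafkaConsumer
    return KafkaConsumer(topic,
                         bootstrap_servers=bootstrap_servers,
                         value_deserializer=decode)


def publish_sentiment(producer, tweet_id, score, topic="sentiments"):
    """Publish one scored tweet; the topic name is hypothetical."""
    producer.send(topic, {"id": tweet_id, "sentiment": score})
```

A consumer created with `make_consumer("sentiments")` can be iterated over, and each record's `value` attribute then holds the decoded dictionary.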

3. Calculating rolling mean of sentiments

A separate Java program reads sentiments from Kafka using Spark Streaming, calculates the rolling average using Spark window operations, and writes the results back to Kafka.
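The idea behind a windowed rolling average can be shown in a few lines of plain Python. This is not the project's Java/Spark code: it is a single-process sketch that assumes timestamps arrive in non-decreasing order, and the class name `RollingMean` and the window length are hypothetical. Spark window operations work on micro-batches and emit results at a slide interval, whereas this sketch updates the mean on every score.

```python
from collections import deque


class RollingMean:
    """Mean of the scores seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, score) pairs, oldest first
        self._total = 0.0      # running sum of the scores in the window

    def add(self, timestamp, score):
        """Record a score and return the mean over the current window."""
        self._items.append((timestamp, score))
        self._total += score
        # Evict scores that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._items[0][0] <= cutoff:
            _, old_score = self._items.popleft()
            self._total -= old_score
        return self._total / len(self._items)
```

Keeping a running total means each update costs time proportional to the number of evicted scores, not to the size of the window.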

How to run

To run the project:

  1. Download, set up, and run Apache Kafka. I use the following commands on OS X from Kafka's bin directory:
       sh zookeeper-server-start.sh ../config/zookeeper.properties
       sh kafka-server-start.sh ../config/server.properties
  2. Install the complete NLTK.
  3. Create a Twitter app and set your keys in
       live_twitter_sentiment_analysis/webapp/tweet_ingestion/config.py
  4. Install the Python packages:
       pip install -r live_twitter_sentiment_analysis/webapp/requirements.txt
  5. Run the web server:
       python live_twitter_sentiment_analysis/webapp/main.py
  6. Run the Maven Java project (rolling_average) after installing the Maven dependencies specified in live_twitter_sentiment_analysis/rolling_average/pom.xml. Don't forget to set the checkpoint directory in Main.java.
  7. Open the URL localhost:8001/index.html in a browser.

Output

Here is what the final output looks like in the browser:

[Figure: output]

Note: Tested on Python 2.7.

Contributors

harishasan
