hritwiksinghal / spark-tweet

This project fetches recent tweets matching user-supplied keywords through the Twitter API v2, extracts the hashtags from those tweets, and passes them to Apache Spark streaming for processing. It then launches a Flask web server on localhost:5001 that shows the data in a visual dashboard powered by ApexCharts.

License: GNU General Public License v3.0

Python 64.49% HTML 31.18% Shell 2.15% CSS 2.19%
spark python twitter tweet analysis pyspark hashtags flask css html

Real Time Analysis of Twitter hashtags using Apache Spark Structured Streaming



Introduction

We use Apache Spark streaming, a real-time analytics engine, to process tweets retrieved from the Twitter API and identify the trending hashtags among them for a given set of keywords. Finally, we present the data in a real-time dashboard built with the Flask web framework.

Limitations

  • 450 queries per 15 minutes (enforced by Twitter API v2).
  • 500K queries per month (enforced by Twitter API v2).
  • We cannot get general tweets from Twitter. We have to request tweets that match some keywords (enforced by Twitter API v2).
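
As a rough illustration of the first limit: 450 requests per 15 minutes averages out to one request every 2 seconds. The sketch below is not taken from the project; `min_interval` and `paced` are hypothetical helper names, and it assumes requests are spread evenly across the window.

```python
import time

WINDOW_SECONDS = 15 * 60   # length of the rate-limit window listed above
MAX_REQUESTS = 450         # requests allowed per window


def min_interval(max_requests: int = MAX_REQUESTS,
                 window_seconds: int = WINDOW_SECONDS) -> float:
    """Smallest average gap (in seconds) between requests that respects the limit."""
    return window_seconds / max_requests


def paced(items, interval: float, sleep=time.sleep):
    """Yield items one at a time, sleeping `interval` seconds between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(interval)
        yield item
```

For example, `for keyword in paced(keywords, min_interval()): ...` would issue at most one request every 2 seconds. The `sleep` parameter exists only so the pacing can be replaced in tests.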

Getting API keys from Twitter

The dataset used for this project consists of tweets, so we need access to the Twitter API to retrieve them.

  • Go to the developer portal dashboard.
  • Sign in with your developer account.
  • Create a new project and give it a name, a use case based on the goal you want to achieve, and a description.
  • Choose ‘create a new App instead’ and give your App a name to create a new App.
  • If everything succeeds, you should see a page containing your keys and tokens. We will use the Bearer token to access the API.
  • Create a new file named keys.txt and put the Bearer token in it in the format below.
    token:<your_token_here>
    Make sure there are no spaces on either side of the colon, that is, between token and : or between : and <your_token_here>.
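
The keys.txt format above can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: `parse_token` and `auth_headers` are hypothetical names, and the strict rejection of spaces around the colon is an assumption based on the note above. The `Authorization: Bearer <token>` header is the usual way to present a bearer token over HTTP.

```python
from pathlib import Path


def parse_token(text: str) -> str:
    """Return the bearer token from keys.txt content of the form 'token:<value>'."""
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Accept only 'token:<value>' with no spaces around the colon.
        if sep and key == "token" and value and value == value.strip():
            return value
    raise ValueError("no 'token:<your_token>' line found")


def auth_headers(keys_file: str = "keys.txt") -> dict:
    """Build the HTTP Authorization header for bearer-token requests."""
    token = parse_token(Path(keys_file).read_text(encoding="utf-8"))
    return {"Authorization": f"Bearer {token}"}
```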

How the project works:

  • First, we retrieve tweets from Twitter using the Twitter API v2.
  • The tweets are selected by keywords that the user specifies (see the "Running the Application" section).
  • The data is processed with pyspark, and the hashtags are separated from the tweets.
  • We then send the tweets through a TCP socket to Spark.
  • Using Apache Spark, we process the hashtags to find the trending ones.
  • To display the data visually, we use a Flask web app.
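
The hashtag-separation and TCP hand-off steps above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the function names are hypothetical, a hashtag is assumed to be `#` followed by word characters, and each hashtag is assumed to be sent as one newline-terminated UTF-8 line, which suits a line-oriented socket reader on the Spark side.

```python
import re
import socket

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(tweet_text: str) -> list:
    """Return the hashtags found in a tweet, in order of appearance."""
    return HASHTAG_RE.findall(tweet_text)


def send_hashtags(conn: socket.socket, tweet_text: str) -> int:
    """Send each hashtag as one newline-terminated line; return how many were sent."""
    tags = extract_hashtags(tweet_text)
    if tags:
        payload = "".join(tag + "\n" for tag in tags)
        conn.sendall(payload.encode("utf-8"))
    return len(tags)
```

Here `conn` would be the connected socket that the Spark application reads from; the host and port of that connection are left out because the source does not state them.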

Running the Application

First steps...

  • The installed Java version must be compatible with pyspark. This project uses pyspark 3.2.0, which runs on Java 8 or 11; Java 11 is the version recommended here. You can check your Java version by running java --version (java -version on Java 8). Make sure that only a compatible Java version is installed.
  • git clone https://github.com/HritwikSinghal/Spark-tweet.git
  • cd Spark-tweet
  • pip install -r ./requirements.txt

Now...

1. Automatic run

Simply run run.sh if you want the defaults. The defaults are:

  • keywords = "corona bitcoin gaming Android climate cricket"
  • pages = 15 (per keyword)

Note that this opens a browser window and kills the app after 4 minutes. This does not happen with a manual run, and you can modify run.sh to change this behaviour.
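
To relate these defaults to the rate limit listed under Limitations, the short calculation below assumes that every page costs exactly one API request per keyword. That assumption is not confirmed by the source, and `total_requests` is a hypothetical helper.

```python
DEFAULT_KEYWORDS = "corona bitcoin gaming Android climate cricket"
DEFAULT_PAGES = 15


def total_requests(keywords: str, pages: int) -> int:
    """Number of API requests, assuming one request per page per keyword."""
    return len(keywords.split()) * pages


# 6 keywords * 15 pages = 90 requests, below the 450-per-15-minutes limit.
requests_needed = total_requests(DEFAULT_KEYWORDS, DEFAULT_PAGES)
```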

2. Manual run

Run the programs in the order listed. NOTE: Run every step in a new terminal.

  1. Flask application:
     python3 ./app.py

  2. Twitter application:
     python3 ./twitter_app.py -p <no_of_pages> -k <"keywords">

     Replace <"keywords"> with the keywords you want to search for (the keywords must be enclosed in quotes, like "corona bitcoin gaming Android") and <no_of_pages> with the number of pages you want to fetch from Twitter for each keyword.

  3. Spark application:
     export PYSPARK_PYTHON=python3
     export SPARK_LOCAL_HOSTNAME=localhost
     python3 ./spark_app.py
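
The -p and -k options of the Twitter application in step 2 could be handled with argparse as sketched below. Only the short flags come from this README; the long option names, the help texts, and the use of the automatic-run defaults as parser defaults are assumptions, and twitter_app.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser accepting the -p and -k options described above."""
    parser = argparse.ArgumentParser(description="Fetch tweets for the given keywords.")
    parser.add_argument("-p", "--pages", type=int, default=15,
                        help="number of pages to fetch per keyword")
    parser.add_argument("-k", "--keywords", type=str,
                        default="corona bitcoin gaming Android climate cricket",
                        help="space-separated keywords, quoted as one argument")
    return parser


def parse_keywords(raw: str) -> list:
    """Split the quoted keyword string into individual keywords."""
    return raw.split()
```

The quotes around the keywords matter because the shell would otherwise pass each word as a separate argument, and only the first would be bound to -k.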
    

Visual representation

You can view the real-time data in the dashboard at either of the URLs below.

http://localhost:5001/

or

http://127.0.0.1:5001/

Stopping the application

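Run killall python3 in a new terminal. Note that this terminates every running python3 process, not only the ones started by this project.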


Final Output

Demo

