Popular Tweets

Stream tweets by keyword using a Kafka producer and a Kafka stream, with a Dockerized HBase instance for persistence.

Setup

Project setup - IntelliJ

  1. From the main menu, select File | Open.
    1. Alternatively, click Open or Import on the welcome screen.
  2. In the dialog that opens, select the pom.xml file of the project you want to open. Click OK.
  3. In the dialog that opens, click Open as Project.

IntelliJ IDEA opens and syncs the Maven project in the IDE. If you need to adjust importing options when you open the project, refer to the Maven settings.

Additional Setup

  1. Confluent Platform 5.2 or later
  2. Confluent CLI
  3. Java 8 (1.8) or Java 11 to run Confluent Platform
    1. macOS Java 8 installation:
      brew tap adoptopenjdk/openjdk
      brew install --cask adoptopenjdk8
  4. Maven to compile the client Java code (if you use IntelliJ, Maven comes bundled with the IDE, so you can skip this step)
  5. Docker
  6. Apache HBase Sink Connector (writes data from a topic in Kafka to a table in the specified HBase instance)
    confluent-hub install confluentinc/kafka-connect-hbase:latest

Apply for Twitter Developer Account

Apply for a Twitter Developer Account to get access to the Twitter API. Once the account is approved, keys and tokens can be generated by creating an App in the Twitter Developer dashboard.

To create an app:
1. Apps -> Create an app -> Fill out the App details form.
2. Copy the keys and tokens into the twitter.properties file.
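
As a minimal sketch of how these credentials might be read at startup, the following loads a properties file with java.util.Properties and fails fast when a value is missing. The class name and the property names (consumerKey, consumerSecret, token, tokenSecret) are assumptions for illustration only; use whatever keys the project's twitter.properties actually defines.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: the property names below are assumptions, not taken from the project.
class TwitterCredentials {
    final String consumerKey;
    final String consumerSecret;
    final String token;
    final String tokenSecret;

    private TwitterCredentials(Properties props) {
        this.consumerKey = require(props, "consumerKey");
        this.consumerSecret = require(props, "consumerSecret");
        this.token = require(props, "token");
        this.tokenSecret = require(props, "tokenSecret");
    }

    // Rejects missing or blank values so a bad file is caught before authentication.
    private static String require(Properties props, String name) {
        String value = props.getProperty(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + name);
        }
        return value.trim();
    }

    // Reads key=value pairs from any Reader, e.g. a FileReader for twitter.properties.
    static TwitterCredentials load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TwitterCredentials(props);
    }

    public static void main(String[] args) {
        // Placeholder values stand in for real keys and tokens.
        String sample = "consumerKey=abc\nconsumerSecret=def\ntoken=ghi\ntokenSecret=jkl\n";
        TwitterCredentials creds = load(new StringReader(sample));
        System.out.println("Loaded consumer key of length " + creds.consumerKey.length());
    }
}
```

Keeping the credentials in a properties file rather than in source code makes it easier to exclude them from version control.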

Running the app

  1. Create a Dockerized HBase Instance

    1. Get the Docker image:
      docker pull aaionap/hbase:1.2.0
    2. Start the HBase Docker container:
      docker run -d --name hbase --hostname hbase -p 2182:2181 -p 8080:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 -p 16000:16000 -p 16010:16010 -p 16201:16201 -p 16301:16301 aaionap/hbase:1.2.0
    3. Add an entry 127.0.0.1 hbase to /etc/hosts.
  2. Run the Kafka producer and Kafka stream:

    sh run.sh
  3. Check HBase for data:

    1. Start HBase Shell:
      docker exec -it hbase /bin/bash entrypoint.sh
    2. Verify the table popular-tweets-avro exists by running list in the HBase shell. Output should be:
      TABLE
      popular-tweets-avro
      1 row(s) in 0.2750 seconds
      => ["popular-tweets-avro"]
    3. Verify the table received data:
      scan 'popular-tweets-avro'
  4. Clean up resources

    1. Delete the connector: confluent local unload hbase
    2. Stop Confluent: confluent local stop
    3. Delete the Dockerized HBase instance:
         docker stop hbase
         docker rm -f hbase

Kafka Producer

The producer ingests tweets from the Twitter API, selected by a configurable list of search terms.

public class TwitterProducer {
    List<String> terms = Lists.newArrayList("conspiracy", "conspiracyTheory", "fakenews");
}
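
To illustrate what selecting tweets by search term means, here is a small self-contained sketch that checks whether a tweet's text mentions any configured term. It assumes plain case-insensitive substring matching, which only approximates the Twitter API's own matching rules. The TermMatcher class is hypothetical and is not part of TwitterProducer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; in the real pipeline the Twitter API does the matching.
class TermMatcher {
    private final List<String> terms;

    TermMatcher(List<String> terms) {
        this.terms = terms;
    }

    // Returns true if the text contains at least one term, ignoring case.
    boolean matches(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            if (lower.contains(term.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TermMatcher matcher =
            new TermMatcher(Arrays.asList("conspiracy", "conspiracyTheory", "fakenews"));
        System.out.println(matcher.matches("Another #FakeNews story")); // true
        System.out.println(matcher.matches("Lovely weather today"));    // false
    }
}
```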

Before being sent to the Kafka topic in Avro format, every message must conform to the schema defined in /resources/avro/Tweets.avsc, from which the Tweets class is generated.

public Tweets(java.lang.CharSequence tweet, java.lang.CharSequence userName, java.lang.Integer userNumFollowers) {
    this.tweet = tweet;
    this.userName = userName;
    this.userNumFollowers = userNumFollowers;
}

To generate the Tweets Java class from the Tweets.avsc file, build the project:

mvn clean compile package

Kafka Stream

The stream consumes messages from the Kafka topic and filters them by the user's number of followers. Messages that pass the filter are written to a new Kafka topic.
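
The filtering step can be sketched with plain Java collections, without a running Kafka cluster. This is a minimal sketch under stated assumptions: TweetSketch is a hypothetical stand-in for the Avro-generated Tweets class, and the MIN_FOLLOWERS threshold and the >= comparison are illustrative, since this README does not state the project's actual cutoff. In a Kafka Streams topology, the same check would typically be passed to KStream#filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PopularTweetFilter {
    // Assumed threshold; the project's real cutoff is not given in this README.
    static final int MIN_FOLLOWERS = 10_000;

    // Hypothetical stand-in for the Avro-generated Tweets class (same three fields).
    static class TweetSketch {
        final String tweet;
        final String userName;
        final Integer userNumFollowers;

        TweetSketch(String tweet, String userName, Integer userNumFollowers) {
            this.tweet = tweet;
            this.userName = userName;
            this.userNumFollowers = userNumFollowers;
        }
    }

    // Keeps tweets whose author has at least MIN_FOLLOWERS; a null count is treated as not popular.
    static final Predicate<TweetSketch> IS_POPULAR =
        t -> t.userNumFollowers != null && t.userNumFollowers >= MIN_FOLLOWERS;

    static List<TweetSketch> keepPopular(List<TweetSketch> tweets) {
        return tweets.stream().filter(IS_POPULAR).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TweetSketch> incoming = Arrays.asList(
            new TweetSketch("Breaking story", "bigAccount", 250_000),
            new TweetSketch("My lunch", "smallAccount", 42));
        for (TweetSketch t : keepPopular(incoming)) {
            System.out.println(t.userName + ": " + t.tweet);
        }
    }
}
```

Treating a missing follower count as "not popular" avoids a NullPointerException when the Integer field is unboxed for the comparison.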

Dockerized HBase Instance (Using Confluent HBase Sink Connector)

A single configuration file, hbase-avro.json, tells the HBase sink connector which topic to subscribe to (the filtered Kafka topic) so that its messages are persisted to an HBase table.

Contributors

miafrank