Stream tweets by keyword using a Kafka producer, Kafka Streams, and a Dockerized HBase instance for persistence.
- From the main menu, select **File | Open**. Alternatively, click **Open** or **Import** on the welcome screen.
- In the dialog that opens, select the `pom.xml` file of the project you want to open and click **OK**.
- In the next dialog, click **Open as Project**.

IntelliJ IDEA opens and syncs the Maven project in the IDE. If you need to adjust importing options when you open the project, refer to the Maven settings.
- Confluent Platform 5.2 or later
- Confluent CLI
- Java 1.8 or 1.11 to run Confluent Platform
  - macOS Java 8 installation:

    ```
    brew tap adoptopenjdk/openjdk
    brew cask install adoptopenjdk8
    ```
- Maven to compile the client Java code (if using IntelliJ, Maven comes bundled with the IDE, so you can skip this step)
- Docker
- Apache HBase Sink Connector (writes data from a Kafka topic to a table in the specified HBase instance):

  ```
  confluent-hub install confluentinc/kafka-connect-hbase:latest
  ```
Apply for a Twitter Developer Account to receive access tokens and keys for the Twitter API. Once the account is approved, keys and tokens can be generated by creating an App in the Twitter Developer dashboard.

To create an app:

1. **Apps** -> **Create an app** -> fill out the App details form.
2. Copy the keys and tokens into the `twitter.properties` file.
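For reference, `twitter.properties` might look like the following sketch. The property names here are assumptions for illustration; align them with whatever the Twitter client in this project actually reads:

```properties
# Hypothetical property names -- match them to the Twitter client's expectations.
consumerKey=YOUR_CONSUMER_KEY
consumerSecret=YOUR_CONSUMER_SECRET
token=YOUR_ACCESS_TOKEN
tokenSecret=YOUR_ACCESS_TOKEN_SECRET
```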
- Create a Dockerized HBase instance:
  1. Get the Docker image:

     ```
     docker pull aaionap/hbase:1.2.0
     ```
  2. Start the HBase Docker image:

     ```
     docker run -d --name hbase --hostname hbase -p 2182:2181 -p 8080:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 -p 16000:16000 -p 16010:16010 -p 16201:16201 -p 16301:16301 aaionap/hbase:1.2.0
     ```
  3. Add the entry `127.0.0.1 hbase` to `/etc/hosts`.
- Run the Kafka producer and Kafka Streams application:

  ```
  sh run.sh
  ```
- Check HBase for data:
  1. Start the HBase shell:

     ```
     docker exec -it hbase /bin/bash entrypoint.sh
     ```
  2. Verify that the table `popular-tweets-avro` exists by running `list` in the shell. The output should be:

     ```
     TABLE
     popular-tweets-avro
     1 row(s) in 0.2750 seconds
     => ["popular-tweets-avro"]
     ```
  3. Verify the table received data:

     ```
     scan 'popular-tweets-avro'
     ```
- Clean up resources:
  1. Delete the connector:

     ```
     confluent local unload hbase
     ```
  2. Stop Confluent:

     ```
     confluent local stop
     ```
  3. Delete the Dockerized HBase instance:

     ```
     docker stop hbase
     docker rm -f hbase
     ```
The producer ingests tweets from the Twitter API, filtered by a configured list of search terms.

```java
public class TwitterProducer {
    List<String> terms = Lists.newArrayList("conspiracy", "conspiracyTheory", "fakenews");
}
```
All messages conform to a schema (class) defined in `/resources/avro/Tweets.avsc` before they are sent to the Kafka topic in Avro format.
```java
public Tweets(java.lang.CharSequence tweet, java.lang.CharSequence userName, java.lang.Integer userNumFollowers) {
    this.tweet = tweet;
    this.userName = userName;
    this.userNumFollowers = userNumFollowers;
}
```
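To make the shape of these records concrete, here is a minimal, self-contained stand-in for the generated class (illustration only; the real class is generated from `Tweets.avsc` and carries additional Avro machinery such as the schema and serialization hooks):

```java
// Minimal stand-in for the Avro-generated Tweets class -- illustration only.
// The real class is generated from Tweets.avsc by the Avro Maven plugin.
public class TweetsDemo {
    static final class Tweets {
        final CharSequence tweet;
        final CharSequence userName;
        final Integer userNumFollowers;

        Tweets(CharSequence tweet, CharSequence userName, Integer userNumFollowers) {
            this.tweet = tweet;
            this.userName = userName;
            this.userNumFollowers = userNumFollowers;
        }
    }

    public static void main(String[] args) {
        // Build a record the same way the producer would before serializing it to Avro.
        Tweets t = new Tweets("the moon landing was staged", "some_user", 125000);
        System.out.println(t.userName + " (" + t.userNumFollowers + " followers): " + t.tweet);
    }
}
```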
To create the code-generated class, compile the Java class from the `Tweets.avsc` file:

```
mvn clean compile package
```
The stream ingests messages from the Kafka topic and filters on the user's number of followers. After filtering, the messages are persisted to a new Kafka topic.
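The filtering step reduces to a simple predicate on the follower count. A minimal stdlib sketch, assuming a hypothetical threshold of 10,000 followers (the actual cutoff lives in the stream code):

```java
import java.util.List;
import java.util.stream.Collectors;

public class PopularTweetFilter {
    // Assumed threshold; the real value is defined in the Kafka Streams topology.
    static final int MIN_FOLLOWERS = 10_000;

    // Keep only tweets from users with at least MIN_FOLLOWERS followers.
    static boolean isPopular(Integer followers) {
        return followers != null && followers >= MIN_FOLLOWERS;
    }

    public static void main(String[] args) {
        List<Integer> followerCounts = List.of(120, 50_000, 9_999, 1_000_000);
        List<Integer> kept = followerCounts.stream()
                .filter(PopularTweetFilter::isPopular)
                .collect(Collectors.toList());
        System.out.println(kept); // [50000, 1000000]
    }
}
```

Inside the Kafka Streams topology, the same predicate would sit in a `filter((key, tweet) -> ...)` step between the source topic and the sink topic.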
A single configuration file, `hbase-avro.json`, configures which (filtered) Kafka topic the HBase sink connector subscribes to in order to persist messages to an HBase table.
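As a sketch, assuming the option names of the Confluent HBase sink connector (verify the exact keys against the connector documentation for your version), `hbase-avro.json` could look like the following; `hbase:2182` matches the `/etc/hosts` entry and ZooKeeper port mapping from the Docker steps above:

```json
{
  "name": "hbase",
  "config": {
    "connector.class": "io.confluent.connect.hbase.HBaseSinkConnector",
    "tasks.max": "1",
    "topics": "popular-tweets-avro",
    "hbase.zookeeper.quorum": "hbase",
    "hbase.zookeeper.property.clientPort": "2182",
    "auto.create.tables": "true",
    "table.name.format": "popular-tweets-avro"
  }
}
```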