Coder Social home page Coder Social logo

kafka-spark-consumer's Introduction

README file for Kafka-Spark-Consumer
===================================

This utility will help to pull messages from Kafka Cluster using Spark Streaming.
The Kafka Consumer is Low Level Kafka Consumer ( SimpleConsumer) and have better handling of the Kafka Offsets and handle failures.

This code have implemented a Custom Receiver consumer.kafka.client.KafkaReceiver. The KafkaReceiver uses low level Kafka Consumer API (implemented in consumer.kafka packages) to fetch messages from Kafka and 'store' it in Spark.

The logic will detect number of partitions for a topic and spawn that many threads (Individual instances of Consumers).
Kafka Consumer uses Zookeeper for storing the latest offset for individual partitions, which will help to recover in case of failure .

The consumer.kafka.client.Consumer is the sample Consumer which uses this Kafka Receivers to generate DStreams from Kafka and apply a Output operation for every messages of the RDD.

We use this Kafka Spark Consumer to perform Near Real Time Indexing of Kafka Messages to target Search Cluster and also Near Real Time Aggregation using target NoSQL storage.    

Following are the instructions to build 
========================================

>git clone

>cd kafka-spark-consumer

>mvn install

Integrating Spark-Kafka-Consumer
=================================

If you want to use this Kafka-Spark-Consumer for you target client application, include below dependency in your pom.xml

                <dependency>
                        <groupId>kafka.spark.consumer</groupId>
                        <artifactId>kafka-spark-consumer</artifactId>
                        <version>0.0.1-SNAPSHOT</version>
                </dependency>

				
and use spark-kafka.properties to include below details.


spark-kafka.properties
=====================
#Kafka ZK details from where messages will be pulled

#Kafka ZK host details
zookeeper.hosts=host1,host2
#Kafka ZK Port
zookeeper.port=2181
# Kafka Broker path in ZK
zookeeper.broker.path=/brokers
#Kafka Topic to consume
kafka.topic=topic-name

#Consumer ZK Path. This will be used to store the consumed offset
zookeeper.consumer.connection=localhost:2182
#ZK Path for storing Kafka Consumer offset
zookeeper.consumer.path=/spark-kafka
# Kafka Consumer ID. This ID will be used for accessing offset details in $zookeeper.consumer.path
kafka.consumer.id=12345

# Optional RDD Processor Class.
# Target Message Processing Class. This must implements consumer.kafka.client.IIndexer
target.indexer.class=com.pearson.hbase.RDDPProcessor


Running Spark Kafka Consumer
===========================
Let assume your Kafka Message Processing logic (The implementation of consumer.kafka.client.IIndexer interface) is in custom-processor.jar which is built using the spark-kafka-consumer as dependency.

Launch this using spark-submit

./bin/spark-submit --class consumer.kafka.client.Consumer --master spark://x.x.x.x:7077 --executor-memory 5G /<Path_To>/custom-processor.jar -P /<Path_To>/spark-kafka.properties


-P external properties filename
-p properties filename from the classpath

This will start the Spark Receiver and Fetch Kafka Messages for every partition of the given topic and generates the DStream. The external IIndexer will process every messages in the given RDD. 

 

kafka-spark-consumer's People

Contributors

dibbhatt avatar

Watchers

James Cloos avatar Adam Muise avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.