Coder Social home page Coder Social logo

streamnative / pulsar-flume-ng-sink Goto Github PK

View Code? Open in Web Editor NEW
20.0 25.0 14.0 34 KB

An Apache Flume Sink implementation to publish data to Apache pulsar

License: Apache License 2.0

Java 89.44% Python 5.80% Shell 4.75%
apache-flume apache-pulsar pubsub messaging logcollector

pulsar-flume-ng-sink's Introduction

Flume Ng Pulsar Sink

License Build Status

This is a Flume Sink implementation that can publish data to a Pulsar topic

Compatibility

This sink is developed and tested using Apache Flume NG 1.9.0 and Apache Pulsar Client 2.3.0.

Requirements

Clone the project

$ git clone https://github.com/streamnative/flume-ng-pulsar-sink.git

Start Pulsar Standalone

docker pull apachepulsar/pulsar:2.3.0
docker run -d -it -p 6650:6650 -p 8080:8080 -v $PWD/data:/pulsar/data --name pulsar-flume-standalone apachepulsar/pulsar:2.3.0 bin/pulsar standalone

Start Pulsar Consumer

Start a consumer to consume messages from topic flume-test-topic.

docker cp src/test/python/pulsar-flume.py pulsar-flume-standalone:/pulsar
docker exec -it pulsar-flume-standalone /bin/bash
python pulsar-flume.py

Setup up Flume

Prepare Build Environment

Open a new terminal to start a docker instance flume of maven:3.6-jdk-8 in the same network as pulsar-flume-standalone we started at previous step. We will use this flume docker instace to install Flume and Flume-Ng-Pulsar-Sink.

docker pull maven:3.6-jdk-8
docker run -d -it --link pulsar-flume-standalone -p 44445:44445 --name flume maven:3.6-jdk-8 /bin/bash

Install Flume

Go to the docker instance flume

docker exec -it flume /bin/bash

At flume instance:

wget http://apache.01link.hk/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
tar -zxvf apache-flume-1.9.0-bin.tar.gz

Install Pulsar Sink

At flume instance:

git clone https://github.com/streamnative/flume-ng-pulsar-sink
cd flume-ng-pulsar-sink
mvn clean package
cd ..
cp flume-ng-pulsar-sink/target/flume-ng-pulsar-sink-1.9.0.jar apache-flume-1.9.0-bin/lib/
exit

Configure Flume

Copy the example configurations to flume:

docker cp src/test/resources/flume-example.conf flume:/apache-flume-1.9.0-bin/conf/
docker cp src/test/resources/flume-env.sh flume:/apache-flume-1.9.0-bin/conf/

Start Flume Ng Agent

docker exec -it flume /bin/bash

At flume instance:

apache-flume-1.9.0-bin/bin/flume-ng agent --conf apache-flume-1.9.0-bin/conf/ -f apache-flume-1.9.0-bin/conf/flume-example.conf -n a1

Send Data

Open another terminal, send data to port 44445 of flume

โžœ  ~ telnet localhost 44445
Trying ::1...
Connected to localhost.
Escape character is '^]'.
hello
OK
world
OK

At the terminal running pulsar-consumer.py, you will see following output:

'eceived message: 'hello
'eceived message: 'world

Cleanup

flume and pulsar-flume-standalone are running at background. Please remember to kill them at the end of this tutorial.

$ docker ps | grep pulsar-flume-standalone | awk '{ print $1 }' | xargs docker kill
$ docker ps | grep flume | awk '{ print $1 }' | xargs docker kill

Installation

Requirements

  • JDK 1.8+
  • Apache Maven 3.x

Build from Source

Clone the project from Github:

$ git clone https://github.com/streamnative/flume-ng-pulsar-sink.git

Building the Flume Ng Sink using maven:

$ cd flume-ng-pulsar-sink
$ mvn clean package

Once it is built successfully, you will find a jar flume-ng-pulsar-sink-<version>.jar generated under target directory. You can drop the built jar at your flume installation under lib directory.

Usage

Configurations

Name Description Default
useAvroEventFormat Whether use avro format for event false
syncMode Mode of send data to pulsar true

Client

Name Description Default
serviceUrl Whether non-persistent topics are enabled on the broker localhost:6650
authPluginClassName name of the Authentication-Plugin you want to use ""
authParamsString string which represents parameters for the Authentication-Plugin, e.g., "key1:val1,key2:val2" ""
tlsCertFile path of tls cert file ""
tlsKeyFile path of tls key file ""
useTLS Whether to turn on TLS, if to start, use protocol pulsar+ssl false
operationTimeout Set the operation timeout (default: 30 seconds) 30s
numIoThreads Set the number of threads to be used for handling connections to brokers 1
numListenerThreads Set the number of threads to be used for message listeners 1
connectionsPerBroker Sets the max number of connection that the client library will open to a single broker. 1
enableTcpNoDelay Configure whether to use TCP no-delay flag on the connection, to disable Nagle algorithm. false
tlsTrustCertsFilePath Set the path to the trusted TLS certificate file false
allowTlsInsecureConnection Configure whether the Pulsar client accept untrusted TLS certificate from broker false
enableTlsHostnameVerification whether to enable TLS hostname verification false
statsInterval the interval between each stat info 60
maxConcurrentLookupRequests Number of concurrent lookup-requests allowed to send on each broker-connection to prevent overload on broker. 60
maxLookupRequests Number of max lookup-requests allowed on each broker-connection to prevent overload on broker. 60
maxNumberOfRejectedRequestPerConnection Set max number of broker-rejected requests in a certain time-frame (30 seconds) after which current connection will be closed and client creates a new connection that give chance to connect a different broker 50
keepAliveIntervalSeconds Set keep alive interval in seconds for each client-broker-connection. 30
connectionTimeout Set the duration of time to wait for a connection to a broker to be established. 30

Producer

Name Description Default
topicName Specify the topic this producer will be publishing on. ""
producerName Specify a name for the producer ""
sendTimeout Set the send timeout 30s
blockIfQueueFull Set whether the send and sendAsync operations should block when the outgoing message queue is full. false
enableBatching Control whether automatic batching of messages is enabled for the producer true
batchMessagesMaxMessagesPerBatch maximum number of messages in a batch 1000
batchDelay the batch delay 1ms
messageRoutingMode the message routing mode, SinglePartition,RoundRobinPartition, CustomPartition(0,1,2) 1
hashingSchema JavaStringHash,Murmur3_32Hash(0,1) 0
compressionType NONE,LZ4,ZLIB,ZSTD(0,1,2,3) 0

License

This project is licensed under the Apache License 2.0.

FOSSA Status

pulsar-flume-ng-sink's People

Contributors

se7enkings avatar sijie avatar tuteng avatar wangyunpeng666 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pulsar-flume-ng-sink's Issues

[Feature]Add a sink to synchronize pulsar topic data to flume

At present, some problems of connectors in pulsar main repository are mainly inconvenient to deploy, which requires function-worker to run. For a large amount of log collection, it is unrealistic to deploy function-worker on each machine.
Some users use flume to collect logs and transmit them to the message queue. After consuming data from the message queue, the logs are transmitted to the flume and then transmitted to other systems, such as elasticsearch. In order to be compatible with this design, we need to add a pulsar sink to synchronize the data in pulsar to Flume.
The implementation principle starts a consumer, consumes data from the topic, and then synchronizes the consumed data to the flume

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.