This project is forked from cobliteam/kafka-topic-dumper

Python tool to get messages from Kafka and send them to an AWS S3 bucket in Parquet format

License: MIT License

kafka-topic-dumper

This is a simple tool to get data from a Kafka topic and back it up to AWS S3.

Output format

Backup files are generated in Parquet format. To do so, kafka-topic-dumper uses the PyArrow package.

Installing

Clone this repository:

git clone [email protected]:Cobliteam/alexstrasza-stress-test.git

Go to the correct folder:

cd alexstrasza-stress-test/kafka-topic-dumper

Install it using the setup.py file:

pip install -e .

Configuration

To upload files to an AWS S3 bucket, you need to set up your AWS credentials as described here. Then export your profile:

export AWS_PROFILE=name

Usage

$kafka-topic-dumper -h
usage: kafka-topic-dumper [-h] [-t TOPIC] [-s BOOTSTRAP_SERVERS]
                          [-b BUCKET_NAME] [-p PATH]
                          {dump,reload} ...

Simple tool to dump kafka messages and send it to AWS S3

positional arguments:
  {dump,reload}         sub-command help
    dump                Dump mode will fetch messages from kafka cluster and
                        send then to AWS-S3.
    reload              Reload mode will download files from AWS-S3 and send
                        then to kafka.

optional arguments:
  -h, --help            show this help message and exit
  -t TOPIC, --topic TOPIC
                        Kafka topic to fetch messages from.
  -s BOOTSTRAP_SERVERS, --bootstrap-servers BOOTSTRAP_SERVERS
                        host[:port] string (or list of host[:port] strings
                        concatened by ",") that the consumer should contact to
                        bootstrap initial cluster metadata. If no servers are
                        specified, will default to localhost:9092.
  -b BUCKET_NAME, --bucket-name BUCKET_NAME
                        The AWS-S3 bucket name to send dump files.
  -p PATH, --path PATH  Path to folder where to store local files.


$kafka-topic-dumper dump -h
usage: kafka-topic-dumper dump [-h] [-n NUM_MESSAGES]
                               [-m MAX_MESSAGES_PER_PACKAGE] [-d]

optional arguments:
  -h, --help            show this help message and exit
  -n NUM_MESSAGES, --num-messages NUM_MESSAGES
                        Number of messages to try dump.
  -m MAX_MESSAGES_PER_PACKAGE, --max-messages-per-package MAX_MESSAGES_PER_PACKAGE
                        Maximum number of messages per dump file.
  -d, --dry-run         In dry run mode, kafka-topic-dumper will generate
                        local files. But will not send it to AWS S3 bucket.

$kafka-topic-dumper reload -h
usage: kafka-topic-dumper reload [-h] [-g RELOAD_CONSUMER_GROUP]
                                 [-T TRANSFORMER]

optional arguments:
  -h, --help            show this help message and exit
  -g RELOAD_CONSUMER_GROUP, --reload-consumer-group RELOAD_CONSUMER_GROUP
                        Whe reloading a dump of messages that already was in
                        kafka, kafka-topic-dumper will not load it again, it
                        will only reset offsets for this consumer-group.
  -T TRANSFORMER, --transformer TRANSFORMER
                        package:class that will be used to transform each
                        message before producing
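
The -T option takes a package:class string. As a rough illustration of how such a string can be resolved in Python, the module part can be imported with importlib and the class looked up by name. This is only a sketch: load_class is a hypothetical helper, not kafka-topic-dumper's actual loader, and the interface a transformer class must implement is not described in this README.

```python
import importlib


def load_class(spec):
    """Resolve a 'package:class' string to the class object it names.

    Hypothetical helper for illustration only; the tool's own loading
    logic may differ.
    """
    module_name, sep, class_name = spec.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'package:class', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Example with a standard-library class, because the transformer
# interface expected by kafka-topic-dumper is not documented here.
cls = load_class("collections:OrderedDict")
print(cls.__name__)
```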

Basic example

The following command dumps 10000 messages from the topic my-topic on the Kafka server at localhost:9092 into the local folder data. It also sends the dump, in files of at most 1000 messages each, to an AWS S3 bucket called my-bucket.

$kafka-topic-dumper -t my-topic -s localhost:9092 -p ./data -b my-bucket \
    dump -n 10000 -m 1000
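
To get a feel for how -n and -m interact, the number of dump files can be estimated with a ceiling division. This is a back-of-the-envelope sketch, not code from the tool: expected_file_count is a hypothetical helper, and it assumes that the topic holds at least NUM_MESSAGES messages and that every file except possibly the last is filled up to the maximum.

```python
import math


def expected_file_count(num_messages, max_messages_per_package):
    """Estimate how many dump files a run would produce.

    Assumes each file is filled to max_messages_per_package before the
    next one is started (an assumption, not documented behaviour).
    """
    if num_messages < 0 or max_messages_per_package <= 0:
        raise ValueError("num_messages must be >= 0 and the package size > 0")
    return math.ceil(num_messages / max_messages_per_package)


# Values from the basic example: -n 10000 -m 1000
print(expected_file_count(10000, 1000))
```

Because -n is described as the number of messages to try to dump, an actual run may produce fewer files than this estimate.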

Contributors

evnsan, gustavo-momente, stoiev, nicolautahan, danielkza, festefanini
