kafka_junit_tests

This project contains JUnit tests for tuning Kafka configurations.

What is the purpose of this project?

Apache Kafka is a distributed streaming platform. It lets you publish and subscribe to streams of data like a messaging system. You can also use it to store streams of data in a distributed cluster and process those streams in real time. However, it can be challenging to publish or consume data fast enough to keep up with real time. Optimizing the speed of your producers or consumers requires knowing which values to use for a variety of performance-related configuration parameters.

Determining which configurations produce the best possible Kafka performance can be a time-consuming process of trial and error. One way to tune these parameters is to run a series of unit tests that each measure throughput across a range of values for a single parameter. Automating that process with parameterized JUnit tests lets you optimize Kafka without guesswork and without wasting time.
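The single-parameter sweep described above can be sketched in plain Java, without JUnit or a running broker. This is a minimal illustration under stated assumptions, not the project's code: `MessageSink` and `CountingSink` are hypothetical stand-ins for a `KafkaProducer`, and message size is just one example of a parameter to vary. In a real test, `send()` would call the producer, so the timing would include serialization and broker round trips.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ParameterSweep {

    /** Hypothetical stand-in for a Kafka producer: anything that accepts a message. */
    interface MessageSink {
        void send(byte[] message);
    }

    /** Hypothetical sink that only counts bytes, so the sketch needs no broker. */
    static class CountingSink implements MessageSink {
        long bytesSent = 0;

        @Override
        public void send(byte[] message) {
            bytesSent += message.length;
        }
    }

    /** Sends `count` messages of `size` bytes and returns the rate in messages per second. */
    static double measureThroughput(MessageSink sink, int size, int count) {
        byte[] payload = new byte[size];
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.send(payload);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start); // guard against a zero interval
        return count / (elapsedNanos / 1e9);
    }

    /** Repeats the measurement once per value of a single parameter (here, message size). */
    static Map<Integer, Double> sweep(int[] sizes, int count) {
        Map<Integer, Double> results = new LinkedHashMap<>(); // keeps the sweep order
        for (int size : sizes) {
            results.put(size, measureThroughput(new CountingSink(), size, count));
        }
        return results;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 100, 1_000, 10_000};
        for (Map.Entry<Integer, Double> e : sweep(sizes, 100_000).entrySet()) {
            System.out.printf("size=%d bytes -> %.0f msgs/sec%n", e.getKey(), e.getValue());
        }
    }
}
```

A real sweep would probably also discard a warm-up run, since the first measurement in a fresh JVM includes JIT compilation time.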

What is JUnit?

JUnit is a unit testing framework for the Java programming language and one of the most widely used frameworks for writing test cases in Java.

What is in this project?

This project includes JUnit tests designed to find which Kafka configurations maximize the speed at which messages can be published to a Kafka stream. These unit tests don't so much test anything as produce speed data, so that Kafka producer configurations can be tuned for optimal performance under different conditions.

The following unit tests are included:

  1. MessageSizeSpeedTest measures producer throughput for a variety of message sizes. This test will show how much throughput declines as message sizes increase.

  2. ThreadCountSpeedTest measures producer throughput for a variety of topic quantities. This test will show how much throughput declines as the producer sends to an increasing quantity of topics.

  3. TopicCountGridSearchTest explores the effect of the number of output topics, buffer size, threading, and other settings on producer throughput.

  4. TypeFormatSpeedTest measures how fast messages can be converted from POJO or JSON data format to Kafka's native byte array format. This is useful for illustrating the speed penalty you pay in Kafka serialization for using complex data types.
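To get a feel for what TypeFormatSpeedTest measures, the sketch below times three ways of producing a byte[] payload. It is an illustration built on assumptions, not the project's test: `Reading` is a hypothetical POJO, Java's built-in `ObjectOutputStream` stands in for whatever POJO serializer the project actually uses, and the JSON is built by hand to avoid a library dependency. A naive `System.nanoTime()` loop like this is noisy; a benchmark harness such as JMH gives more trustworthy numbers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

class SerializationCost {

    /** Hypothetical POJO standing in for a message payload. */
    static class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sensorId;
        final long timestamp;
        final double value;

        Reading(String sensorId, long timestamp, double value) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** Hand-built JSON so the sketch needs no JSON library (no escaping is done). */
        String toJson() {
            return "{\"sensorId\":\"" + sensorId + "\",\"timestamp\":" + timestamp
                    + ",\"value\":" + value + "}";
        }
    }

    /** POJO -> byte[] via Java's built-in object serialization. */
    static byte[] javaSerialize(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the stream has been flushed and closed by this point
    }

    /** POJO -> JSON text -> UTF-8 byte[]. */
    static byte[] jsonSerialize(Reading r) {
        return r.toJson().getBytes(StandardCharsets.UTF_8);
    }

    /** Average nanoseconds per call over n calls. */
    static double nanosPerOp(Supplier<byte[]> op, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            op.get();
        }
        return (double) (System.nanoTime() - start) / n;
    }

    public static void main(String[] args) {
        Reading r = new Reading("sensor-42", 1_700_000_000_000L, 21.5);
        byte[] raw = new byte[64]; // already bytes: nothing to convert
        int n = 100_000;
        System.out.printf("byte[] as-is:       %.1f ns/op%n", nanosPerOp(() -> raw, n));
        System.out.printf("JSON text:          %.1f ns/op%n", nanosPerOp(() -> jsonSerialize(r), n));
        System.out.printf("Java serialization: %.1f ns/op%n", nanosPerOp(() -> javaSerialize(r), n));
        System.out.printf("payload sizes: JSON=%d bytes, Java=%d bytes%n",
                jsonSerialize(r).length, javaSerialize(r).length);
    }
}
```

Besides conversion time, the printed payload sizes hint at a second cost of richer formats: larger messages also lower producer throughput, which is what MessageSizeSpeedTest measures.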

How do I compile and run this project?

Prerequisites

Download and run this code on a Kafka or MapR cluster.

Install a JDK and Maven if you haven't already.

If you want to graph your test results, install R as well (it provides the Rscript command).

Start the Kafka and ZooKeeper services.

Update bootstrap.servers in src/test/resources/producer.props to point to the Kafka service.
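If the tests cannot reach the broker, a quick sanity check is to load producer.props with `java.util.Properties` and confirm that bootstrap.servers is set. This is a hedged sketch: it assumes producer.props uses standard key=value properties syntax, and the class and method names are hypothetical. The value localhost:9092 used below is only an example; use the host and port of your own Kafka service.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class ProducerPropsCheck {

    /** Loads key=value settings and fails fast if bootstrap.servers is missing or blank. */
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String servers = props.getProperty("bootstrap.servers", "").trim();
        if (servers.isEmpty()) {
            throw new IllegalStateException("bootstrap.servers is not set");
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("src/test/resources/producer.props"); // path given in this README
        try (Reader reader = Files.newBufferedReader(path)) {
            System.out.println("bootstrap.servers = " + load(reader).getProperty("bootstrap.servers"));
        }
    }
}
```

For example, a producer.props containing the line `bootstrap.servers=localhost:9092` passes the check, while a file without that key fails immediately instead of leaving the tests waiting on an unreachable broker.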

Compile and Run

This project has been prepared to run on either MapR or vanilla Kafka clusters.

To run it on a MapR cluster, check out the mapr branch and run Maven, like this:

git checkout mapr
mvn package

To run it on a vanilla Kafka cluster, check out the kafka branch and run Maven, like this:

git checkout kafka
mvn package

After Maven completes, the test data should be saved to three new files: size-count.csv, thread-count.csv, and topic-count.csv.
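The CSV files are plain text, so they are easy to inspect or re-plot. The sketch below shows one way such a file could be written; the two-column layout and the column names (message_size, msgs_per_sec) are assumptions for illustration and may not match the columns the project's tests actually emit. It writes to example-size-count.csv so that it cannot overwrite real results.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class CsvResults {

    /** Renders a header row plus one "parameter,throughput" row per measurement. */
    static String toCsv(String paramName, Map<Integer, Double> results) {
        StringBuilder sb = new StringBuilder(paramName).append(",msgs_per_sec\n");
        for (Map.Entry<Integer, Double> e : results.entrySet()) {
            sb.append(e.getKey()).append(',').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Double> results = new LinkedHashMap<>(); // keeps rows in sweep order
        results.put(100, 1.0);   // placeholder values, not measurements
        results.put(1_000, 1.0); // placeholder values, not measurements
        String csv = toCsv("message_size", results);
        Files.write(Paths.get("example-size-count.csv"), csv.getBytes(StandardCharsets.UTF_8));
        System.out.print(csv);
    }
}
```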

To run only one unit test, use a command like this:

mvn -e -Dtest=MessageSizeSpeedTest test

You can graph performance results like this:

Rscript src/test/R/draw-speed-graphs.r

Open the resulting .png image files to see your results. Here is an example of performance data graphed from the TopicCountGridSearchTest test:

[Figure: Producer throughput on a 3-node Kafka cluster]

Caveats

Sometimes these tests require a lot of memory. If you see a "queue full" exception, you have probably run out of heap. If that happens, edit pom.xml and increase the JVM heap size set by the -Xmx parameter.
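After raising -Xmx, it is worth confirming that the JVM running the tests actually picked up the new value. A minimal sketch: `Runtime.getRuntime().maxMemory()` reports the maximum heap the JVM will attempt to use, so printing it from a test (or from a small class like the hypothetical `HeapCheck` below) shows whether the setting took effect.

```java
class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Expect a number at or somewhat below the -Xmx value, since some garbage collectors exclude part of the heap from this figure.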

Also, make sure you don't run out of disk space. In zookeeper.properties (in the config directory of your Kafka installation), make sure dataDir points to a drive with plenty of free space.
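To check how much room the configured dataDir has, the sketch below reads dataDir from a zookeeper.properties file and reports the usable space on that file system. The class name and the default path config/zookeeper.properties (relative to the Kafka install directory) are assumptions; pass a different path as the first argument if yours differs.

```java
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

class DataDirSpace {

    /** Returns the dataDir value from zookeeper.properties-style text, or null if absent. */
    static String dataDir(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("dataDir");
    }

    /** Usable bytes on the partition holding dir; File.getUsableSpace returns 0 if it cannot tell. */
    static long usableBytes(String dir) {
        return new File(dir).getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        String config = args.length > 0 ? args[0] : "config/zookeeper.properties"; // assumed location
        try (Reader reader = Files.newBufferedReader(Paths.get(config))) {
            String dir = dataDir(reader);
            if (dir == null) {
                System.out.println("dataDir is not set in " + config);
                return;
            }
            long gib = usableBytes(dir) / (1024L * 1024 * 1024);
            System.out.println("dataDir " + dir + " has about " + gib + " GiB usable");
        }
    }
}
```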

Contributors

iandow
