Visualizing Apache Log Data... with StreamSets & Minecraft!

StreamSets Data Collector (SDC) is an open source framework for building continuous big data pipelines. SDC can ingest data from origins such as Apache Web Server log files, outputting to destinations such as Apache Kafka. This project is a Minecraft/Spigot plugin that consumes log data from a Kafka queue and renders it as sand blocks falling onto a map.

Screenshot

Click below to see the plugin in action:

YouTube video

NEW: I presented this demo at Hadoop Summit Dublin in April 2016:

YouTube video

Pre-requisites

Log Record Schema

Records must be written to Kafka in JSON format; the following properties are used. All are optional, and any other properties are ignored:

{
    "verb": "GET",
    "request": "/search/tag/list",
    "clientip": "65.91.212.94",
    "lat": 38.0,
    "lon": -97.0,
    "area": 1234.56
}

You can use upper or lower case for property names.

  • If lat and lon are both present, and within the map bounds (see below), a block will be rendered at height 100 over that location on the map. Note - you must use sand or gravel to get the falling effect.
  • If area is present and 10000 or more, then the block will be sand, otherwise it will be gravel.
  • If verb is present, then "{verb} {request} from {clientip}" will be sent to the Minecraft client UI as a broadcast message - e.g. GET /index.html FROM 134.23.34.56
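
The three rules above can be sketched in plain Java. This is a minimal illustration, not the plugin's actual code: the class and method names (RecordRules, caseInsensitive, material, broadcast) are hypothetical, materials are represented as plain strings rather than Bukkit types, and the message follows the lower-case "from" of the template.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical helper mirroring the record rules described in the README. */
class RecordRules {
    // Threshold taken from the README text: area of 10000 or more means sand.
    static final double SAND_AREA_THRESHOLD = 10000.0;

    /** Copies a record into a map whose keys ignore case ("LAT" matches "lat"). */
    static Map<String, Object> caseInsensitive(Map<String, Object> record) {
        Map<String, Object> copy = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        copy.putAll(record);
        return copy;
    }

    /** Sand when area is present and at least the threshold, otherwise gravel. */
    static String material(Map<String, Object> record) {
        Object area = record.get("area");
        boolean large = area instanceof Number
                && ((Number) area).doubleValue() >= SAND_AREA_THRESHOLD;
        return large ? "SAND" : "GRAVEL";
    }

    /** Broadcast text, present only when the record has a verb. */
    static Optional<String> broadcast(Map<String, Object> record) {
        Object verb = record.get("verb");
        if (verb == null) {
            return Optional.empty();
        }
        // Missing request or clientip values appear as "null" in this sketch.
        return Optional.of(verb + " " + record.get("request")
                + " from " + record.get("clientip"));
    }
}
```

Wrapping each incoming record with caseInsensitive first means the later lookups do not need to care whether the producer wrote "Verb", "VERB" or "verb".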

Setup

StreamSets Configuration

Follow the Log Shipping to Elasticsearch tutorial, with the following exceptions:

  • Rather than Directory, configure a File Tail origin, with the following settings:

    Files Property   Value
    Data Format      Log
    File to Tail     Path: /some/directory/access.log - this can be a real Apache Log file, or you can use sample data (see below)
                     Naming: Active File with Reverse Counter Files

    Log Property     Value
    Log Format       Combined Log Format
  • After setting up the Geo IP processor, do not add the Elasticsearch destination. Instead, add a Kafka Producer destination and configure it to publish JSON to the log topic. Use the default Multiple JSON Objects option.

Minecraft/Spigot Setup

Download Minecraft, then follow the Spigot BuildTools instructions to build and run the Spigot server.

KafkaPlugin Setup

  • Download the KafkaPlugin JAR and place it in the Spigot plugins directory.

  • Download the SLF4J JAR and put it somewhere convenient. Do NOT put it in the Spigot plugins directory, as doing so will confuse Spigot.

  • Copy the Kafka Client JAR (included with the Kafka distribution) to a convenient directory, again, NOT the Spigot plugins directory.

  • Edit the Spigot start script to include the SLF4J and Kafka Client JAR files. On Mac/Linux, it will look like this:

      #!/bin/sh
      cd "$( dirname "$0" )"
      java -Xms512M -Xmx1024M -classpath /somewhere/convenient/slf4j-api-1.7.19.jar:\
      /somewhere/convenient/kafka-clients-0.9.0.0.jar:spigot-1.9.jar \
      org.bukkit.craftbukkit.Main
    

Start Spigot; you should see the Kafka plugin being enabled:

[15:25:23 INFO]: [KafkaCraft] Enabling KafkaCraft v1.0
[15:25:23 INFO]: [KafkaCraft] Enable
[15:25:23 INFO]: [KafkaCraft] Read 80 lines of world map

Now start the Minecraft client and connect to the Spigot server: select Multiplayer and add a server at the appropriate address (127.0.0.1 if you're running everything on the same machine). Make sure you op yourself, then type /kafka (see plugin.yml for usage). You will be teleported into the sky and a world map will start to render in front of you. Rendering takes a minute or two; once the map is complete, the plugin will subscribe to the Kafka log topic and drop a sand block onto the map for each log record in the origin file.

If you want to 'play' sample data into the pipeline: in the KafkaPlugin directory use the readfile.sh script to send the sample access.log file line by line to the directory/file you specified in the File Tail origin. The script will pause for a random amount of time between 0s and 1s between each line.

./readfile.sh access.log >> /some/directory/access.log
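
If you would rather not use a shell script, the same replay behaviour can be approximated in Java. The sketch below is not the contents of readfile.sh; the class name LogReplayer and its replay method are hypothetical, and it simply prints each line and then sleeps for a random time below the given maximum.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for readfile.sh: replays lines with random pauses. */
class LogReplayer {
    /** Prints each line, then sleeps a random time in [0, maxDelayMillis) ms. */
    static void replay(List<String> lines, PrintStream out, int maxDelayMillis) {
        Random random = new Random();
        for (String line : lines) {
            out.println(line);
            out.flush();
            if (maxDelayMillis > 0) {
                try {
                    Thread.sleep(random.nextInt(maxDelayMillis));
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop replaying.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LogReplayer.java access.log >> /some/directory/access.log
        replay(Files.readAllLines(Paths.get(args[0])), System.out, 1000);
    }
}
```

Flushing after every line matters here, because the File Tail origin only sees data once it has actually been appended to the file.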

Map File Format

Two maps are included as text files - a map of the world and one of San Francisco. The format is ASCII art, with a JSON object as a one line header. The JSON object contains the coordinates of the top left and bottom right bounds of the map. Incoming lat/lon is scaled to a block location using these coordinates. Here's the first line of sf.txt as an example:

{"topLeftLat":37.839351, "topLeftLon":-122.532321, "bottomRightLat":37.696869, "bottomRightLon":-122.317037}
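
As a rough illustration of how such bounds can turn a lat/lon pair into a position on the map, here is a sketch that uses simple linear interpolation. It is an assumption about the approach, not a copy of the plugin's code: the class MapScaler, its method names, and the grid size passed to the constructor are all hypothetical, and maps that cross the antimeridian are not handled.

```java
/** Hypothetical sketch: scale lat/lon to a map cell by linear interpolation. */
class MapScaler {
    final double topLeftLat, topLeftLon, bottomRightLat, bottomRightLon;
    final int columns, rows; // assumed: width and height of the ASCII art

    MapScaler(double topLeftLat, double topLeftLon,
              double bottomRightLat, double bottomRightLon,
              int columns, int rows) {
        this.topLeftLat = topLeftLat;
        this.topLeftLon = topLeftLon;
        this.bottomRightLat = bottomRightLat;
        this.bottomRightLon = bottomRightLon;
        this.columns = columns;
        this.rows = rows;
    }

    /** True when the point lies inside the map bounds (edges included). */
    boolean contains(double lat, double lon) {
        return lat <= topLeftLat && lat >= bottomRightLat
                && lon >= topLeftLon && lon <= bottomRightLon;
    }

    /** Returns {column, row}, or null when the point is outside the map. */
    int[] toCell(double lat, double lon) {
        if (!contains(lat, lon)) {
            return null;
        }
        // Fraction of the way across (west to east) and down (north to south).
        double xFraction = (lon - topLeftLon) / (bottomRightLon - topLeftLon);
        double yFraction = (topLeftLat - lat) / (topLeftLat - bottomRightLat);
        // Clamp so that points exactly on the far edge stay inside the grid.
        int column = Math.min(columns - 1, (int) (xFraction * columns));
        int row = Math.min(rows - 1, (int) (yFraction * rows));
        return new int[] {column, row};
    }
}
```

With the sf.txt bounds, a record at lat 38.0, lon -97.0 falls outside the map, so toCell returns null and no block would be placed for it.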

Spaces in the ASCII art are rendered as water (lapis blocks), anything else is rendered as land (emerald blocks). You are, of course, free to tweak MapMaker.java to render characters as whatever you like!
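
The character-to-block rule is small enough to sketch directly. The code below is an illustration only and is not taken from MapMaker.java: the class AsciiMap and its methods are hypothetical, and blocks are represented by name strings rather than Bukkit types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the ASCII-art-to-block rule described above. */
class AsciiMap {
    /** A space means water (lapis); any other character means land (emerald). */
    static String materialFor(char c) {
        return c == ' ' ? "LAPIS_BLOCK" : "EMERALD_BLOCK";
    }

    /** Converts ASCII art rows (JSON header line already removed) to block names. */
    static List<List<String>> toMaterials(List<String> artLines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : artLines) {
            List<String> row = new ArrayList<>();
            for (char c : line.toCharArray()) {
                row.add(materialFor(c));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Changing what a character renders as is then a matter of editing materialFor, for example to give a distinct block to a particular character.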

I used http://www.ascii-art-generator.org/ to convert map images to ASCII art.

Have Fun!

Any problems, feel free to file an issue.
