
btoddb commented on June 28, 2024

if you already have the fields broken out using an interceptor, i think you
have what you need. does the interceptor put the fields back into the
payload of the LogEvent or use attributes on the log event?

the sink is fairly basic, so it does need some work to be used in
production. my thoughts on this have always been around using an
interceptor that takes the payload and converts the payload into standard
JSON (or some other) such that any sink would be able to interpret the JSON
to do as it needs. for cassandra sink it could parse the JSON and store
each JSON property as a column.

if you have an interceptor that does this, i'll take a pull request and
include it in the code. i will probably be working on this again soon so your
question is timely :)

thx!

On Thu, Jan 17, 2013 at 9:36 AM, jeffb4 wrote:

I'm writing Apache webserver logs to Cassandra using Flume and this Sink,
and I would like to break log entries into various fields/columns (I
already break them into fields with an interceptor).

Would the best/canonical method of doing this be to extend
flume-ng-cassandra-sink with a serializer config directive, default said
directive to the existing serializer, and then (for my needs) create a
custom serializer that takes desired fields as a configuration option, and
stuffs them into Cassandra as columns?



from flume-ng-cassandra-sink.

jeffb4 commented on June 28, 2024

The interceptor (default regex_extractor that comes with Flume) serializes the parsed-out data into event headers - I'm not familiar enough with Flume terminology to say whether that is LogEvent payload or attribute.

My thought (instead of the JSON conversion and then deconversion) was something like:

host1.sinks.sink1.type = com.btoddb.flume.sinks.cassandra.CassandraSink
# default Cassandra Serializer (no Flume header magic)
# host1.sinks.sink1.serializer = com.btoddb.flume.sinks.cassandra.SimpleCassandraEventSerializer
# custom Cassandra Serializer (insert Flume header fields as columns)
host1.sinks.sink1.serializer = com.blah.ComplexCassandraEventSerializer

# map Flume headers "foo", "bar", "alpha", and "beta" to Cassandra columns of the same name
host1.sinks.sink1.serializer.fieldcolumns = foo bar alpha beta

As far as your plugin goes, the big difference would be the addition of the .serializer config option (defaulting to your current use of the ByteBufferSerializer out of Hector).
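A minimal sketch of the header-to-column mapping such a serializer might perform, assuming the `fieldcolumns` value is a whitespace-separated list as in the config above. The class name `HeaderColumnMapper` and its methods are hypothetical, and the sketch returns a plain map instead of calling Flume or Hector APIs, so the actual write to Cassandra is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: picks the configured Flume header fields and
// exposes them as column name -> value pairs.
class HeaderColumnMapper {
    private final String[] fieldColumns;

    // fieldColumnsSetting is the raw config value, e.g. "foo bar alpha beta"
    HeaderColumnMapper(String fieldColumnsSetting) {
        String trimmed = fieldColumnsSetting.trim();
        this.fieldColumns = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Returns one entry per configured field that is present in the headers.
    // Headers that are not listed in the config are ignored.
    Map<String, String> toColumns(Map<String, String> headers) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String name : fieldColumns) {
            String value = headers.get(name);
            if (value != null) {
                columns.put(name, value);
            }
        }
        return columns;
    }
}
```

A real serializer would then turn each entry of the returned map into a column insert for the event's row.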

If JSON/BSON were being written to more than MongoDB, or if Flume event headers weren't capable of storing columns, I could see a more generic JSON solution for in-flight data.


btoddb commented on June 28, 2024

thanks for pinging me. i had some other things ahead of this, and wanted to understand a bit more (been a while since i've hit the code).

yes i think you're on to something, but instead of using the regex interceptor, maybe just supply a serializer that applies the regex itself and writes the matches directly into cassandra columns? this would essentially mean that anyone could create a serializer to parse the flume event into columns.
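A rough sketch of what such a serializer's parsing step might look like, assuming the event body is a single Apache Common Log Format line. The class name `RegexColumnSerializer`, the column names, and the `raw` fallback column are all hypothetical, and the result is a plain map rather than anything tied to the Flume or Hector APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical serializer core: applies the regex to the event body and
// names each captured group as a column.
class RegexColumnSerializer {
    // Apache Common Log Format: host ident user [time] "request" status bytes
    private static final Pattern COMMON_LOG = Pattern.compile(
        "^(?<host>\\S+) (?<ident>\\S+) (?<user>\\S+) \\[(?<time>[^\\]]+)\\] "
        + "\"(?<request>[^\"]*)\" (?<status>\\d{3}) (?<bytes>\\S+)$");

    private static final String[] COLUMNS =
        {"host", "ident", "user", "time", "request", "status", "bytes"};

    static Map<String, String> toColumns(String body) {
        Map<String, String> columns = new LinkedHashMap<>();
        Matcher m = COMMON_LOG.matcher(body);
        if (!m.matches()) {
            // Assumption: unparseable lines are kept whole in one column.
            columns.put("raw", body);
            return columns;
        }
        for (String name : COLUMNS) {
            columns.put(name, m.group(name));
        }
        return columns;
    }
}
```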

taking it one step further, how about defining the conversion in configuration, like JSON or XML?

1 - read the "conversion definition" based on something in the flume headers (source id, app id, hostname, etc)
2 - retrieve the conversion definition from a data store (maybe cache it)
3 - execute the conversion creating a single batch mutation
4 - save batch mutation to cassandra
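The four steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the `ConversionPipeline`, `ConversionDefinition`, and `Event` names are hypothetical, the "conversion definition" is reduced to a regex plus a list of column names, the data store is stood in for by a lookup function, and step 4 is left to the caller because the actual Cassandra write is outside the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConversionPipeline {
    // Hypothetical definition: a regex and one column name per capture group.
    static final class ConversionDefinition {
        final Pattern pattern;
        final List<String> columns;
        ConversionDefinition(String regex, List<String> columns) {
            this.pattern = Pattern.compile(regex);
            this.columns = columns;
        }
    }

    // Stand-in for a Flume event: headers plus a text body.
    static final class Event {
        final Map<String, String> headers;
        final String body;
        Event(Map<String, String> headers, String body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private final String keyHeader;                               // e.g. an app id header
    private final Function<String, ConversionDefinition> store;   // stand-in data store
    private final Map<String, ConversionDefinition> cache = new ConcurrentHashMap<>();

    ConversionPipeline(String keyHeader, Function<String, ConversionDefinition> store) {
        this.keyHeader = keyHeader;
        this.store = store;
    }

    // Steps 1-3: build one row of columns per convertible event.
    List<Map<String, String>> buildBatch(List<Event> events) {
        List<Map<String, String>> batch = new ArrayList<>();
        for (Event e : events) {
            String key = e.headers.get(keyHeader);                // step 1
            if (key == null) continue;
            ConversionDefinition def = cache.computeIfAbsent(key, store); // step 2
            if (def == null) continue;
            Matcher m = def.pattern.matcher(e.body);              // step 3
            if (!m.matches()) continue;
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < def.columns.size() && i < m.groupCount(); i++) {
                row.put(def.columns.get(i), m.group(i + 1));
            }
            batch.add(row);
        }
        return batch; // step 4: the caller would save this as a single batch
    }
}
```

Events with no key header, no definition, or a body that does not match are skipped here; a real sink would need to decide whether to drop, log, or store them raw.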

