Comments (3)
if you already have the fields broken out using an interceptor, i think you
have what you need. does the interceptor put the fields back into the
payload of the LogEvent or use attributes on the log event?
the sink is fairly basic, so it does need some work to be used in
production. my thoughts on this have always been around using an
interceptor that takes the payload and converts it into standard
JSON (or some other format) so that any sink can interpret it
and do what it needs. the cassandra sink, for example, could parse the JSON and store
each JSON property as a column.
if you have an interceptor that does this, i'll take a pull request and
include in the code. i will probably be working on this again soon so your
question is timely :)
thx!
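A minimal sketch of the interceptor side of that JSON idea: it turns already-parsed fields into one flat JSON object that any sink could read back. The class name FlatJson and its methods are hypothetical, it does not use the Flume API, and it assumes all keys and values are non-null strings.

```java
import java.util.Map;

// Hypothetical helper, not part of Flume or flume-ng-cassandra-sink.
final class FlatJson {
    // Encodes string fields as a single flat JSON object, keeping map order.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    // Wraps a string in double quotes and escapes the characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

On the sink side, a real JSON parser would then read each property and write it as a column.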
On Thu, Jan 17, 2013 at 9:36 AM, jeffb4 wrote:
I'm writing Apache webserver logs to Cassandra using Flume and this Sink,
and I would like to break log entries into various fields/columns (I
already break them into fields with an interceptor).
Would the best/canonical method of doing this be to extend
flume-ng-cassandra-sink with a serializer config directive, default said
directive to the existing serializer, and then (for my needs) create a
custom serializer that takes desired fields as a configuration option, and
stuffs them into Cassandra as columns?
from flume-ng-cassandra-sink.
The interceptor (default regex_extractor that comes with Flume) serializes the parsed-out data into event headers - I'm not familiar enough with Flume terminology to say whether that is LogEvent payload or attribute.
My thought (instead of the JSON conversion and then deconversion) was something like:
host1.sinks.sink1.type = com.btoddb.flume.sinks.cassandra.CassandraSink
# default Cassandra Serializer (no Flume header magic)
# host1.sinks.sink1.serializer = com.btoddb.flume.sinks.cassandra.SimpleCassandraEventSerializer
# custom Cassandra Serializer (insert Flume header fields as columns)
host1.sinks.sink1.serializer = com.blah.ComplexCassandraEventSerializer
# map Flume headers "foo", "bar", "alpha", and "beta" to Cassandra columns of the same name
host1.sinks.sink1.serializer.fieldcolumns = foo bar alpha beta
As far as your plugin goes, the big difference would be the addition of the .serializer config option (defaulting to your current use of the ByteBufferSerializer out of Hector).
If JSON/BSON was being written to more than MongoDB, or if Flume event headers weren't capable of storing columns, I could see a more generic JSON solution for in-flight data.
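The fieldcolumns proposal above could be sketched roughly as follows. This is only an illustration under assumptions: HeaderColumnMapper is a hypothetical name, it works on a plain header map rather than a Flume Event, and it stops at producing column name/value pairs instead of calling Hector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of Flume or flume-ng-cassandra-sink.
final class HeaderColumnMapper {
    private final List<String> fieldColumns;

    // fieldColumnsConfig is the space-separated value of the proposed
    // "serializer.fieldcolumns" property, e.g. "foo bar alpha beta".
    HeaderColumnMapper(String fieldColumnsConfig) {
        String trimmed = fieldColumnsConfig.trim();
        this.fieldColumns = trimmed.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(trimmed.split("\\s+"));
    }

    // Copies each configured header into a column of the same name.
    // Headers that are missing from the event are skipped.
    Map<String, byte[]> toColumns(Map<String, String> headers) {
        Map<String, byte[]> columns = new LinkedHashMap<>();
        for (String name : fieldColumns) {
            String value = headers.get(name);
            if (value != null) {
                columns.put(name, value.getBytes(StandardCharsets.UTF_8));
            }
        }
        return columns;
    }
}
```

A real serializer would hand these pairs to whatever mutation API the sink already uses; headers not listed in fieldcolumns are ignored.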
thanks for pinging me. i had some other things ahead of this, and wanted to understand a bit more (been a while since i've hit the code).
yes i think you're on to something, but instead of using the regex, maybe just supply a serializer that does the regex directly into cassandra columns? this would essentially mean that anyone could create a serializer to parse the flume event into columns.
taking it one step further, how about defining the conversion in configuration, like JSON or XML?
1 - read the "conversion definition" based on something in the flume headers (source id, app id, hostname, etc)
2 - retrieve the conversion definition from a data store (maybe cache it)
3 - execute the conversion creating a single batch mutation
4 - save batch mutation to cassandra
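The four steps above might look something like this in code. Everything here is an assumption made for illustration: ConversionPipeline and Definition are hypothetical names, the "conversion definition" is reduced to a regex plus one column name per capture group, the data store is a plain lookup function, and the Cassandra write is a callback that receives the finished batch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Flume or flume-ng-cassandra-sink.
final class ConversionPipeline {
    // A conversion definition: a regex and the column name for each group.
    static final class Definition {
        final Pattern pattern;
        final List<String> columns;

        Definition(String regex, List<String> columns) {
            this.pattern = Pattern.compile(regex);
            this.columns = columns;
        }
    }

    private final String selectorHeader;                  // e.g. app id or hostname
    private final Function<String, Definition> store;     // stands in for the data store
    private final Consumer<Map<String, String>> writer;   // stands in for Cassandra
    private final Map<String, Definition> cache = new ConcurrentHashMap<>();

    ConversionPipeline(String selectorHeader,
                       Function<String, Definition> store,
                       Consumer<Map<String, String>> writer) {
        this.selectorHeader = selectorHeader;
        this.store = store;
        this.writer = writer;
    }

    // Returns true if the event was converted and handed to the writer.
    boolean process(Map<String, String> headers, String body) {
        // 1 - pick the definition based on something in the headers
        String key = headers.get(selectorHeader);
        if (key == null) {
            return false;
        }
        // 2 - retrieve it from the store, caching successful lookups
        Definition def = cache.computeIfAbsent(key, store);
        if (def == null) {
            return false;
        }
        // 3 - execute the conversion into a single batch of columns
        Matcher m = def.pattern.matcher(body);
        if (!m.find()) {
            return false;
        }
        Map<String, String> mutation = new LinkedHashMap<>();
        int n = Math.min(def.columns.size(), m.groupCount());
        for (int i = 0; i < n; i++) {
            String value = m.group(i + 1);
            if (value != null) {
                mutation.put(def.columns.get(i), value);
            }
        }
        // 4 - save the batch
        writer.accept(mutation);
        return true;
    }
}
```

Keeping the store and the writer behind small interfaces means the same flow could be tried with a different definition format (JSON, XML) or a different backend without touching the conversion step.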