Comments (2)
Hi @salihkardan
I don't think this is the same issue I had to resolve on the Flume side. It looks like you need to define your own regex interceptor and extract the timestamp from the log data.
For example, this is a line of log data from one of my services:
{"service":"example_service","event":"server_restart","timestamp":"1386090510581","uuid":"5kneh567-4bd8-49a1-8cd8-4cf142fb0bff","port":"8091","source_ip":"127.0.0.1","info":"example_service is alive on port 8989"}
I then extract the required fields and add them to the event headers, ready for the flume-ng-cassandra-sink, with this source configuration:
orion.sources.spoolDir.type = spooldir
orion.sources.spoolDir.spoolDir = /mnt/spoolingDirLocation
orion.sources.spoolDir.inputCharset = UTF-8
orion.sources.spoolDir.deserializer.maxLineLength = 209715200
orion.sources.spoolDir.deletePolicy = never
orion.sources.spoolDir.interceptors = addSrc addHost addTimestamp addUUID
orion.sources.spoolDir.interceptors.addSrc.type = regex_extractor
orion.sources.spoolDir.interceptors.addSrc.regex = \"service\"\:\"([^"]*)
orion.sources.spoolDir.interceptors.addSrc.serializers = s1
orion.sources.spoolDir.interceptors.addSrc.serializers.s1.name = src
orion.sources.spoolDir.interceptors.addUUID.type = regex_extractor
orion.sources.spoolDir.interceptors.addUUID.regex = \"uuid\"\:\"([^"]*)
orion.sources.spoolDir.interceptors.addUUID.serializers = s1
orion.sources.spoolDir.interceptors.addUUID.serializers.s1.name = key
orion.sources.spoolDir.interceptors.addHost.type = org.apache.flume.interceptor.HostInterceptor$Builder
orion.sources.spoolDir.interceptors.addHost.preserveExisting = false
orion.sources.spoolDir.interceptors.addHost.useIP = true
orion.sources.spoolDir.interceptors.addHost.hostHeader = host
orion.sources.spoolDir.interceptors.addTimestamp.type = regex_extractor
orion.sources.spoolDir.interceptors.addTimestamp.regex = \"timestamp\"\:\"([^"]*)
orion.sources.spoolDir.interceptors.addTimestamp.serializers = s1
orion.sources.spoolDir.interceptors.addTimestamp.serializers.s1.name = timestamp
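As a rough illustration of what the three regex_extractor interceptors above pull out of the sample log line, here is a minimal standalone Java sketch. It does not use Flume itself. The class name HeaderExtractionSketch and the method extractHeaders are made up for this example, and it assumes each pattern is applied with a find-style match and that capture group 1 becomes the value of the named header.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flume or the sink.
final class HeaderExtractionSketch {
    // Header name -> pattern, mirroring addSrc, addUUID and addTimestamp.
    // The backslashes before the quotes and colon in the properties file
    // only escape literal characters, so they are omitted here.
    static final Map<String, Pattern> EXTRACTORS = new LinkedHashMap<>();
    static {
        EXTRACTORS.put("src", Pattern.compile("\"service\":\"([^\"]*)"));
        EXTRACTORS.put("key", Pattern.compile("\"uuid\":\"([^\"]*)"));
        EXTRACTORS.put("timestamp", Pattern.compile("\"timestamp\":\"([^\"]*)"));
    }

    // Returns the headers that the patterns find in the event body.
    // A header is simply absent when its pattern does not match.
    static Map<String, String> extractHeaders(String body) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : EXTRACTORS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(body);
            if (matcher.find()) {
                headers.put(entry.getKey(), matcher.group(1));
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        String body = "{\"service\":\"example_service\",\"event\":\"server_restart\","
                + "\"timestamp\":\"1386090510581\","
                + "\"uuid\":\"5kneh567-4bd8-49a1-8cd8-4cf142fb0bff\",\"port\":\"8091\"}";
        System.out.println(extractHeaders(body));
    }
}
```

If a log line lacks the "timestamp" field, the sketch returns no timestamp header at all, which is the kind of case worth checking when the sink later reports a null timestamp.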
To expand on #8, which I reported: I believe that was a bug in the Spooling Directory Source in Flume 1.4.
My logs contained characters that take two bytes in UTF-8 (in my case the £ sign; see http://www.utf8-chartable.de/ for others). The spooling directory source reads the data in chunks, and when a chunk happened to end on the first byte of such a character, the source dropped the rest of the data in the file.
This is now fixed in Flume 1.5, which is still under development. I've cloned the repository and I'm compiling and packaging the code from it until 1.5 is released. That fixed the issue above and I haven't seen any other bugs, so I'm happy to go to production with it.
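To make the chunk-boundary condition concrete, here is a small standalone Java sketch. It is not Flume's code and does not reproduce the exact failure (Flume 1.4 dropped the rest of the file); it only shows why decoding each chunk on its own goes wrong when a two-byte character straddles two chunks, and how carrying the undecoded byte into the next chunk avoids it. The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from Flume.
final class SplitCharSketch {
    // Decodes every chunk independently. A multi-byte character cut in two
    // is replaced by U+FFFD replacement characters.
    static String decodeNaively(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int pos = 0; pos < data.length; pos += chunkSize) {
            int n = Math.min(chunkSize, data.length - pos);
            sb.append(new String(data, pos, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes chunk by chunk, keeping any incomplete trailing bytes in the
    // input buffer so they are completed by the next chunk.
    static String decodeInChunks(byte[] data, int chunkSize) {
        if (data.length == 0) {
            return "";
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Room for one chunk plus up to 3 leftover bytes of a partial character.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(data.length);
        int pos = 0;
        while (pos < data.length) {
            int n = Math.min(chunkSize, data.length - pos);
            in.put(data, pos, n);
            pos += n;
            in.flip();
            boolean endOfInput = pos == data.length;
            CoderResult result = decoder.decode(in, out, endOfInput);
            if (result.isError()) {
                throw new IllegalStateException("Malformed input: " + result);
            }
            in.compact(); // keep undecoded bytes for the next round
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // "£" is the two bytes C2 A3; a chunk size of 7 splits it in the middle.
        byte[] data = "cost: \u00a35".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeNaively(data, 7));
        System.out.println(decodeInChunks(data, 7));
    }
}
```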
Let me know if you require further info.
Viktor
from flume-ng-cassandra-sink.
I have started to see this error in production as well. It seems odd, because the timestamp event header is present but ends up as null at the point where it is parsed to a long in FlumeLogEvent.java:
public long getTimestamp() {
    return Long.parseLong(getHeader(HEADER_TIMESTAMP));
}
@salihkardan - did you manage to get to the bottom of this?
I'm getting this error in production very sporadically. I'm investigating further now, but any info would be great. cc @btoddb
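For anyone debugging the same thing: Long.parseLong throws a NumberFormatException when handed null, so a missing header fails at exactly that line. Below is a minimal standalone sketch of a more defensive parse. It is not the sink's code; the class TimestampHeaderSketch, the method parseTimestamp and the header name "timestamp" (taken from the interceptor config earlier in this thread, not from FlumeLogEvent.java) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not part of flume-ng-cassandra-sink.
final class TimestampHeaderSketch {
    // Assumed header name; the real HEADER_TIMESTAMP value is not shown in this thread.
    static final String HEADER_TIMESTAMP = "timestamp";

    // Returns the timestamp if the header is present and numeric,
    // otherwise an empty OptionalLong instead of throwing.
    static OptionalLong parseTimestamp(Map<String, String> headers) {
        String raw = headers.get(HEADER_TIMESTAMP);
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        System.out.println(parseTimestamp(headers).isPresent());
        headers.put(HEADER_TIMESTAMP, "1386090510581");
        System.out.println(parseTimestamp(headers).getAsLong());
    }
}
```

Logging the event body whenever the result is empty would show which log lines reach the sink without a usable timestamp.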