awslabs / amazon-kinesis-agent

Continuously monitors a set of log files and sends new data to Amazon Kinesis Streams and Amazon Kinesis Firehose in near real time.

License: Other

Shell 2.53% Java 97.47%

amazon-kinesis-agent's Introduction

Amazon Kinesis Agent

The Amazon Kinesis Agent is a stand-alone Java software application that offers an easier way to collect and ingest data into Amazon Kinesis services, including Amazon Kinesis Streams and Amazon Kinesis Firehose.

Features

  • Monitors file patterns and sends new data records to delivery streams
  • Handles file rotation, checkpointing, and retry upon failure
  • Delivers all data in a reliable, timely, and simple manner
  • Emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process

Getting started

  1. Sign up for AWS — Before you begin, you need an AWS account. For more information about creating an AWS account and retrieving your AWS credentials, see AWS Account and Credentials in the AWS SDK for Java Developer Guide.
  2. Sign up for Amazon Kinesis — Go to the Amazon Kinesis console to sign up for the service and create an Amazon Kinesis stream or Firehose delivery stream. For more information, see Create an Amazon Kinesis Stream in the Amazon Kinesis Streams Developer Guide or Create an Amazon Kinesis Firehose Delivery Stream in the Amazon Kinesis Firehose Developer Guide.
  3. Minimum requirements — To start the Amazon Kinesis Agent, you need Java 1.7+.
  4. Using the Amazon Kinesis Agent — For more information about using the Amazon Kinesis Agent to deliver data to Streams and Firehose, see Writing to Amazon Kinesis with Agents and Writing to Delivery Streams with Agents.

Installing Amazon Kinesis Agent

After you've downloaded the code from GitHub, you can install the Amazon Kinesis Agent with the following command:

# Optionally, you can set DEBUG=1 in your environment to enable massively
# verbose output of the script
sudo ./setup --install

This setup script downloads all the dependencies and bootstraps the environment for running the Java program.

Configuring and starting Amazon Kinesis Agent

After the agent is installed, the configuration file can be found in /etc/aws-kinesis/agent.json. You need to modify this configuration file to set the data destinations and AWS credentials, and to point the agent to the files to push. After you complete the configuration, you can make the agent start automatically at system startup with the following command:

sudo chkconfig aws-kinesis-agent on

If you do not want the agent running at system startup, turn it off with the following command:

sudo chkconfig aws-kinesis-agent off

To start the agent manually, use the following command:

sudo service aws-kinesis-agent start

You can make sure the agent is running with the following command:

sudo service aws-kinesis-agent status

You may see messages such as aws-kinesis-agent (pid [PID]) is running...

To stop the agent, use the following command:

sudo service aws-kinesis-agent stop

Viewing the Amazon Kinesis Agent log file

The agent writes its logs to /var/log/aws-kinesis-agent/aws-kinesis-agent.log.

Uninstalling Amazon Kinesis Agent

To uninstall the agent, use the following command:

sudo ./setup --uninstall

Building from Source

The installation performed by the setup script has only been tested on the following OS distributions:

  • Red Hat Enterprise Linux version 7 or later
  • Amazon Linux AMI version 2015.09 or later
  • Ubuntu Linux version 12.04 or later
  • Debian Linux version 8.6 or later

For other distributions or platforms, you can build the Java project with the following command:

sudo ./setup --build

or by invoking the Ant target as you would for any Java program:

ant [-Dbuild.dependencies=DEPENDENCY_DIR]

If you use the Ant command, you need to download all the dependencies listed in pom.xml before building the Java program. DEPENDENCY_DIR is the directory where you download and store the dependencies. By default, the Amazon Kinesis Agent reads its configuration from /etc/aws-kinesis/agent.json; you need to create this file if it does not already exist. A sample configuration can be found at ./configuration/release/aws-kinesis-agent.json.
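For reference, a minimal /etc/aws-kinesis/agent.json might look like the following; the stream names and file patterns below are placeholders, mirroring the sample configuration that ships in the repository:

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/tmp/app.log*",
      "kinesisStream": "yourkinesisstream",
      "partitionKeyOption": "RANDOM"
    },
    {
      "filePattern": "/tmp/app.log*",
      "deliveryStream": "yourdeliverystream"
    }
  ]
}
```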

To start the program, use the following command:

java -cp CLASSPATH "com.amazon.kinesis.streaming.agent.Agent"

CLASSPATH is the classpath to your dependencies and the target JAR file that you built from the step above.
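As a concrete illustration, the classpath can be assembled from the build outputs; the directory layout below is an assumption (dependency JARs under ./dependencies, the agent JAR under ./ant_build/lib as the Ant build produces), so adjust the paths to your checkout:

```shell
# Illustrative sketch: build a colon-separated classpath from an assumed layout.
DEPENDENCY_DIR=dependencies
AGENT_JAR=ant_build/lib/AWSKinesisStreamingDataAgent-1.1.jar
CLASSPATH="${AGENT_JAR}:$(printf '%s:' "${DEPENDENCY_DIR}"/*.jar)"
# Print the final command instead of executing it, so the snippet is safe to copy:
echo java -cp "${CLASSPATH%:}" com.amazon.kinesis.streaming.agent.Agent
```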

Release Notes

Release 1.1 (April 11, 2016)

  • Pre-process data before sending it to destinations — Amazon Kinesis Agent now supports pre-processing the records parsed from monitored files before sending them to your streams. Pre-processing can be enabled by adding the dataProcessingOptions setting to a file flow. Three options are currently available: SINGLELINE, CSVTOJSON, and LOGTOJSON. For more information, see Writing to Amazon Kinesis with Agents and Writing to Delivery Streams with Agents.
  • Ignore compressed files when tailing — Files with compressed extensions, e.g. .gz, .bz2, and .zip, are ignored for tailing.
  • Kill the program on out-of-memory errors — The process now terminates when it runs out of memory.
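The dataProcessingOptions setting described above might be enabled in a flow roughly like this; the field names (optionName, logFormat) are taken from the AWS agent documentation as best I recall and should be verified against the guides linked above:

```json
{
  "flows": [
    {
      "filePattern": "/tmp/app.log*",
      "deliveryStream": "yourdeliverystream",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMMONAPACHELOG"
        }
      ]
    }
  ]
}
```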

Release 1.0 (October 7, 2015)

  • This is the first release.

amazon-kinesis-agent's People

Contributors

abashmak2, buholzer, chaochenq, chris-gilmore, chupakabr, cyb3rd0g1, dependabot[bot], dgoradia, diranged, eridal, frankfarrell, gorkemmulayim, hyandell, jdonofrio728, joshua-kim, maratxxx, mattford63, mlindemu, nazgul33, pcurry, qivers, roolerzz, rsolomo, rvaralda, unagi, xinzhelkinesis, yiyagong, zacharya

amazon-kinesis-agent's Issues

Updated to 1.1 no longer works

Hi,

Updated our working agent from 1.0 to 1.1 and uploads to kinesis streams no longer seem to work.

We are using instance roles to allow access to kinesis resources.

Parsing is successful, but no messages ever seem to be sent. What's more, if we set a completely bogus stream name in the configuration file, no errors are reported; everything happily continues without error (but obviously still not uploading anything).

SingleLineSplitter is expensive

Hello,

I am a bit concerned that the default behavior of this project (which is itself the default behavior for Firehose) can be extremely costly with regard to the 5 KB minimum record size. Naive usage means a record per newline, which for us is around 300 bytes. It would be excellent to have a multi-line splitter that packs many lines into a single record more efficiently. That way we aren't wasting 95% of the bandwidth reaching the record minimum.

-Matt

Optimized record size for concatenable events like JSON

Background

AWS Kinesis Firehose has a pricing policy that is great for large records. However, for small records of around 1 KB, the customer ends up paying 5x the price for the same amount of data. This is because Kinesis rounds up to the nearest 5 KB.

Further, certain records can be concatenated together and still processed separately downstream. For example, many downstream big data products like Redshift, Hadoop, and Spark support multiple JSON documents per line.

So, the following JSON events are equivalent in the above products:

{"event_id": 123, "doc_id": 9372}
{"event_id": 124, "doc_id": 7642}
{"event_id": 125, "doc_id": 1298}

and

{"event_id": 123, "doc_id": 9372}{"event_id": 124, "doc_id": 7642}{"event_id": 125, "doc_id": 1298}

Solution

It would be great if Kinesis Agent automatically combined events that can be combined into as large of records as possible.
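The packing the reporter proposes can be sketched outside the agent as a pre-processing step. This is illustrative only (not an agent feature): the file names are made up, and 5120 bytes stands in for the 5 KB minimum.

```shell
# Greedily concatenate newline-delimited JSON events into records of at most
# ~5 KB each, mirroring the "combined events" example above.
# Generate a few sample events (stand-ins for real log records):
printf '{"event_id": %d, "doc_id": %d}\n' 123 9372 124 7642 125 1298 > events.jsonl
awk -v max=5120 '
  { if (buf != "" && length(buf) + length($0) > max) { print buf; buf = "" }
    buf = buf $0 }
  END { if (buf != "") print buf }
' events.jsonl > packed.jsonl
cat packed.jsonl
```

With the three small sample events above, all of them fit into a single packed record, matching the concatenated form shown earlier.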

Other Similar Items

No user-record aggregation

Hi,

Unlike the KPL, there does not seem to be any user-record aggregation. This means that we very quickly get throttled on events. Am I missing something? Without this, the client is too expensive in terms of Kinesis calls across the fleet under high demand.

Behavior of truncatedRecordTerminator on aws-kinesis-agent

Hi:

While reading the documentation of the aws-kinesis-agent, I came across the truncatedRecordTerminator setting.

I wanted to confirm its behavior, by:

  1. setting truncatedRecordTerminator to \u0000, since each record is a valid XML file I want to make sure that by finding \u0000 (a known invalid XML character) it means the record was truncated and I have to piece it back together in the consumer.
  2. setting the maxBufferSizeBytes to 10 (since I know my records are larger than that, I wanted to force the truncation)

but I noticed in my Kinesis consumer that all the records come across in their entirety (instead of 9-byte chunks, each followed by \u0000).

Is that a behavior on the consumer side that can be relied on?

Thanks
césar

Compressed files

Hi!

Is it possible to use compressed files with Kinesis Agent?

Missing error messages

I deployed through 'sudo yum install -y aws-kinesis-agent' on the 2015.09 Amazon AMI.
Initially the aws-kinesis-agent-user did not have access to my configured file(s), but no error message was thrown and there was no indication that the agent lacked access. I would expect an error message in the log that the path is not accessible to the agent.

After giving access to the file everything worked fine.

Then I configured a non-existent stream (Kinesis configuration).
Again, there was no error message in the log file about the stream endpoint. Instead, the log reported that 0 records had been sent to the destination (which is correct, but not obvious given that the stream does not exist).
I would expect an error message that the stream does not exist.

Could not initialize class org.joda.time.chrono.ISOChronology

We just discovered a series of hosts that all had the Kinesis agent completely broken, throwing the following errors about being unable to initialize the org.joda.time.chrono.ISOChronology class. We're not sure when these started, but restarting the agent fixed them.

Any thoughts on what could have caused this?

2016-12-14 20:42:02.847+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [WARN] Agent: Tailing is 2914.839844 MB (3056431517 bytes) behind. There are 8 file(s) newer than current file(s) being tailed.
2016-12-14 20:42:03.706+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (sender-815608) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] AsyncPublisher[kinesis:log-pipeline-eng-syslog:/mnt/log/cron.json*]:RecordBuffer(id=14,records=17,bytes=7857) Retriable send error (java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology). Will retry.
2016-12-14 20:42:04.077+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (sender-815609) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] AsyncPublisher[kinesis:log-pipeline-eng-syslog:/mnt/log/authpriv.json*]:RecordBuffer(id=12,records=34,bytes=15561) Retriable send error (java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology). Will retry.
2016-12-14 20:42:04.986+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (sender-815610) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] AsyncPublisher[kinesis:log-pipeline-eng-syslog:/mnt/log/mail.json*]:RecordBuffer(id=5,records=5,bytes=2538) Retriable send error (java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology). Will retry.
2016-12-14 20:42:06.704+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (sender-815611) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] AsyncPublisher[kinesis:log-pipeline-eng-syslog:/mnt/log/user.json*]:RecordBuffer(id=4,records=1,bytes=458) Retriable send error (java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology). Will retry.
2016-12-14 20:42:07.266+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (cw-metrics-publisher) com.amazon.kinesis.streaming.agent.metrics.CWPublisherRunnable [ERROR] Caught exception thrown by metrics Publisher in CWPublisherRunnable
java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology
	at org.joda.time.DateTimeUtils.getChronology(DateTimeUtils.java:266)
	at org.joda.time.format.DateTimeFormatter.selectChronology(DateTimeFormatter.java:968)
	at org.joda.time.format.DateTimeFormatter.printTo(DateTimeFormatter.java:672)
	at org.joda.time.format.DateTimeFormatter.printTo(DateTimeFormatter.java:560)
	at org.joda.time.format.DateTimeFormatter.print(DateTimeFormatter.java:644)
	at com.amazonaws.auth.internal.AWS4SignerUtils.formatDateStamp(AWS4SignerUtils.java:39)
	at com.amazonaws.auth.internal.AWS4SignerRequestParams.<init>(AWS4SignerRequestParams.java:85)
	at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:184)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:705)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:485)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:306)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:979)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.putMetricData(AmazonCloudWatchClient.java:370)
	at com.amazon.kinesis.streaming.agent.metrics.DefaultCWMetricsPublisher.publishMetrics(Unknown Source)
	at com.amazon.kinesis.streaming.agent.metrics.CWPublisherRunnable.runOnce(Unknown Source)
	at com.amazon.kinesis.streaming.agent.metrics.CWPublisherRunnable.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:745)
2016-12-14 20:42:07.587+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (sender-815612) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] AsyncPublisher[kinesis:log-pipeline-eng-syslog:/mnt/log/syslog.json*]:RecordBuffer(id=8,records=1,bytes=1813) Retriable send error (java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology). Will retry.
2016-12-14 20:42:09.187+0000 staging-us1-ecs-uswest2-31-i-4f3d5557 (sender-815613) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] AsyncPublisher[kinesis:log-pipeline-eng-events:/mnt/flume/events/*]:RecordBuffer(id=2,records=475,bytes=498287) Retriable send error (java.lang.NoClassDefFoundError: Could not initialize class org.joda.time.chrono.ISOChronology). Will retry.

How to rotate agent log-file?

It is very unclear from documentation and source how to operationally manage the agent logs. There are two classes of log files to consider:

  1. the AWS Kinesis Agent's own log (i.e. the "AKA log" supplied with the --log-file argument)
  2. the tailed logfiles specified in agent.json flows (i.e. the "tailed logs")

When logrotate rotates the tailed logs, the agent seems to recognize what happened and everything is fine. However, later, when the agent's own log is rotated, the agent continues to log to the existing file and doesn't recognize the new one. This is probably as expected, so I sent it a USR1 signal, and the agent responded by picking up the new agent log file correctly. However, it incorrectly freaked out about the tailed log file being "missing" and said it needed to restart from 0. This is not correct: it should not restart from zero, but should understand that the log file was rotated.

What is the proper way to rotate both classes of logs? Can we please get this in documentation? Thank you for the hard work on this agent!
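If the USR1 workaround above holds up, it could be wired into logrotate; the stanza below is purely illustrative, and the pid-file path in it is a guess rather than documented behavior:

```
/var/log/aws-kinesis-agent/aws-kinesis-agent.log {
    daily
    rotate 7
    compress
    missingok
    postrotate
        # Hypothetical: signal the agent to reopen its own log file.
        # The pid-file path is an assumption; adjust to your init script.
        kill -USR1 "$(cat /var/run/aws-kinesis-agent.pid)" 2>/dev/null || true
    endscript
}
```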

Unit Tests

I'm currently exploring the possibility of using this agent, but noticed proper code coverage is absent and can only assume the tests are in a separate project? If not, I'd be more than willing to help out with getting unit tests written!

Running Test Suites

Since this project doesn't follow the standard directory structure of a maven project, I was wondering how I would run the test suite? Would it make sense to move all tests to src/test/java? (I'd be more than willing to make the project layout update).

Log rotation makes agent go crazy

I set up the agent to feed from a log file that gets rotated automatically by a service (a producer). When the rotation occurs, the Kinesis agent freaks out and spews a stack trace every 100 ms! I totally understand why the agent would get confused, and I have disabled the automatic rotation, assuming the agent does this for me automatically. However, spewing a stack trace to disk every 100 ms is probably not a good idea, as it might fill up a disk or introduce unnecessary IO load.

Might it make sense to have the process die when it encounters this condition? Most production environments have their own process monitoring, and it's a best practice to have a process die with a good error message when it is in a non-recoverable state. I'd much prefer that to thrashing.

Here is what it spews, every 100ms or so:

java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:158)
        at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.refresh(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.updateRecordParser(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecords(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.runOnce(Unknown Source)
        at com.amazon.kinesis.streaming.agent.tailing.FileTailer.run(Unknown Source)
        at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:60)
        at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
        at java.lang.Thread.run(Thread.java:745)

Unable to transfer data using Kinesis agent

Steps followed :

Hi, I am using the Kinesis agent and facing issues transferring data from an EC2 machine (RHEL; the Kinesis agent is installed and agent.json is configured). I still cannot successfully transfer data to a Kinesis stream, or via Firehose delivery to S3.

Below are the configuration steps followed :

Steps :
1)
Launch EC2 - t2.small instance - RHEL7.3
Create an elastic IP in EDL-V0 VPC
Associate elastic IP

Enable epel repo https://fedoraproject.org/wiki/EPEL#How_can_I_use_these_extra_packages.3F
install pip https://packaging.python.org/install_requirements_linux/

install tweepy https://github.com/tweepy/tweepy

Create twitter account : EDLV0

install vim
touch twitter_streaming.py > twdata.txt - This will continuously add new lines in the file.

install Kinesis agent in EC2 : http://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html
sudo yum install -y https://s3.amazonaws.com/streaming-data-agent/aws-kinesis-agent-latest.amzn1.noarch.rpm

Set up credentials for kinesis agent.
vim /etc/sysconfig/aws-kinesis-agent

Enable Kinesis Streams.
Go to Kinesis -> create new stream -> "EDLV0" -> #shards = 2 -> create.

Configure which file to pick up from the location :
vim /etc/aws-kinesis/agent.json
"
"filePattern": "/home/ec2-user/twitterData/twdata.txt",
"kinesisStream": "EDLV0",
"
sudo service aws-kinesis-agent status
sudo service aws-kinesis-agent restart
tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log

8)create kinesis firehose delivery stream. - "EDLV0Stream"
Enable Kinesis firehose to push data to s3
create a s3 bucket edlv0
delivery stream name : EDLV0Stream

Configuration :
Buffer size : defaults
Buffer interval : defaults
create IAM role : firehose_delivery_role_EDLV0

Configure which file to pick up from the location :
vim /etc/aws-kinesis/agent.json
{
"filePattern": "/home/ec2-user/twitterData/twdata*",
"deliveryStream": "EDLV0Stream"
}
sudo service aws-kinesis-agent restart

give kinesis agent user permissions :
setfacl -m u:aws-kinesis-agent-user:rwx twitterData

[ec2-user@ip-10-0-0-138 twitterData]$ ls -l
total 991212
-rwxrwxrwx. 1 aws-kinesis-agent-user aws-kinesis-agent-user 4 Nov 15 01:21 twdata.txt

Log :
2016-11-15 22:52:01.223-0500 ip-10-0-0-138.ec2.internal (FileTailer[kinesis:EDLV0:/home/ec2-user/twitterData/.txt].MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.tailing.FileTailer [INFO] FileTailer[kinesis:EDLV0:/home/ec2-user/twitterData/.txt]: Tailer Progress: Tailer has parsed 0 records (0 bytes), transformed 0 records, skipped 0 records, and has successfully sent 0 records to destination.
2016-11-15 22:52:01.226-0500 ip-10-0-0-138.ec2.internal (FileTailer[fh:EDLV0Stream:/home/ec2-user/twitterData/.txt].MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.tailing.FileTailer [INFO] FileTailer[fh:EDLV0Stream:/home/ec2-user/twitterData/.txt]: Tailer Progress: Tailer has parsed 0 records (0 bytes), transformed 0 records, skipped 0 records, and has successfully sent 0 records to destination.
2016-11-15 22:52:01.228-0500 ip-10-0-0-138.ec2.internal (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 0 records parsed (0 bytes), and 0 records sent successfully to destinations. Uptime: 147240066ms

Sorry for the long post. Desperate to solve the issue quickly.

Recommended way to deploy the agent.

I am really happy with the agent, and I have been going down the path of writing multiple files from my producer to disk and allowing the agent to tail them all as they show up in the directory.

After some discussions with my peers though, it seems that maybe that is the incorrect way to use the agent, and instead I should be appending new records to a single file, and then the agent will pick up the new records as they are written?

Is there a best-practices guide for the agent written up somewhere? Google could not find anything.

Can not install the agent on EC2

Downloaded the agent from GitHub and ran ./setup --install

Error:

Unable to locate tools.jar. Expected to find it in /usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar
Buildfile: build.xml does not exist!
Build failed
Failed to build the Java project

There was nothing in the /usr/lib/jvm directory before running the installation, which means whatever installs are done, are by the agent setup.

System OS: Ubuntu 14.04 LTS

Allow users to specify partition keys

Currently one can only select between random partition keys, or partition keys derived from the data. It should also be possible to specify a partition key associated with each file to be published.

Exceeded max record size: 40960

I have records that usually are around 20 KB, but can sometimes be larger. I'm occasionally running into what seems like a hardcoded limit of 40 KB (40960 B):

2016-06-16 13:25:23.474+0000 ip-172-31-22-141 (FileTailer[fh:myfirehose:/data/transactions.log])
 com.amazon.kinesis.streaming.agent.tailing.FileTailer [WARN] FileTailer[fh:myfirehose:/data/transactions.log]:
 Truncated a record in TrackedFile(id=(dev=ca10,ino=12), path=/data/transactions.log,
 lastModifiedTime=1466083520220, size=22411), because it exceeded the the configured max record size: 40960

There are a number of classes that have this limit set:

Any suggestions on how this can be overridden, or do I have to build my own binary?

Clean up files

Is there a way to have the agent clean up the files after they've been processed and sent to my delivery stream or is that on me to setup with another process?

Pre-packaging a binary release of the client?

We are looking into setting up the Kinesis Agent on all of our servers -- and having them all do a bunch of wget ... calls to install the agent properly is not going to work for us. After I spent some time refactoring a little bit of the setup script, I came up with this simple packaging method. What do you think about me opening a PR for this?

The basic idea here is that TravisCI runs on any tag you push to the repo. It handles downloading all of the source dependencies (the various .jar files) and then bundles the whole directory up into a binary.tgz file. Then, with a simple OAuth token for GitHub, Travis auto-uploads the release to GitHub:


From that, anyone can easily download the file and execute the install much more quickly:

root@vagrant-ubuntu-trusty-64:/home/vagrant# mkdir test
root@vagrant-ubuntu-trusty-64:/home/vagrant# cd test
root@vagrant-ubuntu-trusty-64:/home/vagrant/test# curl -L https://github.com/Nextdoor/amazon-kinesis-agent/releases/download/test_release/binary.tgz | tar -zx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   587    0   587    0     0    204      0 --:--:--  0:00:02 --:--:--   204
100 14.5M  100 14.5M    0     0   650k      0  0:00:22  0:00:22 --:--:--  980k
root@vagrant-ubuntu-trusty-64:/home/vagrant/test# ls
ant_build  bin  build.xml  configuration  dependencies  LICENSE.txt  NOTICE.txt  pom.xml  README.md  setup  src  support  tst
root@vagrant-ubuntu-trusty-64:/home/vagrant/test# ./setup --install
Detected OS Distro: Ubuntu
Uninstalling aws-kinesis-agent ...
userdel: user 'aws-kinesis-agent-user' does not exist
Installing Kinesis Agent ...
Downloading dependencies ...
Buildfile: /home/vagrant/test/build.xml

clean:
   [delete] Deleting directory /home/vagrant/test/ant_build

get-java-version:

check-java-version:

init:
    [mkdir] Created dir: /home/vagrant/test/ant_build

compile:
    [mkdir] Created dir: /home/vagrant/test/ant_build/private
    [javac] Compiling 102 source files to /home/vagrant/test/ant_build/private
    [javac] warning: Supported source version 'RELEASE_6' from annotation processor 'org.sonatype.guice.bean.scanners.index.SisuIndexAPT6' less than -source '1.7'
    [javac] Note: /home/vagrant/test/src/com/amazon/kinesis/streaming/agent/tailing/FileFlow.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 warning
    [javac] Creating empty /home/vagrant/test/ant_build/private/com/amazon/kinesis/streaming/agent/tailing/checkpoints/package-info.class
    [javac] Creating empty /home/vagrant/test/ant_build/private/com/amazon/kinesis/streaming/agent/package-info.class
    [javac] Creating empty /home/vagrant/test/ant_build/private/com/amazon/kinesis/streaming/agent/metrics/package-info.class
    [javac] Creating empty /home/vagrant/test/ant_build/private/com/amazon/kinesis/streaming/agent/tailing/package-info.class
    [javac] Creating empty /home/vagrant/test/ant_build/private/com/amazon/kinesis/streaming/agent/config/package-info.class
     [copy] Copying 2 files to /home/vagrant/test/ant_build/private/com/amazon/kinesis/streaming/agent

build:
    [mkdir] Created dir: /home/vagrant/test/ant_build/lib
      [jar] Building jar: /home/vagrant/test/ant_build/lib/AWSKinesisStreamingDataAgent-1.1.jar

release:

BUILD SUCCESSFUL
Total time: 22 seconds
Configuration file installed at: /etc/aws-kinesis/agent.json
Configuration details:
{
  "cloudwatch.emitMetrics": true,
  "kinesis.endpoint": "",
  "firehose.endpoint": "",

  "flows": [
    {
      "filePattern": "/tmp/app.log*",
      "kinesisStream": "yourkinesisstream",
      "partitionKeyOption": "RANDOM"
    },
    {
      "filePattern": "/tmp/app.log*",
      "deliveryStream": "yourdeliverystream"
    }
  ]
}
 Adding system startup for /etc/init.d/aws-kinesis-agent ...
   /etc/rc0.d/K20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
   /etc/rc1.d/K20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
   /etc/rc6.d/K20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
   /etc/rc2.d/S20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
   /etc/rc3.d/S20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
   /etc/rc4.d/S20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
   /etc/rc5.d/S20aws-kinesis-agent -> ../init.d/aws-kinesis-agent
Amazon Kinesis Agent is installed successfully.
To start the aws-kinesis-agent service, run:
  sudo service aws-kinesis-agent start
To stop the aws-kinesis-agent service, run:
  sudo service aws-kinesis-agent stop
To check the status of the aws-kinesis-agent service, run:
  sudo service aws-kinesis-agent status

aws-kinesis-agent log file will be found at: /var/log/aws-kinesis-agent
To make the agent automatically start at system startup, type:
  sudo chkconfig aws-kinesis-agent on

Your installation has completed!
root@vagrant-ubuntu-trusty-64:/home/vagrant/test# 

You can see the diff here Nextdoor#1

How to send logs via proxy?

I've tried injecting following Java args into launch script start-aws-kinesis-agent:

-Djava.net.useSystemProxies=true
-Dhttp.nonProxyHosts=localhost|127.0.0.1|10.*.*.*|*.mydomain.com
-Dhttp.proxyHost=http://proxy.mydomain.com
-Dhttp.proxyPort=8080
-Dhtttps.proxyHost=https://proxy.mydomain.com
-Dhttps.proxyPort=8443

and some well known environment variables:

HTTP_PROXY=http://proxy.mydomain.com:8080
HTTPS_PROXY=http://proxy.mydomain.com:8443
http_proxy=http://proxy.mydomain.com:8080
https_proxy=http://proxy.mydomain.com:8443

but no luck, until I realized that it's not working because the agent uses the Apache HTTP client library.

Is it possible to send logs via proxy by setting specific environment variables or command line options?

ProvisionedThroughputExceededExceptions are spammy!

See:

} else {
    logger.error("{}: Record returned error code {}: {}", flow.getId(), responseEntry.getErrorCode(),
            responseEntry.getErrorMessage());
    logger.trace("{}:{} Record {} returned error code {}: {}", flow.getId(), buffer, index,
            responseEntry.getErrorCode(), responseEntry.getErrorMessage());
    errors.add(responseEntry.getErrorCode());
}
++index;

2016-07-13 21:32:40.435+0000 staging-us1-tools-xxx (sender-1100) com.amazon.kinesis.streaming.agent.tailing.KinesisSender [ERROR] kinesis:log-pipeline-eng-syslog:/mnt/log/*: Record returned error code ProvisionedThroughputExceededException: Rate exceeded for shard shardId-000000000001 in stream log-pipeline-eng-syslog under account xxx.

So ... when we're gleefully sending off logs to Kinesis and we suddenly bump into a temporary throttle, we are given tens of thousands of these log messages in just a matter of seconds:

[root@staging-us1-tools-xxx:/mnt/log:130]# grep ProvisionedThroughputExceededException /var/log/aws-kinesis-agent/aws-kinesis-agent.log  | wc
  13317  306291 5300166

I really feel like this code could be improved to write out a message like:

kinesis:log-pipeline-eng-syslog:/mnt/log/*: Record returned error code: ProvisionedThroughputExceededException (4924 times) ....

Otherwise this message is really spammy and has minimal usefulness. I'd like to see it reported once per batch, not once per message.

jackson-dataformat-cbor : class not found exception

When no parser is used for format conversion, the following exception occurs, but it is swallowed and only "Retriable send error" is printed.
To diagnose the problem further, I had to add a stack trace in OnSendError().

com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [ERROR] stack trace : 
com.amazonaws.AmazonClientException: Unable to marshall request to JSON: com/fasterxml/jackson/dataformat/cbor/CBORFactory
    at com.amazonaws.services.kinesis.model.transform.PutRecordsRequestMarshaller.marshall(PutRecordsRequestMarshaller.java:97)
    at com.amazonaws.services.kinesis.AmazonKinesisClient.putRecords(AmazonKinesisClient.java:1733)
    at com.amazon.kinesis.streaming.agent.tailing.KinesisSender.attemptSend(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AbstractSender.sendBuffer(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.SimplePublisher.sendBufferSync(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher.access$001(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher$1.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/cbor/CBORFactory
    at com.amazonaws.protocol.json.SdkJsonProtocolFactory.getSdkFactory(SdkJsonProtocolFactory.java:99)
    at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createGenerator(SdkJsonProtocolFactory.java:54)
    at com.amazonaws.services.kinesis.model.transform.PutRecordsRequestMarshaller.marshall(PutRecordsRequestMarshaller.java:65)
    ... 9 more
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.dataformat.cbor.CBORFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 12 more

As the stack trace above shows, the root cause is a ClassNotFoundException for the following missing artifact:

  • com.fasterxml.jackson.dataformat:jackson-dataformat-cbor

The setup script should download that artifact to resolve the problem.
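A sketch of what the setup script could do, assuming the jar is fetched from Maven Central; the version and target directory below are only examples, not necessarily what the agent pins:

```shell
#!/bin/sh
# Build the Maven Central URL for jackson-dataformat-cbor and (optionally)
# fetch it into the agent's lib directory.
GROUP_PATH="com/fasterxml/jackson/dataformat"
ARTIFACT="jackson-dataformat-cbor"
VERSION="2.6.7"                      # example version only
JAR="${ARTIFACT}-${VERSION}.jar"
URL="https://repo1.maven.org/maven2/${GROUP_PATH}/${ARTIFACT}/${VERSION}/${JAR}"
echo "$URL"
# Uncomment to actually download:
# curl -fsSL -o "/usr/share/aws-kinesis-agent/lib/${JAR}" "$URL"
```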

Exception on LogRotate

Hello,

Whenever logrotate rotates away a log file being written by a webserver, I get the following exception:

(FileTailer[fh:SlimAccess:/var/log/access_log]) com.amazon.kinesis.streaming.agent.tailing.FileTailer [ERROR] FileTailer[fh:SlimAccess:/var/log/access_log]: Error when processing current input file or when tracking its status.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.findCurrentOpenFileAfterTruncate(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.refresh(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.updateRecordParser(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecords(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.runOnce(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.FileTailer.run(Unknown Source)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(Unknown Source)
at com.google.common.util.concurrent.Callables$3.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)

Logrotate isn't occurring more than once every second either.

/var/log/access_log {
copytruncate
compress
rotate 365
size=+1M
olddir rotated
notifempty
missingok
}

The log folder and file have r+x permissions too. How are we supposed to rotate logs away from Firehose? Even Linux truncate manages to blow up the Firehose agent. I don't actually want to keep the raw log, since Firehose has essentially done that for me by putting it in its stream; I just don't want my EC2 instance to fill up with huge logs.
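One possible workaround, assuming the webserver can be told to reopen its log file: rotate by rename (logrotate's default, plus `create`) with a postrotate reload, instead of `copytruncate`, which is the mode the tailer's truncate handling trips over. A sketch; the reload command is a placeholder for whatever your webserver needs:

```
/var/log/access_log {
    rotate 365
    size 1M
    compress
    olddir rotated
    notifempty
    missingok
    create 0644 root root
    postrotate
        # Placeholder: whatever makes your webserver reopen its log file,
        # e.g. a graceful reload or a USR1 signal.
        service httpd reload > /dev/null 2>&1 || true
    endscript
}
```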

SYSCONFIG file for agent should have AGENT_USER

In order to avoid permission issues on log files/directories that are not open, the agent should support running as root via the sysconfig mechanism if need be. AGENT_USER should be in the sysconfig.

Similar to other push agents (like Splunk), which run as root to avoid potential permission messes, this would allow running as root rather than as a specific user that requires permissions everywhere.

On Prem Version

I'm looking to put the agent on a server locally in my private datacenter and stream data up to firehose. Is this supported? Also, is there a way to install without building from source?

I'm trying now and have the following error:

Buildfile: /amazon-kinesis-agent/build.xml

clean:

get-java-version:

check-java-version:

init:
    [mkdir] Created dir: /amazon-kinesis-agent/ant_build

compile:
    [mkdir] Created dir: /amazon-kinesis-agent/ant_build/private
    [javac] Compiling 85 source files to /amazon-kinesis-agent/ant_build/private
    [javac] warning: Supported source version 'RELEASE_6' from annotation processor 'org.sonatype.guice.bean.scanners.index.SisuIndexAPT6' less than -source '1.7'
    [javac] /amazon-kinesis-agent/src/com/amazon/kinesis/streaming/agent/Agent.java:400: error: cannot find symbol
    [javac]         return MoreExecutors.directExecutor();
    [javac]                             ^
    [javac]   symbol:   method directExecutor()
    [javac]   location: class MoreExecutors
    [javac] 1 error
    [javac] 1 warning

BUILD FAILED
/amazon-kinesis-agent/build.xml:38: Compile failed; see the compiler error output for details.

Total time: 9 seconds

Compression support

Hi,

I played around with the undocumented gzip settings, but they didn't seem to have any effect on network throughput. Is compression a planned feature?

Customize the log4j logger...

Since you're leveraging log4j, we should be able to set our own log4j config file and customize logging as we wish. Is that possible now, or are code changes required to support it?

what happens when kinesis is throttled?

Hi again,

:-)

What's the behaviour of the agent when, say, Kinesis is throttled? Do logs get dropped, does the tailer stay at the last sent offset, or something else?

I've noticed memory consumption increase as throttling occurs.

In our testing we configured log generation to be much greater than what a Kinesis stream could handle. When throttling turned off (i.e., Kinesis allowed traffic again) we saw very high short bursts of CPU utilisation until throttling started again. What could be done to limit this?

Init status returns the wrong exit code

The init script does not return the correct exit code when checking the status of the aws-kinesis-agent service:

# service aws-kinesis-agent status
aws-kinesis-agent is stopped
# echo $?
0

We noticed that our puppet script was failing to start the agent because it thought the agent was already running.
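Per the LSB init-script convention, `status` should exit with code 3 when the service is stopped (0 = running, 1 = dead but pidfile exists). A sketch of a compliant status function, using placeholder stand-ins for the script's real PIDFILE path and get_agent_pid helper:

```shell
#!/bin/sh
# LSB-style status: 0 = running, 1 = dead but pidfile exists, 3 = stopped.
PIDFILE="/var/run/aws-kinesis-agent.pid"   # example path

get_agent_pid() {
    # Placeholder for the init script's real helper.
    [ -f "$PIDFILE" ] && cat "$PIDFILE" 2>/dev/null
}

status_agent() {
    if [ ! -f "$PIDFILE" ]; then
        echo "aws-kinesis-agent is stopped"
        return 3
    elif kill -0 "$(get_agent_pid)" 2>/dev/null; then
        echo "aws-kinesis-agent is running"
        return 0
    else
        echo "aws-kinesis-agent is dead but pid file exists"
        return 1
    fi
}
```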

UnknownOperationException when pushing records to Firehose Stream

Unable to push records to firehose, following is the trace

2016-12-26 18:59:40.681+0000 ip-172-31-130-29 (sender-5) com.amazonaws.requestId [DEBUG] x-amzn-RequestId: ecabfcaa-0b7c-3d0b-bc70-96767bc652d8
2016-12-26 18:59:40.682+0000 ip-172-31-130-29 (sender-5) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [DEBUG] AsyncPublisher[fh:qa-exception-logs:/var/log/tomcat8/app/exceptions.json]:RecordBuffer(id=2,records=34,bytes=16626) Retriable send error (com.amazonaws.AmazonServiceException: null (Service: AmazonKinesisFirehose; Status Code: 400; Error Code: UnknownOperationException; Request ID: ecabfcaa-0b7c-3d0b-bc70-96767bc652d8)). Will retry.
2016-12-26 18:59:40.682+0000 ip-172-31-130-29 (sender-5) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [TRACE] AsyncPublisher[fh:qa-exception-logs:/var/log/tomcat8/app/exceptions.json]:RecordBuffer(id=2,records=34,bytes=16626) Buffer Queued for Retry
2016-12-26 18:59:40.682+0000 ip-172-31-130-29 (sender-5) com.amazon.kinesis.streaming.agent.tailing.AsyncPublisher [TRACE] AsyncPublisher[fh:qa-exception-logs:/var/log/tomcat8/app/exceptions.json]:RecordBuffer(id=2,records=34,bytes=16626) Send Completed

Any idea what could be wrong?

missing hostname/instanceID for Kinesis Agent forwarded logs

The CloudWatch Logs Agent provides predefined variables to retain info about the host sending the log data.

http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html

log_stream_name
Specifies the destination log stream. You can use a literal string or predefined variables ({instance_id}, {hostname}, {ip_address}), or combination of both to define a log stream name. A log stream is created automatically if it doesn't already exist.

I see no way to retain sending host information for data forwarded to firehoses via the Kinesis Agent. Our use case includes sending the firehose to Elasticsearch and we have no way of knowing which host generated the monitored log. We want to know where the logs came from. Is there a mechanism similar to what the CWL Agent provides that I am overlooking?

Thanks,
Trevor

Syslog to JSON

Hi,

I'm completely unable to get the following config to send messages as JSON.

{
  "cloudwatch.emitMetrics": true,
  "kinesis.endpoint": "",
  "firehose.endpoint": "",
  "flows": [
    {
      "filePattern": "/var/log/syslog",
      "kinesisStream": "dev-amazon-agent-kinesis-AmazonAgentKinesisStream-2E811",
      "dataProcessingOptions": {
        "optionName": "LOGTOJSON",
        "logFormat": "SYSLOG"
      }
    }
  ]
}

Messages just come through as plain text. Debug-level logging doesn't show any errors. What's the best way to debug this further?
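One thing worth checking, based on the configuration format used in the other reports here: `dataProcessingOptions` appears to expect a JSON array of option objects, while the config above passes a single object, which could make the agent silently skip conversion. An equivalent flow in array form would be:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/syslog",
      "kinesisStream": "dev-amazon-agent-kinesis-AmazonAgentKinesisStream-2E811",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "SYSLOG"
        }
      ]
    }
  ]
}
```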

LOGTOJSON not parsing with custom pattern

Hi!

I'm trying to parse a custom nginx access log line with this agent. You can find the relevant part of my configuration below:

{
  "filePattern": "/my/log/path/nginx.access.log",
  "deliveryStream": "my-delivery-stream",
  "dataProcessingOptions": [
    {
      "optionName": "LOGTOJSON",
      "logFormat": "COMMONAPACHELOG",
      "matchPattern": "^([\\d.]+) - - \\[([\\w:/]+).*\\] \"(\\w+) /(\\w+).*(\\d.+)\" (\\d{3}) (\\d+) \"(\\S+)\" \"(.*?)\" ([\\d.]+) \"(.*?)\" (\\d+);(\\d+)",
      "customFieldNames": ["client_ip", "time_stamp", "verb", "uri_path", "http_ver", "http_status", "bytes", "referrer", "agent", "response_time", "graphql", "company_id", "user_id"]
    }
  ]
}

Here's a log line as example (the above regex works fine, it matches the following log line as it should):

111.111.111.111 - - [02/Dec/2016:13:58:47 +0000] "POST /graphql HTTP/1.1" 200 1161 "https://www.myurl.com/endpoint/12345" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36" 0.172 "query user userMessages hasPermissions F0 F1" 11111;222222

What happens is that the logs arrive in S3 as plain text, exactly like the line above, and are not parsed as JSON. I really don't see anything wrong with the configuration or the regex; am I missing something obscure here?

Thanks!

Agent Tailer cannot find any records

I ran into a problem: after I set up the agent, it cannot find any records. I have no idea what happened. Also, how can I check the permissions of the kinesis-agent-user?

Thanks!

Agent 'flows' currently only support following a single log file at any given time...

The problem

We found this out the fun way... Let's imagine you have a fairly normal syslog-ng config like this:

destination d_mnt_log { file(
      "/mnt/log/$FACILITY.log"
      perm(0644) owner(root) group(root) dir_perm(0755) create_dirs(yes));
     };
log { source(s_src);  destination(d_mnt_log); };

This creates a log directory like this:

[root@us1-scheduler-...:~:2]# ls -la /mnt/log
total 281788
drwxr-xr-x 2 root root       121 Jul 15 20:47 .
drwxr-xr-x 8 root root        74 Jul 15 20:38 ..
-rw-r--r-- 1 root root    284829 Jul 18 15:17 authpriv.log
-rw-r--r-- 1 root root    158226 Jul 18 15:17 cron.log
-rw-r--r-- 1 root root   1549178 Jul 18 15:22 daemon.log
-rw-r--r-- 1 root root       176 Jul 15 20:38 kern.log
-rw-r--r-- 1 root root 198713506 Jul 18 15:23 local0.log
-rw-r--r-- 1 root root  14755528 Jul 18 15:23 syslog.log
-rw-r--r-- 1 root root    327945 Jul 18 15:21 user.log

Now imagine you want to tail all of these files and send all the data into Kinesis. This makes sense, right?

{
... 
  "flows": [
    {
      "filePattern": "/mnt/log/*.log*",
      "kinesisStream": "log-pipeline-<%= @environment %>-syslog",
      "partitionKeyOption": "RANDOM"
    }
  ]
}

Wrong.

This is a dangerous configuration, because the agent will watch each of the files that match the pattern (all of them, in this case). Whenever one of those files is modified, the agent becomes confused and jumps to that file to start consuming log events. It never follows all of the files at once; it follows only one file at a time.

The solution

I don't have a real solution here, but ideally we want to be able to tell the agent that there is a whole directory of files: follow them all, and keep track of all of their inodes. If a file is rotated, that's fine, just keep following it by inode. If a new file is created, start reading that file as well.

I know that this can lead to resource problems if you are writing out a lot of log files, but I believe that complexity and concern are for the end user to weigh when implementing.

Race condition inside aws-kinesis-agent-babysit

Occasionally I will get a FAILURE when restarting aws-kinesis-agent (for example, inside an ansible playbook run). I've tracked it down to the babysit cron job attempting to also restart while another restart is in progress. During the stopping phase of my restart, there is a short period of time in which the PIDFILE still exists but the agent has been killed. If this happens at the top of a minute (during which the babysit cron job runs), then the babysit cron job will think the agent has unexpectedly died. See here for the relevant code inside the babysit cron job:

# Check if PID file exists. 
# If it does not, it means either the agent was never started or it was stopped by the user.
[[ -f $PIDFILE ]] || exit 0

# Check if the child Java process is alive. If not, we should start
[[ -n $(get_agent_pid) ]] || start_agent

I propose adding a sleep in between the above two statements, for example:

[[ -f $PIDFILE ]] || exit 0
sleep 12
[[ -n $(get_agent_pid) ]] || start_agent

Sleeping for 1 or 2 seconds is probably enough, but I'd feel safer with the extra insurance of 12 seconds.
See the following note inside the SysVinit script for the agent:

SHUTDOWN_TIME=11   #10 second default value in AgentConfiguration.java, +1 second buffer 
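The proposal amounts to: if the pidfile exists but the process is gone, wait out the shutdown window and re-check before restarting. A sketch, with placeholder paths and a stubbed start function (not the actual babysit script):

```shell
#!/bin/sh
# Babysitter sketch: only restart if the agent is still gone after the
# configured shutdown window has passed.
SHUTDOWN_TIME=11                            # 10s default, +1s buffer
PIDFILE="/var/run/aws-kinesis-agent.pid"    # example path

get_agent_pid() { [ -f "$PIDFILE" ] && cat "$PIDFILE" 2>/dev/null; }
start_agent()   { echo "starting agent"; }  # placeholder for the real start

babysit() {
    # No pidfile: never started, or stopped deliberately -- do nothing.
    [ -f "$PIDFILE" ] || return 0
    # Give an in-flight restart time to finish before concluding it died.
    if [ -z "$(get_agent_pid)" ]; then
        sleep "$SHUTDOWN_TIME"
        [ -n "$(get_agent_pid)" ] || start_agent
    fi
}
```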

Kinesis Agent has high CPU for cases with many files

Our use case is to SFTP a file to our server every minute, and the Kinesis agent is configured to match these files. The files are not modified after they are written, and they have a footer line, e.g. "File Administratively Closed..."

Using jstack we identified the thread using high CPU as

"FileTailer[<filePattern>]" #13 prio=5 os_prio=0 tid=0x00007f2bf0574000 nid=0xdadb runnable [0x00007f2bcc490000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Object.hashCode(Native Method)
    at java.util.HashMap.hash(HashMap.java:338)
    at java.util.HashMap.put(HashMap.java:611)
    at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.syncCounterpartsByFileId(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.TrackedFileRotationAnalyzer.<init>(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.updateCurrentFile(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.SourceFileTracker.refresh(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.updateRecordParser(Unknown Source)
    - locked <0x00000000eac78af0> (a com.amazon.kinesis.streaming.agent.tailing.FileTailer)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.processRecords(Unknown Source)
    - locked <0x00000000eac78af0> (a com.amazon.kinesis.streaming.agent.tailing.FileTailer)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.runOnce(Unknown Source)
    at com.amazon.kinesis.streaming.agent.tailing.FileTailer.run(Unknown Source)
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:60)
    at com.google.common.util.concurrent.Callables$3.run(Callables.java:95)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - None 

Possible Solution:
A solution would be a fileFooterPattern that stops tailing the file once the pattern is matched. This has the advantage of not modifying the file (e.g., as appending .CLOSED to the filename would).
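The proposed fileFooterPattern could behave roughly like this sketch: consume records until a line matches the footer pattern, then treat the file as finished and stop tracking it. Names here are hypothetical, not agent code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FooterStop {
    // Return the records before (and excluding) the first line matching the
    // footer pattern; a tailer could then mark the file finished and stop
    // re-scanning it, avoiding the hot rotation-analysis loop.
    static List<String> readUntilFooter(List<String> lines, String footerRegex) {
        Pattern footer = Pattern.compile(footerRegex);
        List<String> records = new ArrayList<String>();
        for (String line : lines) {
            if (footer.matcher(line).find()) {
                break; // file is complete; stop tailing
            }
            records.add(line);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "record 1", "record 2", "File Administratively Closed...");
        System.out.println(readUntilFooter(lines, "^File Administratively Closed").size());
    }
}
```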

Syslog to JSON

Hi,

We had the latest version of the agent working, converting syslog records to JSON and uploading to Kinesis. Then it just stopped. We redeployed, and the same thing happened. I'm at a complete loss as to why. Error and trace logging show nothing. New logs are reported as parsed, but nothing ever gets uploaded; the upload count just sits at zero.

I set up another configuration entry with CSVTOJSON configured and some dummy data. This works fine: logs are parsed, converted to JSON, and uploaded.

Sorry this bug report is so poor :-(

Any pointers on how you would go about debugging this would be much appreciated. I see there are some tests; how are they run? Are you testing directly against Kinesis (we are), or do you have a mock Kinesis setup you could share?

Is there any possibility of records being missed or duplicated?

Hi there,

We are using Kinesis agent for our log monitoring and it's really helping us!

However, we encountered some situations where:

  1. our logs are missing (or a single-line log is partially missing)
  2. our logs are sent to the Kinesis stream more than once

Is there any possibility that records will be missed or duplicated in the current Kinesis agent version?
I saw these comment lines in source code:

/**
* TODO: If a Truncate/Rename rotation happen while this method is running,
* behavior is undefined. Should we lock the current file? Would
* this prevent truncation and renaming?
* @param newSnapshot
* @throws IOException
*/
@VisibleForTesting
boolean updateCurrentFile(TrackedFileList newSnapshot) throws IOException {

PS:

  1. Our Kinesis agent config file:

{
  "checkpointFile": "/tmp/aws-kinesis-agent-checkpoints",
  "cloudwatch.emitMetrics": false,
  "kinesis.endpoint": "https://kinesis.ap-southeast-1.amazonaws.com",
  "awsAccessKeyId": "xxx",
  "awsSecretAccessKey": "yyy",
  "flows": [
    {
      "filePattern": "/mnt/logs/search.*log",
      "multiLineStartPattern": "ProjectName:ComponentName",
      "kinesisStream": "stream-name",
      "partitionKeyOption": "RANDOM",
      "maxBufferAgeMillis": "1000"
    }
  ]
}

  2. We are using the logrotate tool for our log rotation.

0 records sent successfully to destinations.

Please add errors to the agent log.

When the agent has invalid credentials, or the destination stream is misnamed in the configuration, the agent produces no error. Instead, it logs a message like:

(Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 2 records parsed (104 bytes), and 0 records sent successfully to destinations. Uptime: 210032ms

After a day or so of debugging, I figured out how to get it to send to a stream successfully. It's a little alarming that there isn't any indication of error in the log file beyond the fact that 0 records were sent. I can't imagine any reason the agent couldn't or shouldn't log errors when it encounters them.

possibility of data not being sent - expected behavior?

I have a server where we are using the agent to send log files to a Kinesis stream. Last night, I had to reboot the server. What I didn't realize until about 20 minutes later was that the agent wasn't set to start on boot.

After I started the agent, my expectation was that it would resume from the last file offset recorded in the SQLite database. However, it seems the agent started at the end of the file, which resulted in a 20-minute gap in the data.

I see that there's initialPosition, but I wouldn't want to set it to START_OF_FILE, because I wouldn't want to needlessly re-process a large number of events.

Is there a way for the agent to continue where it left off after a restart, or do you have to roll your own solution?

Kinesis Agent not starting : AccessDeniedException: /var/run/aws-kinesis-agent

I installed the agent on Ubuntu 14.04. It was working fine, but then it stopped suddenly, and now I get the following error when starting:

sudo service aws-kinesis-agent start

  • Starting aws-kinesis-agent OK

2016-07-10 18:46:40.373+0530 prod-api-ip-10-0-0-48 (main) com.amazon.kinesis.streaming.agent.Agent [INFO] Reading configuration from file: /etc/aws-kinesis/agent.json
2016-07-10 18:46:40.404+0530 prod-api-ip-10-0-0-48 (main) com.amazon.kinesis.streaming.agent.Agent [INFO] null: Agent will use up to 96 threads for sending data.
2016-07-10 18:46:40.411+0530 prod-api-ip-10-0-0-48 (main) com.amazon.kinesis.streaming.agent.Agent [ERROR] Unhandled error.
java.lang.RuntimeException: Failed to create or connect to the checkpoint database.
at com.amazon.kinesis.streaming.agent.tailing.checkpoints.SQLiteFileCheckpointStore.connect(Unknown Source)
at com.amazon.kinesis.streaming.agent.tailing.checkpoints.SQLiteFileCheckpointStore.<init>(Unknown Source)
at com.amazon.kinesis.streaming.agent.Agent.<init>(Unknown Source)
at com.amazon.kinesis.streaming.agent.Agent.main(Unknown Source)
Caused by: java.nio.file.AccessDeniedException: /var/run/aws-kinesis-agent
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:767)

Install failed

Linux 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

java -version
java version "1.7.0_95"
OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

compile:
    [mkdir] Created dir: /home/meng/amazon-kinesis-agent/ant_build/private
    [javac] Compiling 85 source files to /home/meng/amazon-kinesis-agent/ant_build/private
    [javac] /home/meng/amazon-kinesis-agent/src/com/amazon/kinesis/streaming/agent/Agent.java:400: error: cannot find symbol
    [javac]         return MoreExecutors.directExecutor();
    [javac]                             ^
    [javac]   symbol:   method directExecutor()
    [javac]   location: class MoreExecutors
    [javac] /home/meng/amazon-kinesis-agent/src/com/amazon/kinesis/streaming/agent/config/Configuration.java:125: error: cannot find symbol
    [javac]                 Doubles.stringConverter());
    [javac]                        ^
    [javac]   symbol:   method stringConverter()
    [javac]   location: class Doubles
    [javac] /home/meng/amazon-kinesis-agent/src/com/amazon/kinesis/streaming/agent/config/Configuration.java:128: error: cannot find symbol
    [javac]                 Ints.stringConverter());
    [javac]                     ^
    [javac]   symbol:   method stringConverter()
    [javac]   location: class Ints
    [javac] /home/meng/amazon-kinesis-agent/src/com/amazon/kinesis/streaming/agent/config/Configuration.java:131: error: cannot find symbol
    [javac]                 Longs.stringConverter());
    [javac]                      ^
    [javac]   symbol:   method stringConverter()
    [javac]   location: class Longs
    [javac] 4 errors

BUILD FAILED
/home/meng/amazon-kinesis-agent/build.xml:38: Compile failed; see the compiler error output for details.

Cannot compile the latest build

When I try mvn clean install or ./setup --install:

init:
    [mkdir] Created dir: /home/parallels/software/amazon-kinesis-agent-master/ant_build

compile:
    [mkdir] Created dir: /home/parallels/software/amazon-kinesis-agent-master/ant_build/private
    [javac] Compiling 102 source files to /home/parallels/software/amazon-kinesis-agent-master/ant_build/private
    [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.7
    [javac] warning: Supported source version 'RELEASE_6' from annotation processor 'org.sonatype.guice.bean.scanners.index.SisuIndexAPT6' less than -source '1.7'
    [javac] /home/parallels/software/amazon-kinesis-agent-master/src/com/amazon/kinesis/streaming/agent/Agent.java:400: error: cannot find symbol
    [javac]         return MoreExecutors.directExecutor();
    [javac]                             ^
    [javac]   symbol:   method directExecutor()
    [javac]   location: class MoreExecutors
    [javac] /home/parallels/software/amazon-kinesis-agent-master/src/com/amazon/kinesis/streaming/agent/config/Configuration.java:125: error: cannot find symbol
    [javac]                 Doubles.stringConverter());
    [javac]                        ^
    [javac]   symbol:   method stringConverter()
    [javac]   location: class Doubles
    [javac] /home/parallels/software/amazon-kinesis-agent-master/src/com/amazon/kinesis/streaming/agent/config/Configuration.java:128: error: cannot find symbol
    [javac]                 Ints.stringConverter());
    [javac]                     ^
    [javac]   symbol:   method stringConverter()
    [javac]   location: class Ints
    [javac] /home/parallels/software/amazon-kinesis-agent-master/src/com/amazon/kinesis/streaming/agent/config/Configuration.java:131: error: cannot find symbol
    [javac]                 Longs.stringConverter());
    [javac]                      ^
    [javac]   symbol:   method stringConverter()
    [javac]   location: class Longs
    [javac] Note: /home/parallels/software/amazon-kinesis-agent-master/src/com/amazon/kinesis/streaming/agent/tailing/FileFlow.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 4 errors
    [javac] 1 warning

BUILD FAILED
/home/parallels/software/amazon-kinesis-agent-master/build.xml:38: Compile failed; see the compiler error output for details.

Can someone please look into it?
