aws / amazon-cloudwatch-logs-for-fluent-bit
A Fluent Bit output plugin for CloudWatch Logs
License: Apache License 2.0
Log groups can be tagged on creation (see the API reference), which can be helpful for cost allocation or other purposes. It would be convenient if this plugin could tag log groups that are created when the auto_create_group option is set to true.
The one wrinkle with implementing this is that I'm not sure how well the Fluent Bit config file format handles embedded maps, since we'd likely want a config option like log_group_tags that contains a map of tag key-value pairs.
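One way to sidestep the embedded-map problem is a flat, comma-separated key=value encoding, which the INI-style config format handles fine. A minimal sketch, assuming a hypothetical log_group_tags option and syntax:
    [OUTPUT]
        Name              cloudwatch
        Match             *
        region            us-east-1
        log_group_name    my-logs
        auto_create_group true
        log_group_tags    team=platform,environment=dev
(The new_log_group_tags parameter that later appears in the plugin's startup logs below uses this flat style.)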
I have set up a Kubernetes install according to the "Reducing the Log Volume From Fluent Bit (Optional)" section of https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-logs-FluentBit.html
using the configuration file from that page, i.e.:
fluent-bit.conf: |
[SERVICE]
Flush 5
Log_Level info
Daemon off
Parsers_File parsers.conf
HTTP_Server ${HTTP_SERVER}
HTTP_Listen 0.0.0.0
HTTP_Port ${HTTP_PORT}
storage.path /var/fluent-bit/state/flb-storage/
storage.sync normal
storage.checksum off
storage.backlog.mem_limit 5M
@INCLUDE application-log.conf
@INCLUDE dataplane-log.conf
# @INCLUDE host-log.conf
application-log.conf: |
[INPUT]
Name tail
Tag application.*
Exclude_Path /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
Path /var/log/containers/*.log
Docker_Mode On
Docker_Mode_Flush 5
Docker_Mode_Parser container_firstline
Parser docker
DB /var/fluent-bit/state/flb_container.db
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 10
Rotate_Wait 30
storage.type filesystem
Read_from_Head ${READ_FROM_HEAD}
[INPUT]
Name tail
Tag application.*
Path /var/log/containers/cloudwatch-agent*
Docker_Mode On
Docker_Mode_Flush 5
Docker_Mode_Parser cwagent_firstline
Parser docker
DB /var/fluent-bit/state/flb_cwagent.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
Read_from_Head ${READ_FROM_HEAD}
[FILTER]
Name kubernetes
Match application.*
Kube_URL https://kubernetes.default.svc:443
Kube_Tag_Prefix application.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Labels Off
Annotations Off
[OUTPUT]
Name cloudwatch_logs
Match application.*
region ${AWS_REGION}
log_group_name /aws/containerinsights/${CLUSTER_NAME}/application
log_stream_prefix ${HOST_NAME}-
auto_create_group true
extra_user_agent container-insights
dataplane-log.conf: |
[INPUT]
Name systemd
Tag dataplane.systemd.*
Systemd_Filter _SYSTEMD_UNIT=docker.service
DB /var/fluent-bit/state/systemd.db
Path /var/log/journal
Read_From_Tail ${READ_FROM_TAIL}
[INPUT]
Name tail
Tag dataplane.tail.*
Path /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
Docker_Mode On
Docker_Mode_Flush 5
Docker_Mode_Parser container_firstline
Parser docker
DB /var/fluent-bit/state/flb_dataplane_tail.db
Mem_Buf_Limit 50MB
Skip_Long_Lines On
Refresh_Interval 10
Rotate_Wait 30
storage.type filesystem
Read_from_Head ${READ_FROM_HEAD}
[FILTER]
Name modify
Match dataplane.systemd.*
Rename _HOSTNAME hostname
Rename _SYSTEMD_UNIT systemd_unit
Rename MESSAGE message
Remove_regex ^((?!hostname|systemd_unit|message).)*$
[FILTER]
Name aws
Match dataplane.*
imds_version v1
[OUTPUT]
Name cloudwatch_logs
Match dataplane.*
region ${AWS_REGION}
log_group_name /aws/containerinsights/${CLUSTER_NAME}/dataplane
log_stream_prefix ${HOST_NAME}-
auto_create_group true
extra_user_agent container-insights
parsers.conf: |
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
[PARSER]
Name syslog
Format regex
Regex ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
Time_Key time
Time_Format %b %d %H:%M:%S
[PARSER]
Name container_firstline
Format regex
Regex (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
[PARSER]
Name cwagent_firstline
Format regex
Regex (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%LZ
But I am still getting a lot of messages in CloudWatch like "Sent 1 events to CloudWatch", which are not very useful. Sorry if it is a stupid question, but how do I disable these messages?
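One thing to try, assuming those lines are emitted by the plugin at info level (an assumption, not confirmed here): raise the service log level so only warnings and errors are printed.
    [SERVICE]
        Flush        5
        Log_Level    warn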
Hi,
Fluent Bit v1.3 was recently released, and I have been looking forward to making use of one of the features in that version (collectd support). I was just wondering whether creating a new release of the Docker image is on the roadmap, and what the timeline might look like. Thanks.
I am getting lots of errors on an initial log sync:
time="2019-11-25T11:46:49Z" level=error msg="[cloudwatch 0] InvalidParameterException: The batch of log events in a single PutLogEvents request cannot span more than 24 hours.\n\tstatus code: 400, request id: 9e371dc0-5efe-4803-91d9-e779cb6bc165\n"
A possible solution is an appropriate duration limiter; it could be introduced here:
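A minimal sketch of such a limiter, with hypothetical type and field names (not the plugin's actual code): before appending an event, flush the pending batch if it would otherwise span more than 24 hours.
    package cloudwatch

    import (
        "time"

        "github.com/aws/aws-sdk-go/service/cloudwatchlogs"
    )

    const maxBatchSpan = 24 * time.Hour

    // eventBatch is a hypothetical stand-in for the plugin's pending PutLogEvents batch.
    type eventBatch struct {
        events []*cloudwatchlogs.InputLogEvent
        oldest time.Time // timestamp of the earliest event in the batch
    }

    // add appends an event, flushing first if the batch would span more than 24 hours.
    func (b *eventBatch) add(event *cloudwatchlogs.InputLogEvent, flush func([]*cloudwatchlogs.InputLogEvent)) {
        ts := time.Unix(0, *event.Timestamp*int64(time.Millisecond)) // PutLogEvents timestamps are ms since epoch
        if len(b.events) == 0 {
            b.oldest = ts
        } else if ts.Sub(b.oldest) >= maxBatchSpan {
            flush(b.events) // send the current batch before it exceeds the 24h limit
            b.events = nil
            b.oldest = ts
        }
        b.events = append(b.events, event)
    }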
I see that v1.6 of upstream/vanilla Fluent Bit now includes its own output plugin for writing to CloudWatch, and says this:
This is the documentation for the core Fluent Bit CloudWatch plugin written in C. It can replace the aws/amazon-cloudwatch-logs-for-fluent-bit Golang Fluent Bit plugin released last year.
...
Check the amazon repo for the Golang plugin for details on the deprecation/migration plan for the original plugin.
However, I don't see any corresponding deprecation notice here. So, to clarify:
Suggestion: add output examples at https://github.com/fluent/fluent-bit-kubernetes-logging.
I tried spinning up a task using a FireLens fluent-bit sidecar. However, I am seeing errors that prevent the containers from starting up. I cannot see logs in CloudWatch because the fluent-bit container is failing too. To understand what is going on, I changed the log driver to awslogs on all containers, but then I get the following error when creating the ECS task definition:
An error occurred (ClientException) when calling the RegisterTaskDefinition operation: When a firelensConfiguration object is specified, at least one container has to be configured with the awsfirelens log driver.
Is it necessary to enforce this validation? Just curious.
I am trying to route my AWS Fargate container app's logs to two outputs, in our case CloudWatch and Datadog. So the same JSON log event must be routed to both outputs at the same time.
The current setup uses ECS Fargate containers with FireLens enabled for streaming the logs to Datadog. A single [OUTPUT], in this case Datadog, is easy: add the config to your task definition and it routes the logs. The trick comes in when you want to route that same log event to both CloudWatch and Datadog.
Would we be able to route the same log file to two different outputs?
CloudWatch now supports automatically generating metrics from logs sent in a custom format.
It would be great if this plugin provided support for sending logs to CloudWatch in the expected format.
I would love to see support for built-in Fluent Bit plugins (e.g. in_cpu) as well as for logs coming from an application with application-specific metrics.
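For reference, the format in question is CloudWatch embedded metric format (EMF): a JSON envelope under an _aws key that declares which record fields are metrics. A rough illustration (the field values are made up):
    {
      "_aws": {
        "Timestamp": 1617235200000,
        "CloudWatchMetrics": [
          {
            "Namespace": "MyApp",
            "Dimensions": [["Service"]],
            "Metrics": [{ "Name": "Latency", "Unit": "Milliseconds" }]
          }
        ]
      },
      "Service": "api",
      "Latency": 12
    }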
With the new templating feature, in ECS FireLens you might be tempted to do something like:
log_stream_name something-$(ecs_task_arn)
But this leads to an error:
INFO[0008] [cloudwatch 0] Log group test already exists
ERRO[0008] [cloudwatch 0] InvalidParameterException: 1 validation error detected: Value 'test-arn:aws:ecs:ap-south-1:144718711470:task/737d73bf-8c6e-44f1-aa86-7b3ae3922011' at 'logStreamName' failed to satisfy constraint: Member must satisfy regular expression pattern: [^:*]*
Because the colon character makes it an invalid stream name.
Ideally, the plugin should just strip colons from the names in this case.
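A minimal sketch of that sanitization, with a hypothetical helper name (not the plugin's actual code): CloudWatch stream names must match [^:*]*, so drop the two forbidden characters.
    package cloudwatch

    import "strings"

    // sanitizeStreamName removes the characters CloudWatch forbids in
    // log stream names (':' and '*') from a templated name.
    func sanitizeStreamName(name string) string {
        return strings.Map(func(r rune) rune {
            if r == ':' || r == '*' {
                return -1 // returning -1 drops the rune
            }
            return r
        }, name)
    }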
I'm using the parser feature of Fluent Bit to send my nginx logs to CloudWatch. Sometimes the nginx container produces some extra logs.
I couldn't find any way to prevent sending these garbage logs to CloudWatch.
It would be better if I could skip sending a specific key to CloudWatch,
something like this:
log_key_ignore log
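As a possible workaround with stock Fluent Bit (separate from this plugin), the record_modifier filter can already drop a key from every record before it reaches the output:
    [FILTER]
        Name        record_modifier
        Match       *
        Remove_key  log
Note this removes the key for all outputs, not just CloudWatch, which is why a plugin-level option would still be useful.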
We started running the DaemonSet on our dev workload, where we have fewer logs, and within a few minutes of starting it began throwing exceptions such as:
stream processor started
ThrottlingException: Rate exceeded
status code: 400, request id: 3b8372dc-ab59-11e9-bbcf-6bdcac200b1d
Our config:
fluent-bit.conf: |
[SERVICE]
Parsers_File parsers.conf
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-firehose.conf
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*_app_*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Annotations Off
output-firehose.conf: |
[OUTPUT]
Name cloudwatch
Match **
region ap-south-1
log_group_name /logs/application
log_stream_name raw
log_key log
role_arn arn:aws:iam::xx:role/xx-dev-fluentd
auto_create_group true
parsers.conf: |
[PARSER]
Name json
Format json
Time_Key time
Time_Format %d/%b/%Y:%H:%M:%S %z
[PARSER]
Name docker
Format json
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
Also, I couldn't figure out how to name the log stream after the container name.
Can we support a CloudWatch input plugin for Fluent Bit? We already have something equivalent for Fluentd.
With an input plugin we could leverage Fluent Bit's SQL stream processing.
Hi, just fyi, I created a helm chart for your fluent-bit plugin.
helm/charts#15830
Would you mind checking the content, documentation and references?
Cheers!
This statement should be removed because it shows up so often in the logs that it crowds out other output. I was running Fluent Bit in Kubernetes, and this message made it impossible for me to see the actually useful debug logs:
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:45Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
time="2021-01-11T02:17:46Z" level=debug msg="[cloudwatch 0] Get ECS Metadata: {Cluster: TaskARN: TaskID:}\n" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:353"
I get the following panic message on one of our EKS clusters (all the clusters have the same fluent-bit config, but only this one is getting a panic). It uses the "compatible" config described in the AWS docs, which currently uses the amazon/aws-for-fluent-bit:2.10.0 Docker image.
panic: reflect: call of reflect.Value.Index on int64 Value
goroutine 17 [running, locked to thread]:
reflect.Value.Index(0x7f4ccb2e5840, 0x1c0008cdad8, 0x86, 0x0, 0x0, 0x7f4ccb366da0, 0x1c001234c90)
/home/.gimme/versions/go1.13.linux.amd64/src/reflect/value.go:966 +0x1c8
github.com/fluent/fluent-bit-go/output.GetRecord(0x1c0014f0050, 0x1c0011c5c70, 0x1, 0x1c00125c9e0, 0x1c001234b70)
/home/go/pkg/mod/github.com/fluent/[email protected]/output/decoder.go:80 +0x106
main.FLBPluginFlushCtx(0x7f4cbc86d0e8, 0x7f4cd32050b0, 0x7f4c00068d24, 0x7f4cb76761e0, 0x28)
/cloudwatch/fluent-bit-cloudwatch.go:174 +0x1e4
main._cgoexpwrap_19a10b653c9e_FLBPluginFlushCtx(0x7f4cbc86d0e8, 0x7f4cd32050b0, 0x68d24, 0x7f4cb76761e0, 0x0)
_cgo_gotypes.go:88 +0x49
Potentially relates to fluent/fluent-bit-go#34 and fluent/fluent-bit-go#29 which seem to have been fixed over 8 months ago but the fix never made it to this repo.
In the context of service-mesh setups like App Mesh, the task or pod network is not ready to send external traffic until the Envoy proxy is ready. This results in connect failures. Container launch ordering would probably work (I could not find it in the documentation), but it is not currently possible in a Kubernetes environment.
To address this, I would recommend that this process retry before giving up.
In many cases the log_group is a function of what is being logged, e.g. instances/$SERVICENAME.
I wonder if there is a way to interpolate that, or to make log_group take its value from a key.
Thanks
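The plugin's name templating appears to cover this: log_group_name accepts $(key) variables resolved from each log record (see the formatting feature discussed elsewhere on this page). A sketch, assuming each record carries a servicename key:
    [OUTPUT]
        Name              cloudwatch
        Match             *
        region            us-east-1
        log_group_name    instances/$(servicename)
        log_stream_name   $(tag)
        auto_create_group true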
Thank you for this.
We currently mirror this repository so we can build the plugin for use on EC2 instances without Docker.
For EKS we use the AWS-provided Docker image.
Would it be possible to publish GitHub releases of the binaries alongside the existing Docker images?
If there is a public release process that is amenable to a PR, I am happy to submit one.
When running the cloudwatch output plugin with Fluent Bit 1.6.8, it gets stuck in a crash loop.
This is on a Raspberry Pi 3 running Raspbian. I compiled the cloudwatch plugin from source at the 1.6.0 tag using Go 1.12.
The system logs show this stack trace:
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: Fluent Bit v1.6.8
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: * Copyright (C) 2019-2020 The Fluent Bit Authors
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: * Copyright (C) 2015-2018 Treasure Data
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: * https://fluentbit.io
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] Configuration:
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] flush time | 5.000000 seconds
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] grace | 5 seconds
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] daemon | 0
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] ___________
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] inputs:
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] systemd
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] ___________
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] filters:
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] rewrite_tag.0
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] record_modifier.1
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] record_modifier.2
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] modify.3
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] ___________
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] outputs:
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] cloudwatch.0
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] ___________
Dec 11 09:57:47 logr-edge-7 td-agent-bit[15876]: [2020/12/11 09:57:47] [ info] collectors:
....
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x751c86d0]
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: goroutine 17 [running, locked to thread]:
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: runtime/internal/atomic.goLoad64(0x4468410c, 0x0, 0x0)
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: /usr/local/go/src/runtime/internal/atomic/atomic_arm.go:127 +0x1c
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: github.com/valyala/bytebufferpool.(*Pool).Get(0x44684064, 0x33)
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: /home/enabled/.go/pkg/mod/github.com/valyala/[email protected]/pool.go:54 +0x5c
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).setGroupStreamNames(0x44684000, 0x445f6630)
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: /home/enabled/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch/cloudwatch.go:503 +0x84
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent(0x44684000, 0x445f6630, 0x75b85c68)
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: /home/enabled/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch/cloudwatch.go:356 +0x238
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: main.FLBPluginFlushCtx(0x6449c090, 0x6440c052, 0x1cf, 0x6446b340, 0x75224d50)
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: /home/enabled/amazon-cloudwatch-logs-for-fluent-bit/fluent-bit-cloudwatch.go:191 +0x2b8
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: main._cgoexpwrap_19a10b653c9e_FLBPluginFlushCtx(0x6449c090, 0x6440c052, 0x1cf, 0x6446b340, 0x0)
Dec 11 09:57:52 logr-edge-7 td-agent-bit[15876]: _cgo_gotypes.go:86 +0x34
Dec 11 09:57:52 logr-edge-7 systemd[1]: td-agent-bit.service: Main process exited, code=killed, status=6/ABRT
Dec 11 09:57:52 logr-edge-7 systemd[1]: td-agent-bit.service: Unit entered failed state.
Dec 11 09:57:52 logr-edge-7 systemd[1]: td-agent-bit.service: Failed with result 'signal'.
Any help as to what could be causing the segfault would be greatly appreciated.
I was wondering if there is a way to have the CloudWatch log stream within a log group rotate hourly.
For example, could I define the log_stream_prefix to be service-%MM-%DD-%HH and have the log stream name contain the month, day, and hour?
The goal is for the CloudWatch log group to have one stream per hour.
Thanks.
When running fluent-bit in background mode (i.e. Daemon set to yes in the conf file) and with the output plugin set to cloudwatch, the fluent-bit process seems to be always sleeping. I didn't see this issue when the output plugin was set to, say, the file plugin.
Doing an strace on the process shows it's stuck at:
epoll_pwait(1,
Is this expected, or am I doing something wrong?
Other than shipping it as a sidecar inside every application pod, or building a complex filtering configuration with potentially dozens of instances of the cloudwatch logs output plugin (one that depends on informing every node of every possible application that may or may not be scheduled on it, so it can decide ahead of time where logs should go), there appears to be no way to use this plugin to route logs for multiple applications to separate CloudWatch log groups.
A perfect solution to my problem would be configuring the log group directly from Kubernetes annotations, similar to how the kubernetes filter integrates this feature: https://docs.fluentbit.io/manual/pipeline/filters/kubernetes#annotation-examples-in-pod-definition. Failing that, it would be good to have some basic functionality for constructing a log_group plus log_stream from the tag: define a separating character/string/index and have the logs routed to multiple log groups based on their "log group prefix", so to speak.
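With the record-key templating the plugin already supports, one sketch of per-namespace routing (assuming the kubernetes filter has enriched each record with metadata) looks like this:
    [OUTPUT]
        Name              cloudwatch
        Match             kube.*
        region            us-east-1
        log_group_name    /eks/$(kubernetes['namespace_name'])
        log_stream_name   $(kubernetes['pod_name'])/$(kubernetes['container_name'])
        auto_create_group true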
It should be possible to update the log retention period via the plugin. Trying it out by setting the log_retention_days option on an existing, already-created log group didn't change the retention policy.
I'm getting an invalid time format error message, but I'm not asking to convert any time, and I don't know where it is coming from. Where is this config coming from? Which field is it trying to parse? How can I change it? Thanks a lot.
I'm using the default config from https://github.com/aws/aws-for-fluent-bit/blob/master/configs/parse-json.conf by doing:
"options":{
"config-file-type": "file",
"config-file-value": "/fluent-bit/configs/parse-json.conf"
}
Error message:
[2020/09/11 10:33:56] [ warn] [parser:json] invalid time format %d/%b/%Y:%H:%M:%S %z for '2020-09-11T10:33:56Z'
Service container definition:
{
"name": "${name}",
"image": "${image}",
"essential": true,
"portMappings": [
{
"containerPort": ${port},
"hostPort": ${port}
}
],
"logConfiguration": {
"logDriver": "awsfirelens",
"options": {
"AWS_Region": "${region}",
"AWS_Auth": "On",
"Name": "es",
"Host": "${es_logs_host}",
"Port": "443",
"tls": "On",
"Index": "logs",
"Logstash_Format": "On",
"Logstash_Prefix": "logs"
}
}
}
Firelens container definition:
{
"name": "log_router",
"image": "docker.io/amazon/aws-for-fluent-bit:latest",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit",
"options":{
"config-file-type": "file",
"config-file-value": "/fluent-bit/configs/parse-json.conf"
}
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-create-group": "true",
"awslogs-group": "${name}_log_router",
"awslogs-region": "${region}",
"awslogs-stream-prefix": "ecs"
}
},
"memoryReservation": 50
}
@davidnewhall's awesome new formatting code means you can make the log stream and group names pretty much anything.
But what if something goes wrong?
As shown here, the plugin will build the names using the literal keys it couldn't find: #16 (comment)
This is probably OK. But the ideal user experience would be to have fallback log stream or group names that would be used if the necessary keys are not found in the logs.
I am using multiple OUTPUT configurations to CloudWatch.
The same metrics for name="cloudwatch.1" then appear several times in the Prometheus output, and it seems the metrics for name="cloudwatch.2" are somehow mislabeled as name="cloudwatch.1".
Example output is as follows:
...
fluentbit_output_proc_records_total{name="cloudwatch.0"} 4 1569458812274
fluentbit_output_proc_bytes_total{name="cloudwatch.0"} 112 1569458812274
fluentbit_output_errors_total{name="cloudwatch.0"} 0 1569458812274
fluentbit_output_retries_total{name="cloudwatch.0"} 0 1569458812274
fluentbit_output_retries_failed_total{name="cloudwatch.0"} 0 1569458812274
fluentbit_output_proc_records_total{name="cloudwatch.1"} 4 1569458812274
fluentbit_output_proc_bytes_total{name="cloudwatch.1"} 112 1569458812274
fluentbit_output_errors_total{name="cloudwatch.1"} 0 1569458812274
fluentbit_output_retries_total{name="cloudwatch.1"} 0 1569458812274
fluentbit_output_retries_failed_total{name="cloudwatch.1"} 0 1569458812274
fluentbit_output_proc_records_total{name="cloudwatch.1"} 8 1569458812274
fluentbit_output_proc_bytes_total{name="cloudwatch.1"} 224 1569458812274
fluentbit_output_errors_total{name="cloudwatch.1"} 0 1569458812274
fluentbit_output_retries_total{name="cloudwatch.1"} 0 1569458812274
fluentbit_output_retries_failed_total{name="cloudwatch.1"} 0 1569458812274
...
I used the following configuration to get the output:
[SERVICE]
Flush 5
HTTP_Server on
HTTP_Port 2020
[INPUT]
Name dummy
Dummy {"message":"dummy-A"}
Tag input-A
[INPUT]
Name dummy
Dummy {"message":"dummy-B"}
Tag input-B
[OUTPUT]
Name cloudwatch
Match input-A
log_group_name input-A
log_stream_name ${HOSTNAME}
auto_create_group true
region ${AWS_REGION}
[OUTPUT]
Name cloudwatch
Match input-B
log_group_name input-B
log_stream_name ${HOSTNAME}
auto_create_group true
region ${AWS_REGION}
[OUTPUT]
Name cloudwatch
Match *
log_group_name input-AB
log_stream_name ${HOSTNAME}
auto_create_group true
region ${AWS_REGION}
Hi,
we are using this tool in K8s as a DaemonSet and sometimes get the following error (it depends on which node the pod is running on and which other pods are running on that node):
time="2020-08-26T08:24:17Z" level=error msg="[cloudwatch 0] InvalidParameterException: Log event too large: 635616 bytes exceeds limit of 262144\n\tstatus code: 400, request id: 9d79f780-3842-4438-a73d-dc6cc54864c8\n" [2020/08/26 08:24:17] [ warn] [engine] chunk '1-1598430244.412167060.flb' cannot be retried: task_id=2, input=tail .0 > output=cloudwatch.0
So I know that there is this limit in AWS CW: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CalculatePutEventsEntrySize.html
Is it possible to handle this issue (e.g. truncation, ...)?
Best regards,
Albert
The chart creates a service account, but how does one specify the role ARN for that service account? It is not clear from the docs.
Any reason why this is not listed in the official Fluent Bit output plugins documentation?
As shown in the screenshot below, the log messages are quoted. This is not a huge problem, but ideally they should not be in quotes.
The log value is marshaled because its underlying type is unknown: https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/blob/master/cloudwatch/cloudwatch.go#L384
Instead, check whether its underlying type is string or []byte. If it is, convert it to a string and send that. Otherwise, marshal it to JSON.
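A minimal sketch of that check (the helper name is hypothetical):
    package cloudwatch

    import "encoding/json"

    // logString sends strings and byte slices as-is and only JSON-marshals
    // values of any other (unknown) type.
    func logString(value interface{}) (string, error) {
        switch v := value.(type) {
        case string:
            return v, nil
        case []byte:
            return string(v), nil
        default:
            data, err := json.Marshal(v)
            if err != nil {
                return "", err
            }
            return string(data), nil
        }
    }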
Hello maintainers and contributors,
I have a question: is there a way to specify a delay interval?
For example, instead of pushing logs in real time, I would like to push logs to CloudWatch every 10 minutes or more.
Hi,
How can I set log_stream_name to a <pod_name><container_name><namespace_name> format instead of the one we get with a prefix like kube.var.log.containers?
Or how can I remove the kube.var.log.containers prefix from the stream name?
FireLens supports automatically creating CloudWatch log groups and log streams during start-up. However, if the log groups and streams are deleted afterwards, they will not be recreated by FireLens, and all the logs will be lost.
Currently, log groups created by this plugin do not have a retention period set, meaning logs forwarded using this plugin will never expire from CloudWatch. But you may want to set a lower retention period for cost-saving or compliance reasons. I'm proposing a new configuration option, named something like retentionInDays, that would be applied to log groups created when auto_create_group is set to true.
I may try to open a PR for this myself; I think I understand where to make the change. But I wanted to open a feature request first to facilitate discussion.
One potential point of discussion: if we add this setting, should we only apply it to log groups this plugin creates, or should it be applied to the log group the plugin sends to regardless? I propose only created groups, to avoid the plugin making changes to existing infrastructure. (A sketch of the mechanics follows.)
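Mechanically, the AWS SDK for Go already exposes PutRetentionPolicy, so the change could look roughly like this (the option and function names come from the proposal, not existing plugin code):
    package cloudwatch

    import (
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/service/cloudwatchlogs"
    )

    // setRetention applies the proposed retentionInDays option to a log group
    // right after the plugin creates it.
    func setRetention(client *cloudwatchlogs.CloudWatchLogs, group string, days int64) error {
        _, err := client.PutRetentionPolicy(&cloudwatchlogs.PutRetentionPolicyInput{
            LogGroupName:    aws.String(group),
            RetentionInDays: aws.Int64(days),
        })
        return err
    }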
One of the key limitations of CloudWatch is that each logging agent must write logs to a unique log stream: if multiple agents are concurrently writing to a single log stream, you will soon get sequence token errors.
In most cases the tag, or some field in the logs, can be used to uniquely identify each instance.
However, there are some niche cases where this is not possible: for example, suppose each instance tails the same two files, application.log and service.log, and (for the sake of argument) has no access to an instance ID or unique host name in Fluent Bit. Then there is nothing each Fluent Bit instance can use to uniquely identify itself and write to a unique log stream.
Those cases are a bit contrived, but I do think this problem of "nothing uniquely identifies each Fluent Bit instance to allow it to create a different log stream" is a real one. Not a common problem, but something worth solving if there's a simple solution.
One possible solution is to support a new special template, $(uuid), which would add a UUID to your log stream or group name.
The Fluentd S3 plugin has this feature: https://github.com/fluent/fluent-plugin-s3
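A minimal sketch of the idea, assuming github.com/google/uuid and a hypothetical helper (the plugin would substitute once per instance, not per record):
    package cloudwatch

    import (
        "strings"

        "github.com/google/uuid"
    )

    // instanceUUID is generated once, so every name rendered by this Fluent Bit
    // instance shares the same unique suffix.
    var instanceUUID = uuid.NewString()

    // expandUUID substitutes the proposed $(uuid) template in a configured name.
    func expandUUID(pattern string) string {
        return strings.ReplaceAll(pattern, "$(uuid)", instanceUUID)
    }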
The README mentions connection timeouts to Firehose. Is that a copy/paste error, or is there actually a requirement for Kinesis Firehose?
In order to do advanced routing with Fluent Bit, we need access to the options fields on the task-definition side, but they are not available from the console. See the image below for a suggestion.
The options I am looking at on the task-definition side are shown in the image below.
It would be really handy if the AWS console provided options for either file or S3 config, and the ability to use custom Fluent Bit images.
Cross-posting this here from this issue in the aws-for-fluent-bit repo. Based on the discussion there, this seems to be a regression.
It looks like Fluent Bit 1.5.6 ignores the auto_create_group parameter for the AWS CloudWatch output plugin. For comparison, here's a sample log from aws-for-fluent-bit 2.6.1:
AWS for Fluent Bit Container Image Version 2.6.1
Fluent Bit v1.5.2
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/18 08:13:03] [ info] [engine] started (pid=1)
[2020/09/18 08:13:03] [ info] [storage] version=1.0.4, initializing...
[2020/09/18 08:13:03] [ info] [storage] in-memory
[2020/09/18 08:13:03] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/09/18 08:13:03] [ info] [filter:kubernetes:kubernetes.0] https=1 host=xxx.gr7.eu-west-1.eks.amazonaws.com port=443
[2020/09/18 08:13:03] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2020/09/18 08:13:03] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2020/09/18 08:13:03] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter log_group = '/applications/eks-fluentbit/sandbox-cluster'"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_prefix = 'fluentbit-'"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter log_stream = ''"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter region = 'eu-west-1'"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter log_key = ''"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter role_arn = ''"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'false'"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter endpoint = ''"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter sts_endpoint = ''"
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter credentials_endpoint = "
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter log_format = ''"
[2020/09/18 08:13:03] [ info] [sp] stream processor started
[2020/09/18 08:13:04] [ info] inotify_fs_add(): inode=81808779 watch_fd=1 name=/var/log/containers/app-2048-744b65db67-7vp92_default_app-2048-395f15844e966d1df1d574b7684136c8a155b32da5f07d18cc549c19e5874c2f.log
[2020/09/18 08:13:04] [ info] inotify_fs_add(): inode=95421297 watch_fd=2 name=/var/log/containers/alb-ingress-controller-66dfcf4c7b-wk4bx_kube-system_alb-ingress-controller-67c96153d9a7d84e82c5fbaaec336658d29d31791ecc19a217ad70c2e683999a.log
[2020/09/18 08:13:04] [ info] inotify_fs_add(): inode=38798863 watch_fd=3 name=/var/log/containers/aws-node-7vt9x_kube-system_aws-node-14b94d55972a532cb6962c1d3500afce936fa7bf78cac1b05ca56b78a6f9ece1.log
[2020/09/18 08:13:04] [ info] inotify_fs_add(): inode=38799693 watch_fd=4 name=/var/log/containers/coredns-6658f9f447-jftdc_kube-system_coredns-4b27f73b9ba0dae778c3eba0e899c6753bf817cd11b0b98c2fedaa7c1eda43c2.log
[2020/09/18 08:13:04] [ info] inotify_fs_add(): inode=8465476 watch_fd=5 name=/var/log/containers/metrics-server-6bdf64df8c-9qbmq_kube-system_metrics-server-694a23ea90fde6c4f2dfd5eebf1b91beb8aa58340a1f2ca68787345d53bd196b.log
... happy times ...
And here's one for 2.7.0 in our environment:
AWS for Fluent Bit Container Image Version 2.7.0
Fluent Bit v1.5.6
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2020/09/18 08:19:11] [ info] [engine] started (pid=1)
[2020/09/18 08:19:11] [ info] [storage] version=1.0.5, initializing...
[2020/09/18 08:19:11] [ info] [storage] in-memory
[2020/09/18 08:19:11] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/09/18 08:19:11] [ info] [filter:kubernetes:kubernetes.0] https=1 host=xxx.gr7.eu-west-1.eks.amazonaws.com port=443
[2020/09/18 08:19:11] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2020/09/18 08:19:11] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2020/09/18 08:19:11] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter log_group = '/applications/eks-fluentbit/sandbox-cluster'"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_prefix = 'fluentbit-'"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter log_stream_name = ''"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter region = 'eu-west-1'"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter log_key = ''"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter role_arn = ''"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter new_log_group_tags = ''"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter log_retention_days = '0'"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter endpoint = ''"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter sts_endpoint = ''"
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter credentials_endpoint = "
time="2020-09-18T08:19:11Z" level=info msg="[cloudwatch 0] plugin parameter log_format = ''"
[2020/09/18 08:19:11] [ info] [sp] stream processor started
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=24126733 watch_fd=1 name=/var/log/containers/app-2048-744b65db67-p9gqg_default_app-2048-c1a892b15f1eec7813e08bb52561895073c6b0706d73f0d5dc6592bdbcbb11f0.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=28402853 watch_fd=2 name=/var/log/containers/ubuntu-toolbox-79bfd58fb8-pwmjs_default_ubuntu-toolbox-940c495f99d2b36b4773dffe912679acb67abc3ee684be6c62ba168e7d8e101e.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=38798125 watch_fd=3 name=/var/log/containers/aws-node-qcq2v_kube-system_aws-node-5e9d32e8025abd0ee567e3cc0259fedbe24eeab16676718d53e71002983ab101.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=14686721 watch_fd=4 name=/var/log/containers/coredns-6658f9f447-wtfcj_kube-system_coredns-384a068a4e42b18a3b8739055507cd4526fe7d69f86b3c1c20a61a5030f633dd.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=263537 watch_fd=5 name=/var/log/containers/vpc-admission-webhook-67646bbf89-xg94g_kube-system_vpc-admission-webhook-a03047e4ff26a078391f4bb3fc76529dd3bcd71341d95c19b3a216a543872f0d.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=68163896 watch_fd=6 name=/var/log/containers/vpc-admission-webhook-deployment-6c4d68f76c-r9x49_kube-system_vpc-admission-webhook-aa186f5137b9b7d6d295710b2e30e29cc2200aeb06b4d1c838c5f883e01e6fa5.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=69262005 watch_fd=7 name=/var/log/containers/vpc-resource-controller-5b5bc46646-vxh4h_kube-system_vpc-resource-controller-b0a9fd2ec356f91674156e1745d681c111fb8bbdf6c1b8a6acbbfb5349680593.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=5255668 watch_fd=8 name=/var/log/containers/kube-proxy-nqc6g_kube-system_kube-proxy-0fbf8e1040048ea863d59cb7b9f74a260d2d36d03b8f6764993b0543c38ec2b8.log
[2020/09/18 08:19:11] [ info] inotify_fs_add(): inode=97625584 watch_fd=9 name=/var/log/containers/fluentbit-f4wls_kube-system_aws-for-fluent-bit-92814b8b79b757088aab1de7842e2e4a82b4a6fea800a06ffb9ff8e68705fab2.log
time="2020-09-18T08:19:17Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: 069f9041-cae2-4d20-8e33-4e5706cfe914"
time="2020-09-18T08:19:17Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: 1103959f-e52e-488f-b547-28a85cd591db"
time="2020-09-18T08:19:17Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: 01c6e885-5ffc-4418-8c87-4f87981217dc"
time="2020-09-18T08:19:17Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: fe14da71-8558-4873-997e-9fa467fa0e92"
time="2020-09-18T08:19:18Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: b6f3cdd8-ef66-4d8d-bdbb-56ff9d854f38"
time="2020-09-18T08:19:18Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: cd0bc34a-e4c8-44a3-ab0c-70e71ec273d6"
time="2020-09-18T08:19:18Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: aac21c1f-e687-4313-93e2-02481ca40919"
time="2020-09-18T08:19:20Z" level=error msg="AccessDeniedException: User: arn:aws:iam::xxx:user/fluentbit-sandbox-clus-eu-west-1 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:xxx:log-group:/applications/eks-fluentbit/sandbox-cluster:log-stream:\n\tstatus code: 400, request id: a679e4b3-ccc5-41bf-b070-918c43193cfe"
Note that in the new version the following line is missing:
time="2020-09-18T08:13:03Z" level=info msg="[cloudwatch 0] plugin parameter auto_create_group = 'false'"
Also note that the log group already exists (it is provisioned via CloudFormation in our environment), so even if the auto_create_group property is not read, the plugin should not attempt to create the group, since it already exists.
Using the helm chart from: https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit
Chart version: 0.1.6
amazon-cloudwatch-logs-for-fluent-bit version: 2.7.0
With the values:
values:
- firehose:
enabled: false
kinesis:
enabled: false
elasticsearch:
enabled: false
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: {{ .Values.role_arn }}
cloudWatch:
logGroupName: /aws/eks/cluster/application
logStreamName: $(kubernetes['namespace_name'])/$(kubernetes['container_name'])
logStreamPrefix:
logRetentionDays: 90
I see log streams show up that contain the namespace name and container name.
But I also see one log stream with the literal name kubernetes['namespace_name']/kubernetes['container_name'].
Its log contents look related to Fluent Bit itself:
{"log":"[2021/03/20 00:41:17] [ warn] [input] tail.0 paused (mem buf overlimit)\n","stream":"stderr","time":"2021-03-20T00:41:17.799911083Z"}
...
Can this be removed or renamed?
The example in the README is for Firehose.
I am using the below config with image 1.2.2:
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level info
Parsers_File parsers.conf
@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-cloudwatch.conf
input-kubernetes.conf: |
[INPUT]
Name tail
Tag kube.*
Path              /var/log/containers/*.log
Parser docker
DB /var/log/flb_kube.db
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
[INPUT]
Name systemd
Tag                 host.*
Systemd_Filter _SYSTEMD_UNIT=docker.service
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Systemd_Filter _SYSTEMD_UNIT=kubeproxy.service
filter-kubernetes.conf: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
output-cloudwatch.conf: |
[OUTPUT]
Name cloudwatch
Match *
region ${REGION}
log_group_name /eks/${CLUSTER_NAME}/logs
log_stream_prefix eks
auto_create_group true
My fluent-bit DaemonSet pods are crashing with the below error:
Fluent Bit v1.2.2
Copyright (C) Treasure Data
Input plugin 'systemd' cannot be loaded
Error: You must specify an output target. Aborting
Why is it not able to detect the output for systemd?
Hi,
Any idea about this build issue?
$ make
PATH=/home/jagan/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin golint ./cloudwatch
mkdir -p ./bin
go build -buildmode c-shared -o ./bin/cloudwatch.so ./
fluent-bit-cloudwatch.go:23:2: cannot find package "github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch" in any of:
/usr/lib/go-1.10/src/github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch (from $GOROOT)
/home/jagan/go/src/github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch (from $GOPATH)
fluent-bit-cloudwatch.go:24:2: cannot find package "github.com/aws/amazon-kinesis-firehose-for-fluent-bit/plugins" in any of:
/usr/lib/go-1.10/src/github.com/aws/amazon-kinesis-firehose-for-fluent-bit/plugins (from $GOROOT)
/home/jagan/go/src/github.com/aws/amazon-kinesis-firehose-for-fluent-bit/plugins (from $GOPATH)
fluent-bit-cloudwatch.go:25:2: cannot find package "github.com/fluent/fluent-bit-go/output" in any of:
/usr/lib/go-1.10/src/github.com/fluent/fluent-bit-go/output (from $GOROOT)
/home/jagan/go/src/github.com/fluent/fluent-bit-go/output (from $GOPATH)
fluent-bit-cloudwatch.go:27:2: cannot find package "github.com/sirupsen/logrus" in any of:
/usr/lib/go-1.10/src/github.com/sirupsen/logrus (from $GOROOT)
/home/jagan/go/src/github.com/sirupsen/logrus (from $GOPATH)
Makefile:26: recipe for target 'bin/cloudwatch.so' failed
make: *** [bin/cloudwatch.so] Error 1
Are you planning to add Kubernetes metadata to the logs? E.g.:
[FILTER]
Name kubernetes
Match kube.*
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
Or, how can I collect all the logs of a namespace and send them to a specific log_group? With the default configuration all the logs go to the same log_group. Default configuration:
[OUTPUT]
Name cloudwatch_logs
Match *
region us-east-1
log_group_name fluent-bit-cloudwatch
log_stream_prefix from-fluent-bit-
auto_create_group true
Without the k8s metadata in the logs, and without being able to send the logs of a namespace to a particular log_group, I cannot distinguish the different applications.
Hello all!
When messages are delivered to CloudWatch, their order does not match the original order. I see that the keys within each message have been sorted alphabetically:
2021-04-09T05:57:20.000+03:00
{"az":"us-west-1a","ec2_instance_id":"i-053fe634ac7beb193","log_level":"error","message":"*1088800 recv() failed (104: Co.....}
But the original order of the keys in these messages is different:
{"log_level":"error","message":"*1088800 recv() failed (104: Co.....", "az":"us-west-1a","ec2_instance_id":"i-053fe634ac7beb193"}
Is this an undocumented feature? Or is this behavior an issue in the fluent-bit parser?
Is it possible to disable this behavior (key sorting)?
PS:
I just changed the output plugin from CloudWatch (amazon-cloudwatch-logs-for-fluent-bit) to stdout and got the original key order in the messages, so this issue is related to the CW plugin.
I tried to set up multiple OUTPUTs for different log groups using Tag, but both INPUTs' events are pushed to the same log group.
This is my configuration:
[SERVICE]
Flush 5
[INPUT]
Name dummy
Dummy {"message":"dummy-A"}
Tag input-A
[INPUT]
Name dummy
Dummy {"message":"dummy-B"}
Tag input-B
[OUTPUT]
Name cloudwatch
Match input-A
log_group_name input-A
log_stream_name ${HOSTNAME}
auto_create_group true
region ${AWS_REGION}
[OUTPUT]
Name cloudwatch
Match input-B
log_group_name input-B
log_stream_name ${HOSTNAME}
auto_create_group true
region ${AWS_REGION}
Log group input-A is empty.
Log group input-B got both INPUTs' events:
07:41:28 {"message":"dummy-A"}
07:41:28 {"message":"dummy-B"}
07:41:29 {"message":"dummy-A"}
07:41:29 {"message":"dummy-B"}
07:41:30 {"message":"dummy-A"}
07:41:30 {"message":"dummy-B"}
07:41:31 {"message":"dummy-A"}
07:41:31 {"message":"dummy-B"}
...
At the moment I can't find an option for creating KMS-encrypted log groups, but I think it would be an excellent feature.
It should be possible via the Go SDK to create an encrypted log group: https://docs.aws.amazon.com/sdk-for-go/api/service/cloudwatchlogs/#CreateLogGroupInput
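A sketch of the pass-through, assuming a hypothetical kms_key_id plugin option (CreateLogGroupInput already accepts a KmsKeyId):
    package cloudwatch

    import (
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/service/cloudwatchlogs"
    )

    // createEncryptedGroup creates a log group whose contents CloudWatch
    // encrypts with the given customer-managed KMS key.
    func createEncryptedGroup(client *cloudwatchlogs.CloudWatchLogs, group, kmsKeyARN string) error {
        _, err := client.CreateLogGroup(&cloudwatchlogs.CreateLogGroupInput{
            LogGroupName: aws.String(group),
            KmsKeyId:     aws.String(kmsKeyARN),
        })
        return err
    }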
Rationale
I know that default_log_group_name and default_log_stream_name were introduced specifically as fallbacks for cases where the variables in log_group_name and log_stream_name fail to parse, so I can understand the logic behind having the fallbacks just be "dumb strings", without any variable/templating support.
However, I think there is a benefit to having even the fallbacks support variables. For example, falling back to $(tag) when $(optional-var) is missing: both of these dynamic log stream names are preferable to a fixed, static name, which is currently my only fallback option.
[OUTPUT]
Name cloudwatch
Match *
region us-east-1
log_group_name test-log-group
auto_create_group true
log_stream_name $(optional-var)
default_log_stream_name $(tag)
Currently, in this case, when my primary variable ($(optional-var)) doesn't exist, Fluent Bit literally creates a log stream named $(tag), instead of substituting in the Fluent Bit tag.
Edge Case
A quick note on the obvious edge case here, i.e. when the variable in the fallback/default log stream name also fails to resolve.
At this point, given that the user has now had two chances to choose a variable that will always exist, I think it's reasonable to just throw a hard error and fail, rather than attempting to automatically recover somehow (or, god forbid, creating a new default_default_log_stream_name option). But that's just my suggestion.
When I add a systemd input to my Fluent Bit config using the amazon/aws-for-fluent-bit:1.2.0 Docker image inside a Kubernetes cluster, I get the following error:
Input plugin 'systemd' cannot be loaded
Here's a configuration file that reproduces the problem:
[SERVICE]
Flush 1
Log_Level info
Parsers_File parsers.conf
[INPUT]
Name systemd
Tag host.*
Systemd_Filter _SYSTEMD_UNIT=docker.service
Path /var/log/journal
[OUTPUT]
Name stdout
Match *
In order to fix the problem, I rebuilt the Docker image using the Dockerfile below, and everything seems to be working fine now:
FROM golang:1.12 as go-build
RUN go get github.com/aws/amazon-kinesis-firehose-for-fluent-bit
WORKDIR /go/src/github.com/aws/amazon-kinesis-firehose-for-fluent-bit
RUN make release
RUN go get github.com/aws/amazon-cloudwatch-logs-for-fluent-bit
WORKDIR /go/src/github.com/aws/amazon-cloudwatch-logs-for-fluent-bit
RUN make release
FROM fluent/fluent-bit:1.2.2
COPY --from=go-build /go/src/github.com/aws/amazon-kinesis-firehose-for-fluent-bit/bin/firehose.so /fluent-bit/firehose.so
COPY --from=go-build /go/src/github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/bin/cloudwatch.so /fluent-bit/cloudwatch.so
# Optional Metrics endpoint
EXPOSE 2020
# Entry point
CMD ["/fluent-bit/bin/fluent-bit", "-e", "/fluent-bit/firehose.so", "-e", "/fluent-bit/cloudwatch.so", "-c", "/fluent-bit/etc/fluent-bit.conf"]