sumologic / fluentd-kubernetes-sumologic
FluentD plugin to extract logs from Kubernetes clusters, enrich and ship to Sumo Logic.
License: Apache License 2.0
We run this collector on a cluster generating ~5 million log messages per day.
We recently encountered stability issues, so we are experimenting with the parameters (in particular the number of threads). However, each time we restart the pods to try a new param set:
The resulting collection usage is the following:
Small bars are our real log production rate (around 1k messages per minute).
This behaviour is highly unstable.
The sumo output expects json_merge, but these parameters are sharing an environment variable.
Right now it appears that there is only a "latest" tag: https://hub.docker.com/r/sumologic/fluentd-kubernetes-sumologic/tags/
It would be helpful to have a versioned tag pushed as well.
Similar to EXCLUDE_PATH, we would like the reverse ability: to set an INCLUDE_PATH.
The problem is that the in_tail plugin can only read one log at a time. When there are multiple logs on the same host, we need to run multiple copies of the fluentd-kubernetes-sumologic container, each dedicated to a certain log. It would be nice to just pass a parameter to specify each log.
I believe it currently is set as "path /mnt/log/containers/*.log" in https://github.com/SumoLogic/fluentd-kubernetes-sumologic/blob/master/conf.d/file/source.containers.conf.
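One option worth noting: in_tail's path parameter accepts a comma-separated list, so a single source can cover several specific logs without running extra copies of the container. A minimal sketch (the example paths are placeholders, not paths the image ships with):
<source>
  @type tail
  format json
  time_key time
  # in_tail accepts a comma-separated list of globs, so one source can follow several logs
  path /mnt/log/containers/app-a*.log,/mnt/log/containers/app-b*.log
  pos_file /mnt/pos/custom-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag containers.*
  read_from_head false
</source>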
We are deploying fluentd-kubernetes-sumologic using the helm chart to a standard kops cluster. The documentation indicates that kubelet logs should be captured by default and can be filtered on using _sourceName='Http Input'
When filtering on _sourceName='Http Input', the only logs we see are those of the fluentd-kubernetes-sumologic containers.
Hello,
I am attempting to use the new multi-line feature without much success.
I set this environment var:
- name: MULTILINE_START_REGEXP
value: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d+
So I would expect any lines following such a line, that do not start with a formatted date, to be included in the previous message. But still, each "newline" seems to create a separate log entry in sumo, even if it did not start with a date string.
Here is a sample message that should match:
2017-08-09 17:11:58.522 [INFO] [De-Obfuscator] \stacktrace\:\Mon May 16 10:28:02 GMT-700 2016 MobClient
So I would expect that, until another line starting with such a date appears, all following lines would be grouped into this message. But they are not; each is sent as a unique log entry. For example, here is the next message, which did not get grouped with the one above:
SEVERE: Possible problem with your *.gwt.xml module file.
Am I doing something wrong? Is it a bug in the handling of multiline messages?
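For reference, the MULTILINE_START_REGEXP value is expected to end up in the concat filter (the same filter appears in a rendered config further down this page). A sketch of what it should look like once rendered; it is worth checking the generated config in /fluentd/etc inside the pod, since escaping problems can silently leave the default in place:
<filter containers.**>
  @type concat
  key "log"
  # the env var value should appear here, wrapped in slashes
  multiline_start_regexp "/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+/"
  separator ""
  timeout_label "@NORMAL"
</filter>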
This is tough to get a good read on, but it looks like after being 429 throttled, Sumo only receives about 1/20th of the logs being ingested by fluentd, until the sumologic pod is restarted.
I'm running Kubernetes on CoreOS, and none of the paths in sources.kubernetes or sources.docker exist on my system. I've been told that there's a systemd plugin for fluentd. How can this be folded into the current image? Should we create a new env var, or should it try to be smart?
It appears that the position files are mapped to an empty directory even though there is a volume mounted for them. This would suggest that every time the fluentd pod gets recreated, it starts from the beginning of the available logs. We had this happen and it uploaded 300 GB of duplicated logs as fast as fluentd could send them.
Mapping
volumes:
- name: pos-files
emptyDir: {}
Is there any reason not to map this into the node's filesystem?
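A minimal sketch of mapping the position files to the node's filesystem instead (the host directory is an arbitrary choice, and DirectoryOrCreate needs Kubernetes 1.8+; on older clusters the directory can be pre-created):
volumes:
  - name: pos-files
    hostPath:
      # persists in_tail position files across pod restarts
      path: /var/run/fluentd-pos
      type: DirectoryOrCreate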
Here's my sumologic output config:
<match **>
type sumologic
log_key log
endpoint https://endpoint1.collection.us2.sumologic.com/receiver/v1/http/<not showing this>
verify_ssl
log_format json
flush_interval 30s
num_threads 1
</match>
And here's the error I'm getting:
temporarily failed to flush the buffer. next_retry=2017-02-07 14:11:19 +0000 error_class="SocketError" error="Failed to open TCP connection to endpoint1.collection.us2.sumologic.com:443 (getaddrinfo: Name does not resolve)" plugin_id="object:3fa92374c7f8"
2/7/2017 7:07:08 AM 2017-02-07 14:07:08 +0000 [warn]: suppressed same stacktrace
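One quick way to narrow this down is to test name resolution from inside the pod itself, which separates cluster DNS problems from plugin problems (assuming a shell and getent are available in the image):
$ kubectl exec -it <fluentd-pod-name> -- getent hosts endpoint1.collection.us2.sumologic.com
$ kubectl exec -it <fluentd-pod-name> -- cat /etc/resolv.conf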
Does anyone at sumologic have any interest in participating as a reviewer, for the new sumologic-fluentd chart? Having someone from sumo able to review changes would help keep the chart relevant.
The current DaemonSet spec doesn't work with RBAC enabled, which is default for k8s 1.6+. A ServiceAccount definition is needed to correct this.
To replicate:
spawn command to main: cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/usr/local/bin/fluentd", "-c", "/fluentd/etc/fluent.file.conf", "-p", "/fluentd/plugins", "--under-supervisor"] Unexpected error Exception encountered fetching metadata from Kubernetes API endpoint: 403 Forbidden (User "system:serviceaccount:default:default" cannot list pods at the cluster scope.) /var/lib/gems/2.3.0/gems/fluent-plugin-kubernetes_metadata_filter-0.28.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:383:in rescue in start_watch' /var/lib/gems/2.3.0/gems/fluent-plugin-kubernetes_metadata_filter-0.28.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:374:in start_watch' /var/lib/gems/2.3.0/gems/fluent-plugin-kubernetes_metadata_filter-0.28.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:200:in block in configure'
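A minimal sketch of the RBAC objects that address this error (names and namespace are assumptions; the repo's daemonset/rbac/fluentd.yaml is the authoritative version):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentd
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluentd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: default
The DaemonSet pod spec then needs serviceAccountName: fluentd so the pods stop running as the default service account.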
The fluentd sumologic plugin supports a proxy, but the fluentd-kubernetes-sumologic daemonset does not expose it. Please add support for a proxy variable.
"The install guide ( https://github.com/SumoLogic/fluentd-kubernetes-sumologic ) does not mention supporting proxy settings but the Sumologic plugin ( https://github.com/SumoLogic/fluentd-output-sumologic ) for kubernetes does"
The ClusterRole config at daemonset/rbac/fluentd.yaml currently has pods as the only resource.
This causes containers to fail on startup saying they cannot list namespaces at the cluster scope:
2018-04-06 19:12:37 +0000 [info]: reading config file path="/fluentd/etc/fluent.file.conf" 2018-04-06 19:12:38 +0000 [error]: config error file="/fluentd/etc/fluent.file.conf" error_class=Fluent::ConfigError error="start_namespace_watch: Exception encountered setting up namespace watch from Kubernetes API v1 endpoint https://100.128.0.1:443/api: namespaces is forbidden: User \"system:serviceaccount:default:fluentd\" cannot list namespaces at the cluster scope ({\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"namespaces is forbidden: User \\\"system:serviceaccount:default:fluentd\\\" cannot list namespaces at the cluster scope\",\"reason\":\"Forbidden\",\"details\":{\"kind\":\"namespaces\"},\"code\":403}\n)"
This config needs to add namespaces to the list of resources in the ClusterRole config.
I'm happy to make a PR if needed.
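A sketch of the amended rules section:
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]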
Hopefully last road block. For some reason the mounted container logs are not readable by the fluentd container.
/mnt/log/containers/fluentd-48fc6_default_fluentd-736a8af93bb05b2d247c994ea6177952e16e14137bd01f03e682a465a9725d2d.log unreadable. It is excluded and would be examined next time.
I've tried running as privileged, but no luck. Perhaps you've hit this before and know a solution?
I have an app/container that uses a GPU for machine learning processes running on Kubernetes in GCP. One cool feature about GPU node pools in GKE is that pods will only be placed on the node pool if the pod is requesting GPU resources. This is good in most scenarios, but in the case of this Sumo plugin, it means logs from pods on the GPU node pool never get collected, since the Sumo pod won't ever be scheduled on those nodes.
https://cloud.google.com/kubernetes-engine/docs/how-to/gpus
This causes only Pods requesting GPUs to be scheduled on GPU nodes, which enables more efficient autoscaling: your GPU nodes can quickly scale down if there are not enough Pods requesting GPUs.
I would love to see support added for scheduling the Sumo fluentd pods on GPU node pools running in GCP.
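A hedged sketch of what that could look like, assuming GKE's GPU nodes carry the nvidia.com/gpu taint (worth confirming with kubectl describe node on one of the GPU nodes):
spec:
  template:
    spec:
      tolerations:
        # assumed GKE GPU node taint
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule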
I installed the fluentd logger, but unfortunately it only seems to be sending junk data. Sample message below.
{"timestamp":1486707244000,"error_class":"KeyError","error":"key not found: "kubernetes"","tag":"kube-apiserver","time":1486707243,"record":{"severity":"I","pid":"5","source":"panics.go:76","message":"GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (992.752µs) 200 [[kube-controller-manager/v1.5.2 (linux/amd64) kubernetes/08e0995/leader-election] 127.0.0.1:11711]","_sumo_metadata":{"log_format":"json","source":"k8s_kube-apiserver","category":"kubernetes/kube-apiserver"}},"message":"dump an error event: error_class=KeyError error="key not found: \"kubernetes\"" tag="kube-apiserver" time=1486707243 record={"severity"=>"I", "pid"=>"5", "source"=>"panics.go:76", "message"=>"GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager: (992.752\xC2\xB5s) 200 [[kube-controller-manager/v1.5.2 (linux/amd64) kubernetes/08e0995/leader-election] 127.0.0.1:11711]", "_sumo_metadata"=>{:log_format=>"json", :source=>"k8s_kube-apiserver", :category=>"kubernetes/kube-apiserver"}}"}
Can you tell me what might be causing these messages to be printed into Sumo? It looks like these are errors from fluentd itself.
Even with READ_FROM_HEAD=false, shouldn't fluentd still read all lines (even the header) of logs created after the start of fluentd? For example, fluentd is fully up and running. Well afterwards, a new application container is started, which creates a new log:
The problem I'm seeing is that fluentd is not forwarding the first few lines of the log (lines 1-33 in the above example). The rest of the log (lines 34-100) is forwarded normally.
The kubernetes_sumologic plugin always adds a timestamp field to json logs. This can lead to invalid json when the application output already includes a timestamp field.
Application outputs a log message like:
{"timestamp":"2018-05-02T16:06:52.550+0000","severity":"info","message":"hello world"}
The ingested message will become (with json_merge):
{"time":1525276647680,"timestamp": "2018-05-02T15:57:27.680+0000", "timestamp":"2018-05-02T16:06:52.550+0000","severity":"info","message":"hello world"}
Possibilities: if the record already has a timestamp key, do not add one.
I want the ability to "T" (data forward) my data to Sumo Logic and another destination like S3 at the fluentd plugin level. The use case here is: if I want to filter certain data that I send to Sumo Logic but still want a backup of all my data, I need this ability.
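For the tee-to-two-destinations part, fluentd's built-in copy output is the usual shape; a minimal sketch, assuming the fluent-plugin-s3 gem is installed, with placeholder bucket details and credentials/buffering omitted:
<match **>
  @type copy
  <store>
    @type sumologic
    endpoint "#{ENV['COLLECTOR_URL']}"
    log_format json
  </store>
  <store>
    @type s3
    s3_bucket my-log-archive   # placeholder
    s3_region us-east-1        # placeholder
    path logs/
  </store>
</match>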
We noticed that the first ~50 seconds of logs after a container startup are not captured.
For example, the container started at 12:50:00, but fluentd only began following its log at 12:50:42:
2017-02-15 12:50:42 +0000 [info]: following tail of /mnt/log/containers/my-pod-name-87d9f1b535e11c02690567e7b4bb65016346c263b683c96f720b90e52ca6982e.log
All log entries between 12:50:00 and 12:50:42 are lost. This is a showstopper for us. Not only are we losing logs, we are losing the most important log entries (application startup). Crashing applications are not showing any logs!
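One setting that may be relevant here (not confirmed as the cause): in_tail only rescans its path glob for new files every refresh_interval, which defaults to 60 seconds, so a freshly created container log can sit unwatched for up to a minute. A sketch of lowering it:
<source>
  @type tail
  format json
  path /mnt/log/containers/*.log
  pos_file /mnt/pos/ggcp-containers.log.pos
  tag containers.*
  read_from_head true
  # rescan the glob for new files every 5 seconds instead of the default 60
  refresh_interval 5
</source>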
Maybe I'm doing this wrong...
I tried to apply LOG_FORMAT=text in the daemonset sample definition, but the logs are still sent in JSON.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd
labels:
app: fluentd
version: v1
spec:
template:
metadata:
labels:
name: fluentd
spec:
volumes:
- name: pos-files
emptyDir: {}
- name: host-logs
hostPath:
path: /var/log/
- name: docker-logs
hostPath:
path: /var/lib/docker
containers:
- image: sumologic/fluentd-kubernetes-sumologic:latest
name: fluentd
imagePullPolicy: Always
volumeMounts:
- name: host-logs
mountPath: /mnt/log/
readOnly: true
- name: docker-logs
mountPath: /var/lib/docker/
readOnly: true
- name: pos-files
mountPath: /mnt/pos/
env:
- name: LOG_FORMAT
value: text
- name: COLLECTOR_URL
valueFrom:
secretKeyRef:
name: sumologic
key: collector-url
We regularly see the sumo fluentd container use somewhere between 500 MB and 1100 MB of memory when deployed in our cluster.
The sumo collector agent aims for a memory usage of ~128 MB and we have never had any problems with it; obviously this container is different, as it uses fluentd, Ruby and an HTTP collector.
Is such a significant increase in memory expected? We have only been testing it for a few weeks, but there seems to be a process keeping much more in memory than it should; all it is doing is streaming logs from systemd and pushing them over HTTP, so there should be no reason to allocate more than 1 GB.
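In the meantime, a resources block on the daemonset container at least bounds the damage; the numbers below are placeholders rather than recommendations:
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 512Mi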
I used Rancher to setup my Kubernetes cluster. Not sure how the rancher config changed the default k8s configuration, but the logs are not where this plugin expects them to be.
e.g.
The fluentd source is configured to tail containers like this:
<source>
type tail
format json
time_key time
path /mnt/log/containers/*.log
pos_file /mnt/pos/ggcp-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag containers.*
read_from_head false
</source>
But there is no containers folder in /mnt/log. Here is a list of all the containers folders on my host:
ubuntu@ip-172-31-25-18:~$ sudo find / -name "containers"
/var/lib/docker/containers
/var/lib/docker/aufs/diff/2fa0c69c939c136de5ca69d79d6e852d5f7b04ff4f52ef1c354b6952662f0d1b/var/log/containers
/var/lib/docker/aufs/mnt/2fa0c69c939c136de5ca69d79d6e852d5f7b04ff4f52ef1c354b6952662f0d1b/var/log/containers
/var/lib/kubelet/pods/fc1f8008-eaf1-11e6-baef-5ed5f3051185/containers
/var/lib/kubelet/pods/3b7825b5-eaf2-11e6-baef-5ed5f3051185/containers
And the only one of those that seems to match the regex for the containers filter is:
/var/lib/docker/aufs/mnt/2fa0c69c939c136de5ca69d79d6e852d5f7b04ff4f52ef1c354b6952662f0d1b/var/log/containers
Not sure how to proceed from here... I'm new to fluentd, so doing more research.
The docker config and kubernetes configs don't work either. On my setup none of these exist:
/mnt/log/salt/minion
/mnt/log/docker.log
Greetings,
Related to #12,
Thanks for adding the feature @frankreno. I think I am missing something really silly here, but anyway: I was trying to get logs from multiple pods dropped before being shipped to Sumologic, using EXCLUDE_POD_REGEX.
I was trying out the approach of:
...
- name: EXCLUDE_POD_REGEX
value: "["calico-node-*", "kube2iam-*"]"
...
which produced the error: error converting YAML to JSON: yaml: line 53: did not find expected key
The following approach also didn't work:
...
- name: EXCLUDE_POD_REGEX
value: "calico-node-*"
- name: EXCLUDE_POD_REGEX
value: "kube2iam-*"
...
but after applying the change and checking the manifest from the api-server, it was just reflecting the older EXCLUDE_POD_REGEX value.
How would one go about including multiple pods to be ignored in the manifest?
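For what it's worth, the variable takes a single regular expression, so multiple pods are normally combined with alternation; this is the same pattern the daemonset further down this page uses:
- name: EXCLUDE_POD_REGEX
  value: "(calico-node-|kube2iam-)"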
It seems that on newer versions of Kubernetes, logs are located in /var/log/containers, but they are only symbolic links to /var/log/pods, and this causes read errors. Maybe we should change the default mount point to /var/log instead of /mnt/log.
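A sketch of the mounts that make the symlink chain resolvable inside the container (this mirrors the daemonset shown further down this page, which mounts /var/log and /var/lib/docker/containers at their host paths):
volumeMounts:
  - name: host-logs
    mountPath: /var/log/
    readOnly: true
  - name: docker-logs
    mountPath: /var/lib/docker/containers
    readOnly: true
volumes:
  - name: host-logs
    hostPath:
      path: /var/log/
  - name: docker-logs
    hostPath:
      path: /var/lib/docker/containers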
Greetings,
We noticed an issue with the logs being ingested by the following source config, which I added in a new file fluentd-kubernetes-sumologic/conf.d/file/source.trace.conf (as seen in our fork https://github.com/razorpay/fluentd-kubernetes-sumologic/blob/740c29ada9778e9107a124ae4a1e2548dd2f8da7/conf.d/file/source.trace.conf), from which we have built a custom container image referenced in our daemonset yaml:
<source>
@type tail
format json
time_key "#{ENV['TIME_KEY']}"
path "#{ENV['TRACE_LOGS_PATH']}"
exclude_path "#{ENV['TRACE_EXCLUDE_PATH']}"
pos_file /mnt/pos/trace.log.pos
tag trace
format /(?<message>.*)/
</source>
<match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint "#{ENV['COLLECTOR_URL']}"
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
</match>
Our fluentd daemonset yaml, which has been applied in our k8s cluster:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
labels:
app: fluentd-sumologic
version: v1
name: fluentd-sumologic
spec:
selector:
matchLabels:
k8s-app: fluentd-sumologic
template:
metadata:
labels:
k8s-app: fluentd-sumologic
spec:
containers:
- env:
- name: COLLECTOR_URL
valueFrom:
secretKeyRef:
key: collector-url
name: sumologic
- name: FLUSH_INTERVAL
value: 1s
- name: TRACE_LOGS_PATH
value: <path to our log file(s)>/*.log
- name: EXCLUDE_POD_REGEX
value: (kube-scheduler-*|kube-apiserver-*)
image: <private-container-image-registry>:fluentd-sumologic
imagePullPolicy: Always
name: fluentd
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /mnt/log/
name: host-logs
readOnly: true
- mountPath: /var/log/
name: host-logs
readOnly: true
- mountPath: /var/lib/docker/containers
name: docker-logs
readOnly: true
- mountPath: /mnt/pos/
name: pos-files
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: fluentd
serviceAccountName: fluentd
terminationGracePeriodSeconds: 30
volumes:
- emptyDir: {}
name: pos-files
- hostPath:
path: /var/log/
type: ""
name: host-logs
- hostPath:
path: /var/lib/docker/containers
type: ""
name: docker-logs
templateGeneration: 7
updateStrategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
$ kubectl logs fluentd-sumologic-zwxs2 -n logging
2018-07-10 09:02:50 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.file.conf"
2018-07-10 09:02:51 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2018-07-10 09:02:51 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2018-07-10 09:02:51 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2018-07-10 09:02:51 +0000 [info]: using configuration file: <ROOT>
<match containers.**.fluentd**>
@type null
</match>
<source>
@type monitor_agent
bind "0.0.0.0"
port 24220
</source>
<source>
@type tail
format json
time_key time
path "/mnt/log/containers/*.log"
exclude_path
pos_file "/mnt/pos/ggcp-containers.log.pos"
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag "containers.*"
read_from_head true
enable_stat_watcher true
<parse>
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
@type json
time_type string
</parse>
</source>
<filter containers.**>
@type concat
key "log"
multiline_start_regexp "/^\\w{3} \\d{1,2}, \\d{4}/"
separator ""
timeout_label "@NORMAL"
</filter>
<match containers.**>
@type relabel
@label @NORMAL
</match>
<label @NORMAL>
<filter containers.**>
@type kubernetes_metadata
annotation_match ["sumologic.com.*"]
de_dot false
tag_to_kubernetes_name_regexp ".+?\\.containers\\.(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\\.log$"
merge_json_log false
</filter>
<filter containers.**>
@type kubernetes_sumologic
source_name "%{namespace}.%{pod}.%{container}"
source_host ""
log_format "json"
kubernetes_meta true
source_category "%{namespace}/%{pod_name}"
source_category_prefix "kubernetes/"
source_category_replace_dash "/"
exclude_namespace_regex ""
exclude_pod_regex "(sqs-autoscaler-*|fluentd-*|node-exporter-*|kube-scheduler-*|kube-proxy-*|kube-dns-*|kube-controller-manager-*|kube-apiserver-*|calico-node-*|kube2iam-*|kube-state-metrics-*|traefik-*|kiam-*|kube-flannel-*|container-linux-update-*|weave-scope-*|pod-checkpointer-*|sentry-*|kibana-harvester-*|corendns*)"
exclude_container_regex ""
exclude_host_regex ""
</filter>
<match **>
@type sumologic
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 1s
num_threads 1
open_timeout 60
add_timestamp true
proxy_uri ""
<buffer>
flush_thread_count 1
flush_interval 1s
</buffer>
</match>
</label>
<source>
@type tail
format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
time_format %Y-%m-%dT%H:%M:%S.%NZ
path "/var/lib/docker.log"
exclude_path
pos_file "/mnt/pos/ggcp-docker.log.pos"
tag "docker"
enable_stat_watcher true
<parse>
time_format %Y-%m-%dT%H:%M:%S.%NZ
@type regexp
expression ^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?
</parse>
</source>
<filter docker.**>
@type kubernetes_sumologic
source_category "docker"
source_name "k8s_docker"
source_category_prefix "kubernetes/"
</filter>
<source>
@type tail
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
path "/mnt/log/salt/minion"
exclude_path
pos_file "/mnt/pos/ggcp-salt.pos"
tag "salt"
enable_stat_watcher true
<parse>
time_format %Y-%m-%d %H:%M:%S
@type regexp
expression ^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$
</parse>
</source>
<filter salt.**>
@type kubernetes_sumologic
source_category "salt"
source_name "k8s_salt"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format syslog
path "/mnt/log/startupscript.log"
exclude_path
pos_file "/mnt/pos/ggcp-startupscript.log.pos"
tag "startupscript"
enable_stat_watcher true
<parse>
@type syslog
</parse>
</source>
<filter startupscript.**>
@type kubernetes_sumologic
source_category "startupscript"
source_name "k8s_startupscript"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path "/mnt/log/kubelet.log"
exclude_path
pos_file "/mnt/pos/ggcp-kubelet.log.pos"
tag "kubelet"
enable_stat_watcher true
<parse>
time_format %m%d %H:%M:%S.%N
format_firstline /^\w\d{4}/
@type multiline
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
</parse>
</source>
<filter kubelet.**>
@type kubernetes_sumologic
source_category "kubelet"
source_name "k8s_kubelet"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path "/mnt/log/kube-apiserver.log"
exclude_path
pos_file "/mnt/pos/ggcp-kube-apiserver.log.pos"
tag "kube-apiserver"
enable_stat_watcher true
<parse>
time_format %m%d %H:%M:%S.%N
format_firstline /^\w\d{4}/
@type multiline
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
</parse>
</source>
<filter kube-apiserver.**>
@type kubernetes_sumologic
source_category "kube-apiserver"
source_name "k8s_kube-apiserver"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format json
time_key timestamp
time_format %Y-%m-%dT%H:%M:%SZ
path "/mnt/log/kube-apiserver-audit.log"
exclude_path
pos_file "/mnt/pos/ggcp-kube-audit.log.pos"
tag "kube-audit"
read_from_head true
enable_stat_watcher true
<parse>
time_key timestamp
time_format %Y-%m-%dT%H:%M:%SZ
@type json
time_type string
</parse>
</source>
<filter kube-audit.**>
@type kubernetes_sumologic
source_category "kube-audit"
source_name "k8s_kube-audit"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path "/mnt/log/kube-controller-manager.log"
exclude_path
pos_file "/mnt/pos/ggcp-kube-controller-manager.log.pos"
tag "kube-controller-manager"
enable_stat_watcher true
<parse>
time_format %m%d %H:%M:%S.%N
format_firstline /^\w\d{4}/
@type multiline
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
</parse>
</source>
<filter kube-controller-manager.**>
@type kubernetes_sumologic
source_category "kube-controller-manager"
source_name "k8s_kube-controller-manager"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path "/mnt/log/kube-scheduler.log"
exclude_path
pos_file "/mnt/pos/ggcp-kube-scheduler.log.pos"
tag "kube-scheduler"
enable_stat_watcher true
<parse>
time_format %m%d %H:%M:%S.%N
format_firstline /^\w\d{4}/
@type multiline
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
</parse>
</source>
<filter kube-scheduler.**>
@type kubernetes_sumologic
source_category "kube-scheduler"
source_name "k8s_kube-scheduler"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path "/mnt/log/glbc.log"
exclude_path
pos_file "/mnt/pos/ggcp-glbc.log.pos"
tag "glbc"
enable_stat_watcher true
<parse>
time_format %m%d %H:%M:%S.%N
format_firstline /^\w\d{4}/
@type multiline
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
</parse>
</source>
<filter glbc.**>
@type kubernetes_sumologic
source_category "glbc"
source_name "k8s_glbc"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format multiline
multiline_flush_interval 5s
format_firstline /^\w\d{4}/
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
time_format %m%d %H:%M:%S.%N
path "/mnt/log/cluster-autoscaler.log"
exclude_path
pos_file "/mnt/pos/ggcp-cluster-autoscaler.log.pos"
tag "cluster-autoscaler"
enable_stat_watcher true
<parse>
time_format %m%d %H:%M:%S.%N
format_firstline /^\w\d{4}/
@type multiline
format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
</parse>
</source>
<filter cluster-autoscaler.**>
@type kubernetes_sumologic
source_category "cluster-autoscaler"
source_name "k8s_cluster-autoscaler"
source_category_prefix "kubernetes/"
exclude_namespace_regex ""
</filter>
<source>
@type tail
format /(?<message>.*)/
time_key time
path "<path to our log file(s)>/*.log"
exclude_path
pos_file "/mnt/pos/trace.log.pos"
tag "trace"
<parse>
time_key time
@type regexp
expression (?<message>.*)
</parse>
</source>
<match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match>
<match **>
@type sumologic
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 1s
num_threads 1
open_timeout 60
add_timestamp true
proxy_uri ""
<buffer>
flush_thread_count 1
flush_interval 1s
</buffer>
</match>
</ROOT>
2018-07-10 09:02:51 +0000 [info]: starting fluentd-1.1.3 pid=1 ruby="2.3.3"
2018-07-10 09:02:51 +0000 [info]: spawn command to main: cmdline=["/usr/bin/ruby2.3", "-Eascii-8bit:ascii-8bit", "/usr/local/bin/fluentd", "-c", "/fluentd/etc/fluent.file.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
2018-07-10 09:02:51 +0000 [info]: gem 'fluent-plugin-concat' version '2.2.1'
2018-07-10 09:02:51 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '1.0.2'
2018-07-10 09:02:51 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.9.1'
2018-07-10 09:02:51 +0000 [info]: gem 'fluent-plugin-sumologic_output' version '1.0.3'
2018-07-10 09:02:51 +0000 [info]: gem 'fluent-plugin-systemd' version '0.3.1'
2018-07-10 09:02:51 +0000 [info]: gem 'fluentd' version '1.1.3'
2018-07-10 09:02:51 +0000 [info]: adding filter in @NORMAL pattern="containers.**" type="kubernetes_metadata"
2018-07-10 09:02:51 +0000 [info]: adding filter in @NORMAL pattern="containers.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding match in @NORMAL pattern="**" type="sumologic"
2018-07-10 09:02:51 +0000 [info]: #0 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2018-07-10 09:02:51 +0000 [info]: adding match pattern="containers.**.fluentd**" type="null"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="containers.**" type="concat"
2018-07-10 09:02:51 +0000 [info]: adding match pattern="containers.**" type="relabel"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="docker.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="salt.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="startupscript.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="kubelet.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="kube-apiserver.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="kube-audit.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="kube-controller-manager.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="kube-scheduler.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="glbc.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding filter pattern="cluster-autoscaler.**" type="kubernetes_sumologic"
2018-07-10 09:02:51 +0000 [info]: adding match pattern="trace" type="sumologic"
2018-07-10 09:02:51 +0000 [info]: #0 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2018-07-10 09:02:51 +0000 [info]: adding match pattern="**" type="sumologic"
2018-07-10 09:02:51 +0000 [info]: #0 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2018-07-10 09:02:51 +0000 [info]: adding source type="monitor_agent"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [info]: adding source type="tail"
2018-07-10 09:02:51 +0000 [warn]: parameter 'index' in <match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match> is not used.
2018-07-10 09:02:51 +0000 [warn]: parameter 'check_index' in <match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match> is not used.
2018-07-10 09:02:51 +0000 [warn]: parameter 'source' in <match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match> is not used.
2018-07-10 09:02:51 +0000 [warn]: parameter 'sourcetype' in <match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match> is not used.
2018-07-10 09:02:51 +0000 [warn]: parameter 'format' in <match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match> is not used.
2018-07-10 09:02:51 +0000 [warn]: parameter 'time_format' in <match trace>
@type sumologic
index fluentd
check_index false
log_key "log"
endpoint <our-endpoint>
verify_ssl
log_format "json"
flush_interval 2s
num_threads 1
open_timeout 60
add_timestamp true
buffer_type memory
buffer_queue_limit 16
buffer_chunk_limit 8m
source trace
sourcetype fluentd
format json
time_format localtime
<buffer>
@type memory
flush_thread_count 1
flush_interval 2s
chunk_limit_size 8m
queue_limit_length 16
</buffer>
</match> is not used.
2018-07-10 09:02:51 +0000 [info]: #0 starting fluentd worker pid=16 ppid=1 worker=0
2018-07-10 09:02:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-gr2nc-trace-2018-07-10.log
2018-07-10 09:02:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-zs75p-trace-2018-07-10.log
2018-07-10 09:02:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-gr2nc-trace-2018-07-09.log
2018-07-10 09:02:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-cc67d-trace-2018-07-10.log
2018-07-10 09:02:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-kpxlt-trace-2018-07-09.log
2018-07-10 09:02:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-sqpxj-trace-2018-07-09.log
2018-07-10 09:02:52 +0000 [warn]: #0 /mnt/log/kube-apiserver-audit.log not found. Continuing without tailing it.
2018-07-10 09:02:52 +0000 [info]: #0 following tail of /mnt/log/containers/kube-router-n2qn4_kube-system_kube-router-d4f6464225e3f54f8b7c06a0ed8f321504db6e26aea9f8fdc68f8af54bb800cc.log
2018-07-10 09:02:52 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-10 09:02:52 +0000 [info]: #0 following tail of /mnt/log/containers/node-exporter-fs8gf_monitoring_node-exporter-87d69a9d209d99a232601023298daabb5c99e8b1e82df18de29346ebce333066.log
2018-07-10 09:02:52 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-10 09:02:52 +0000 [info]: #0 following tail of /mnt/log/containers/kiam-agent-fdfvv_kube-system_kiam-agent-7079e6ebc96633f30b731b14a8011f8b3f78eea2159e23b25c8a87a8e09f9b1f.log
2018-07-10 09:02:52 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-10 09:02:57 +0000 [info]: #0 following tail of /mnt/log/containers/api-web-cc67d_api_api-a39e1242dc7ad76a89ed644f0ceb54466fd59ff7a5d12555b291ec958f47b581.log
2018-07-10 09:02:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-10 09:02:57 +0000 [info]: #0 following tail of /mnt/log/containers/fluentd-sumologic-zwxs2_logging_fluentd-fcf59db7763f9a806cb8fc9d1f8e5498469eee1903b759a27fce6c5f8341f66f.log
2018-07-10 09:02:57 +0000 [info]: #0 following tail of /mnt/log/containers/fluentd-splunk-4lzmn_logging_fluentd-d31097f93d97c19e29bcbd60ec2ac7fa9c5c3210e7a82ddc1f0fc0f342cc6b3c.log
2018-07-10 09:02:57 +0000 [info]: #0 following tail of /mnt/log/containers/kube-router-n2qn4_kube-system_install-cni-ebc5aa41ee04f7cff1fe2ad6aa0faf77d697faab4a95417abffb805206b952c6.log
2018-07-10 09:02:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-10 09:02:57 +0000 [info]: #0 fluentd worker is now running worker=0
Cookie#domain returns dot-less domain name now. Use Cookie#dot_domain if you need "." at the beginning.
2018-07-10 09:03:22 +0000 [info]: #0 stats - namespace_cache_size: 3, pod_cache_size: 4, pod_cache_watch_misses: 6, pod_cache_watch_ignored: 1, pod_cache_watch_delete_ignored: 1, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5
2018-07-10 09:03:52 +0000 [info]: #0 stats - namespace_cache_size: 3, pod_cache_size: 4, pod_cache_watch_misses: 6, pod_cache_watch_ignored: 1, pod_cache_watch_delete_ignored: 1, namespace_cache_api_updates: 5, pod_cache_api_updates: 5, id_cache_miss: 5
...
...
2018-07-10 23:59:52 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 00:00:22 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 00:00:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-pfhjs-trace-2018-07-11.log
2018-07-11 00:00:52 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 00:01:22 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 00:01:52 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 00:02:22 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
...
...
2018-07-11 02:55:22 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 02:55:23 +0000 [info]: #0 detected rotation of /mnt/log/containers/api-web-pfhjs_api_api-72881f551a3d77f65622efb383bfe2c48fbfba19409411b4e6d237a975edae1d.log; waiting 5 seconds
2018-07-11 02:55:52 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 02:55:57 +0000 [info]: #0 following tail of /mnt/log/containers/api-web-2gs4l_api_api-4563c68b3a36f9d18e6c6604bc6275ea89ff7f829de7e7973b6d9c8ab1a41cb9.log
2018-07-11 02:55:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-11 02:56:22 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
2018-07-11 02:56:52 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
...
...
2018-07-11 02:59:52 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
2018-07-11 03:00:22 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
2018-07-11 03:00:52 +0000 [info]: #0 following tail of <path to our log file(s)>/api-web-2gs4l-trace-2018-07-11.log
2018-07-11 03:00:52 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
2018-07-11 03:01:22 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
...
...
2018-07-11 14:09:31 +0000 [info]: #0 detected rotation of /mnt/log/containers/api-web-2gs4l_api_api-4563c68b3a36f9d18e6c6604bc6275ea89ff7f829de7e7973b6d9c8ab1a41cb9.log; waiting 5 seconds
2018-07-11 14:09:52 +0000 [info]: #0 stats - namespace_cache_size: 1, pod_cache_size: 1, namespace_cache_api_updates: 1, pod_cache_api_updates: 1, id_cache_miss: 1
2018-07-11 14:09:57 +0000 [info]: #0 following tail of /mnt/log/containers/api-web-kdkmn_api_api-828663798bc503f391e9c6754af48d125fe551a6235d833b0bd60b64ca79371f.log
2018-07-11 14:09:57 +0000 [info]: #0 disable filter chain optimization because [Fluent::KubernetesMetadataFilter] uses `#filter_stream` method.
2018-07-11 14:10:22 +0000 [info]: #0 stats - namespace_cache_size: 2, pod_cache_size: 2, namespace_cache_api_updates: 2, pod_cache_api_updates: 2, id_cache_miss: 2
Let me know if you need any more details.
FYI wrong plugin name. Should be "kubernetes_sumologic".
2016-12-15 05:51:58 +0000 [info]: reading config file path="/fluentd/etc/fluent.conf"
2016-12-15 05:51:58 +0000 [info]: starting fluentd-0.12.31
2016-12-15 05:51:58 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.26.2'
2016-12-15 05:51:58 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.8.2'
2016-12-15 05:51:58 +0000 [info]: gem 'fluent-plugin-sumologic_output' version '0.0.3'
2016-12-15 05:51:58 +0000 [info]: gem 'fluentd' version '0.12.31'
2016-12-15 05:51:58 +0000 [info]: adding match pattern="containers.**.fluentd**" type="null"
2016-12-15 05:51:58 +0000 [info]: adding filter pattern="containers.**" type="kubernetes_metadata"
2016-12-15 05:51:59 +0000 [info]: adding filter pattern="containers.**" type="kubernetes_sumologic"
2016-12-15 05:51:59 +0000 [info]: adding filter pattern="docker.**" type="sumo_fields"
2016-12-15 05:51:59 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error="Unknown filter plugin 'sumo_fields'. Run 'gem search -rd fluent-plugin' to find plugins"
2016-12-15 05:51:59 +0000 [info]: process finished code=256
2016-12-15 05:51:59 +0000 [warn]: process died within 1 second. exit.
Hi,
Hope that you might be able to help me - bit stuck on this one. I've got a daemonset installed on my Kubernetes cluster to log container logs to SumoLogic, but I'm seeing some really odd behaviour with logs getting flushed into Sumo. We have the following setup:
The problem that we are seeing is that logs are not getting pushed into Sumologic. We can't see logs appearing in Sumo in either "Live Tail" mode or in normal search. We see some initial logs from server startup but nothing while the JBoss server is running. When we kill the container, ALL of the logs suddenly get pushed up to Sumologic in one hit (individual log entries, not one giant log).
When no logs are being pushed we have observed the following:
As explained, while this is happening no log entries (other than initial logging) are being seen in SumoLogic itself, until the container gets shut down when all logs get flushed into Sumo.
Can you help please? Any input appreciated, even if it's just a list of other things I can check for our setup!
Thanks,
Alan.
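A couple of things that are sometimes worth ruling out in a setup like this (not a diagnosis): whether in_tail is actually seeing the appends, and whether the output buffer is flushing. For the former, the inotify-based watcher can be switched off so in_tail falls back to timer-based polling:
<source>
  @type tail
  format json
  path /mnt/log/containers/*.log
  pos_file /mnt/pos/ggcp-containers.log.pos
  tag containers.*
  # rely on stat()-based polling instead of inotify events
  enable_stat_watcher false
</source>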
The README says the default FlushInterval is 30s; however, values.yaml lists the default as 5s.
Identify the correct default and make the documentation consistent.
Can fluentd tail multiple large (and small) files in parallel? Reading this link, it seems like it's only designed for one file at a time: fluent/fluentd#573. This seems consistent with the behaviour we see in the fluentd logs.
We're trying to use this with Red Hat's OpenShift, and the number of logs, along with their size, changes at any given time.
We have deployed fluentd-kubernetes-sumologic to our nodes and are providing an additional source to pull in logs from a custom systemd unit. We have found that some of the logs from the service show up in Sumologic, but not all. I've included the config below. How can we go about troubleshooting where the logs are being dropped?
fluentd-kubernetes-sumologic release: v1.6
Kubernetes: 1.8.x
my.service: performs a docker run with no custom logging
Config:
<source>
@type systemd
path /mnt/log/journal
filters [{"_SYSTEMD_UNIT": "my.service"}]
<storage>
@type local
persistent true
path /mnt/pos/my.log.pos
</storage>
tag my
</source>
<filter my.**>
@type kubernetes_sumologic
source_category system/my
source_name my
source_category_prefix "#{ENV['SOURCE_CATEGORY_PREFIX']}"
exclude_host_regex "#{ENV['EXCLUDE_HOST_REGEX']}"
exclude_priority_regex "#{ENV['EXCLUDE_PRIORITY_REGEX']}"
exclude_unit_regex "#{ENV['EXCLUDE_UNIT_REGEX']}"
</filter>
Can you guys confirm why the docker image size is almost 10x the size of v1.0? Seems to be mostly because the base fluentd image is now using debian?
We are looking at updating our collector from v1.0 to v1.4 but don't see any major fixes or benefits (a changelog would be nice)
Howdy from down-under,
@lucaswilric and I are working on using this fluentd-kubernetes-sumologic container to forward our application logs to Sumologic. The problem we are having at the moment is the container is not authorised to talk to the Kubernetes API. Here is the log from one of the fluentd pods:
2017-08-03 01:49:19 +0000 [info]: reading config file path="/fluentd/etc/fluent.file.conf"
2017-08-03 01:49:19 +0000 [error]: config error file="/fluentd/etc/fluent.file.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API v1 endpoint [OUR_INTERNAL_DNS_CONTROLLER_NAME]:443/api: 401 Unauthorized"
We would like to request a feature to allow the container to use an SSL client certificate to talk to the Kubernetes API. Let me know if you need any other information.
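Something along these lines is what we have in mind; the parameter names (client_cert, client_key, ca_file) come from fluent-plugin-kubernetes_metadata_filter, and the mount paths and env var are placeholders:
<filter containers.**>
  @type kubernetes_metadata
  kubernetes_url "#{ENV['K8S_API_URL']}"   # placeholder env var
  # client-certificate auth instead of the service-account bearer token
  client_cert /etc/fluentd-certs/client.crt
  client_key /etc/fluentd-certs/client.key
  ca_file /etc/fluentd-certs/ca.crt
  verify_ssl true
</filter>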
Cheers,
With the latest upgrade to v1.8 from v1.6 we began to receive the following errors:
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `[]' for nil:NilClass" tag="containers.mnt.log.containers.calico-node-hxmb6_kube-system_calico-node-703da71b5a5aa7a72c4eef2653357809f11cb03bfb6fab74daa7e9db92b4acc3.log"
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluent-plugin-kubernetes_metadata_filter-1.0.1/lib/fluent/plugin/filter_kubernetes_metadata.rb:297:in `filter_stream_from_files'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:177:in `block in filter_stream'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:177:in `each'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:177:in `reduce'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:177:in `filter_stream'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:158:in `emit_events'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:96:in `emit_stream'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/out_relabel.rb:29:in `process'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/output.rb:721:in `emit_sync'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:159:in `emit_events'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/event_router.rb:96:in `emit_stream'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/in_tail.rb:348:in `receive_lines'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/in_tail.rb:465:in `wrap_receive_lines'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/in_tail.rb:666:in `block in on_notify'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/in_tail.rb:707:in `with_io'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/in_tail.rb:644:in `on_notify'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin/in_tail.rb:496:in `on_notify'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin_helper/timer.rb:77:in `on_timer'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/cool.io-1.5.0/lib/cool.io/loop.rb:88:in `run_once'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/cool.io-1.5.0/lib/cool.io/loop.rb:88:in `run'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin_helper/event_loop.rb:84:in `block in start'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 /var/lib/gems/2.3.0/gems/fluentd-0.14.17/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 emit transaction failed: error_class=NoMethodError error="undefined method `[]' for nil:NilClass" tag="containers.mnt.log.containers.calico-node-hxmb6_kube-system_calico-node-703da71b5a5aa7a72c4eef2653357809f11cb03bfb6fab74daa7e9db92b4acc3.log"
fluentd-2tl8n fluentd 2018-04-01 18:44:35 +0000 [warn]: #0 suppressed same stacktrace
Any clues what that might be?
Thank you.
I'd like some help to understand whether or not I've missed something when following the README and the guides on help.sumologic.com ... kubernetes.
I seem to have most dashboards working with the exception of scheduler related panels like Kubernetes - Overview -> Pods Scheduled By Namespace
which is driven by the following query:
_sourceCategory = *kube-scheduler*
| timeslice 1h
| parse "Successfully assigned * to *\"" as name2,node
| parse "reason: '*'" as reason
| parse "type: '*'" as normal
| parse "Name:\\\"*\\\"" as name
| parse "Namespace:\\\"*\\\"" as namespace
| parse "Kind:\\\"*\\\"" as kind
| count by _timeslice, namespace
| transpose row _timeslice column namespace
| fillmissing timeslice(1h)
The problem is that the line this query is driven by is not logged by the scheduler but emitted as an event. The only piece of the documentation I can see which would be able to push this to Sumo is the sumologic-k8s-api script, which is noticeably lacking any calls to v1/api/events, as well as the role for calling that.
I've tested a fix which adds these log lines and can submit it as a PR against sumologic-k8s-api, but I feel like I've missed something obvious.
Some of the panels are driven by queries which extract fields in a way that doesn't fill me with confidence that I've got things configured correctly:
Kubernetes - Controller Manager -> Event Severity Trend
using the following query:
_sourceCategory = *kube-controller-manager*
| parse "\"message\":\"*\"" as message
| parse "\"source\":\"*.*:*\"" as resource,resource_action,resource_code
| parse "\"severity\":\"*\"" as severity
| fields - resource_action, resource_code
| timeslice 1h
| count _timeslice, severity
| transpose row _timeslice column severity
| fillmissing timeslice(1h)
Which matches this log line:
{
"timestamp": 1528785188171,
"severity": "I",
"pid": "1",
"source": "round_trippers.go:439",
"message": "Response Status: 200 OK in 2 milliseconds"
}
Where resource_action and resource_code would match go and 439 respectively. Is this correct?
hey @frankreno
The Docker Hub page (https://hub.docker.com/r/sumologic/fluentd-kubernetes-sumologic/) mentions the following for OpenShift:
oc adm policy add-scc-to-user privileged system:serviceaccount:logging:sumologic-fluentd
oc adm policy add-cluster-role-to-user cluster-reader system:serviceaccount:logging:sumologic-fluentd
Here the service account is sumologic-fluentd and the project is logging, but in the actual fluentd.yaml the service account points to fluentd:
https://github.com/SumoLogic/fluentd-kubernetes-sumologic/blob/v1.18/daemonset/rbac/fluentd.yaml#L52
https://github.com/SumoLogic/fluentd-kubernetes-sumologic/blob/v1.18/daemonset/rbac/fluentd.yaml#L5
https://github.com/SumoLogic/fluentd-kubernetes-sumologic/blob/v1.18/daemonset/rbac/fluentd.yaml#L11
Seeing the following issue on K8S v1.7.1:
/mnt/log/containers/dle-authz-service-1349956859-rx1n5_pqa-authz_dle-authz-service-69d26cee5131254a5c7f4eaceba5c82631568b450dd9a835dfda72672d8e7d60.log unreadable. It is excluded and would be examined next time.
Hi.
I am using fluentd-kubernetes-sumologic on a few GKE clusters and recently tried the systemd setup.
Immediately after starting the new fluentd version with systemd, I ran into this issue/bug:
2017-07-28 09:42:26 +0000 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `empty?' for nil:NilClass" tag="kubelet" time=#<Fluent::EventTime:0x007fce2f02f800 @sec=1501234946, @nsec=499553000> record=
{"_BOOT_ID"=>"......
(above log line is trimmed)
Lines such as this repeat with high frequency (for each kubelet event).
By looking at the plugins/filter_kubernetes_sumologic.rb source code, the problem is that all the exclude_* config_params have nil as their default, while there are checks such as @exclude_*.empty?.
Using irb to demo the problem:
irb(main):004:0* nil.empty?
NoMethodError: undefined method `empty?' for nil:NilClass
from (irb):4
from /usr/bin/irb:11:in `<main>'`
I have a commit with the fix and I will PR it.
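For context, the fix amounts to giving those parameters a non-nil default (or guarding the .empty? checks), roughly:
# in plugins/filter_kubernetes_sumologic.rb: give the exclude params an empty-string default
config_param :exclude_namespace_regex, :string, default: ''
config_param :exclude_pod_regex, :string, default: ''
config_param :exclude_container_regex, :string, default: ''
config_param :exclude_host_regex, :string, default: ''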
I have a pull request open for the helm chart:
helm/charts#3453
It adds support for automatically creating the persistence directory. However, I can't get the tests there to pass because they seem to absolutely require a valid Sumologic endpoint URL. How can I get the tests to pass without requiring a valid Sumologic endpoint URL for the helm chart?
Would love to see an example where containers on a k8s cluster are set to use the fluentd logging driver.
In fluentd.daemonset.yaml the mount path is specified as /var/lib/docker:
- name: docker-logs
mountPath: /var/lib/docker/
readOnly: true
But in source.docker.conf, it is reading from /mnt/log/.
<source>
type tail
format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
time_format %Y-%m-%dT%H:%M:%S.%NZ
path /mnt/log/docker.log
pos_file /mnt/pos/ggcp-docker.log.pos
tag docker
</source>
I'm seeing the same issue as #11
/mnt/log/containers/kube-apiserver-ip-10-131-51-160.ec2.internal_kube-system_kube-apiserver-c7a8163d649d8c56993670717b8aaa72704a7a9930aa115bd45971072f0ac96d.log unreadable. It is excluded and would be examined next time
I deployed the DaemonSet and Secret on my K8s cluster following the directions of the README file and it worked on my v1.5 cluster, but when I deployed it to my v1.6 cluster I got the above issue.
Differences between my v1.5 and v1.6 cluster set up:
Has anyone seen this issue on v1.6?
Support deployment as a variable for sourceName and sourceCategory.
The following variables are already supported: %{namespace}, %{pod}, %{container}. There are uses for being able to refer to the deployment in the sourceName or category.
I installed the plugin according to the README.
_sourceCategory and _sourceName are not populated in Sumologic (they are the defaults). According to the README they should be populated (e.g., "%{namespace}.%{pod}.%{container}").
Any suggestion on how to debug this?
Per the README, the intention of the pod is to "run an instance of this container on each physical underlying host in the cluster". As of Kubernetes 1.6, the pods will only run on the worker nodes, due to the built in taint on masters.
I see two ways to solve this. Which is best depends on the maintainers' intentions:
Happy to do a PR to fix this if desired.
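If the intent is to keep running on masters, a toleration for the 1.6 master taint in the pod spec would be one way; a sketch:
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule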
Is it possible to disable shipping logs for particular pods by setting some param in an annotation? My bet is that it's not yet possible, but are there any plans to add this feature? Since SumoLogic is typically paid by volume, it would be nice to control which deployments/pods ship their logs to the SumoLogic cloud.
Thank you.
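(For what it's worth, the rendered config elsewhere on this page shows the kubernetes_metadata filter matching annotations under sumologic.com.*, so a per-pod exclude annotation is the shape one would expect; whether the kubernetes_sumologic filter already honors something like the following is an assumption that needs confirming:)
metadata:
  annotations:
    # assumed annotation name; confirm against the kubernetes_sumologic filter before relying on it
    sumologic.com/exclude: "true"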
I believe either the plugin registration name needs to be changed to kubernetes_sumologic_containers, or the file needs to be renamed to filter_kubernetes_sumologic.rb. I'm curious why others didn't experience this though, so perhaps it's something environmental.
On one of my clusters, kube-apiserver generates a lot of messages (more than 400 MB/day). Unfortunately, it seems that I cannot use EXCLUDE_* to filter the Kubernetes components collected here.
Maybe we can filter severity I messages, as suggested to me by @fikander, but a mechanism similar to the EXCLUDE ones would be more general.
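In the meantime, dropping the severity-I records with fluentd's built-in grep filter is one workaround; a rough sketch, assuming the record key is severity as in the kube-apiserver parse config:
<filter kube-apiserver.**>
  @type grep
  <exclude>
    key severity
    pattern /^I$/
  </exclude>
</filter>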
P.S. This is not related to: #13