splunk / splunk-connect-for-kubernetes

Helm charts associated with kubernetes plug-ins

License: Apache License 2.0

Makefile 2.14% Ruby 1.34% Shell 7.74% Python 75.00% Mustache 13.78%
kubernetes chart splunk helm helm-chart

splunk-connect-for-kubernetes's Introduction

End of Support

Important: Splunk Connect for Kubernetes will reach End of Support on January 1, 2024. After that date, this repository will no longer receive updates from Splunk and will no longer be supported by Splunk. Until then, only critical security fixes and bug fixes will be provided. Splunk recommends migrating to the Splunk OpenTelemetry Collector for Kubernetes. Please refer to this migration guide for more details.

What does Splunk Connect for Kubernetes do?

Splunk Connect for Kubernetes provides a way to import and search your Kubernetes logging, object, and metrics data in your Splunk platform deployment. Splunk Connect for Kubernetes supports importing and searching your container logs on the following technologies:

Splunk Inc. is a proud contributor to the Cloud Native Computing Foundation (CNCF). Splunk Connect for Kubernetes utilizes and supports multiple CNCF components in the development of these tools to get data into Splunk.

Prerequisites

Before you begin

Splunk Connect for Kubernetes supports installation using Helm. Read the Prerequisites and Installation and Deployment documentation before you start your deployment of Splunk Connect for Kubernetes.

Perform the following steps before you install:

  1. Create a minimum of two Splunk platform indexes:
  • One events index, which will handle logs and objects (you may also create two separate indexes for logs and objects).
  • One metrics index. If you do not configure these indexes, Splunk Connect for Kubernetes uses the defaults created in your HTTP Event Collector (HEC) token.
  2. Create a HEC token if you do not already have one. If you are installing the connector on Splunk Cloud, file a ticket with Splunk Customer Service; they will deploy the indexes for your environment and generate your HEC token. A quick way to verify a token is shown below.
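As a quick sanity check before deploying, you can send a test event to your HEC endpoint. This is a hedged example; the hostname, token, and index are placeholders for your environment:

curl -k "https://SPLUNK_HOST:8088/services/collector/event" \
  -H "Authorization: Splunk 00000000-0000-0000-0000-000000000000" \
  -d '{"event": "splunk-connect-for-kubernetes HEC smoke test", "index": "k8s_events"}'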

Deploy with Helm

Helm, maintained by the CNCF, allows the Kubernetes administrator to install, upgrade, and manage the applications running in their Kubernetes clusters. For more information on how to use and configure Helm Charts, see the Helm site and repository for tutorials and product documentation. Helm is the only method that the Splunk software supports for installing Splunk Connect for Kubernetes.

To install and configure defaults with Helm:

  • Add the Splunk chart repository
helm repo add splunk https://splunk.github.io/splunk-connect-for-kubernetes/
  • Save the default values file to your working directory

Helm 2

helm inspect values splunk/splunk-connect-for-kubernetes > values.yaml

Helm 3

helm show values splunk/splunk-connect-for-kubernetes > values.yaml
  • Prepare the values file (a minimal sketch is shown after the install commands). Once you have a values file, you can install the chart by running

Helm 2

helm install --name my-splunk-connect -f values.yaml splunk/splunk-connect-for-kubernetes

Helm 3

helm install my-splunk-connect -f values.yaml splunk/splunk-connect-for-kubernetes
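Before installing, set at least the HEC connection details in values.yaml. The following is a minimal sketch that assumes the chart's global.splunk.hec.* keys; the host, token, and index names are placeholders, and additional settings may be required for your environment:

global:
  splunk:
    hec:
      host: splunk-hec.example.com                  # placeholder HEC host
      port: 8088
      token: 00000000-0000-0000-0000-000000000000   # placeholder HEC token
      insecureSSL: false
splunk-kubernetes-logging:
  splunk:
    hec:
      indexName: k8s_logs                           # placeholder events index
splunk-kubernetes-metrics:
  splunk:
    hec:
      indexName: k8s_metrics                        # placeholder metrics index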

To learn more about using and modifying charts, see:

Configuration variables for Helm

To learn more about using and modifying charts, see:

Deploy using YAML (unsupported)

Only deploying by Helm is supported by Splunk.

You can grab the manifest YAML files and use them to create the Kubernetes objects needed to deploy Splunk Connect for Kubernetes. Please note that installation and debugging for Splunk Connect for Kubernetes through YAML is community-supported only.

When you use YAML to deploy Splunk Connect for Kubernetes, the installation does not create the default configuration that is created when you install using Helm. To deploy the connector using YAML, you must know how to configure your Kubernetes variables to work with the connector. If you are not familiar with this process, we recommend that you use the Helm installation method.

To configure Splunk Connect for Kubernetes using YAML files:

  1. Grab the Charts and Manifest files from https://github.com/splunk/splunk-connect-for-kubernetes

  2. Read through all YAML files in the Manifests folder and make any necessary changes. Note that the YAML files in the Manifests folder are examples and are not expected to be used as provided.

  3. Verify that your Kubernetes logs are recognized by Splunk Connect for Kubernetes.
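Once the manifests are edited (step 2), applying them might look like the following sketch; the file paths and target namespace are illustrative and depend on the files you pulled from the Manifests folder:

kubectl create namespace splunk-connect
kubectl apply -n splunk-connect -f manifests/splunk-kubernetes-logging/
kubectl apply -n splunk-connect -f manifests/splunk-kubernetes-objects/
kubectl apply -n splunk-connect -f manifests/splunk-kubernetes-metrics/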

Architecture

Splunk Connect for Kubernetes deploys a DaemonSet whose Fluentd pods run on each node and do the collecting work. Splunk Connect for Kubernetes collects three types of data: logs, Kubernetes objects, and metrics.

To collect the data, Splunk Connect for Kubernetes leverages Fluentd and a set of Fluentd plugins, described in the sections below.

Logs

Splunk Connect for Kubernetes uses the Kubernetes node logging agent pattern to collect logs: a DaemonSet schedules a Fluentd pod on each node, and that Fluentd container collects the data. The following plugins are enabled in that Fluentd container (a simplified pipeline sketch follows the list):

  • in_systemd reads logs from systemd journal if systemd is available on the host.
  • in_tail reads logs from the file system.
  • filter_jq_transformer transforms the raw events to a Splunk-friendly format and generates source and sourcetypes.
  • out_splunk_hec sends the translated logs to your Splunk platform indexes through the HTTP Event Collector input (HEC).
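Conceptually, the generated fluent.conf wires these plugins together roughly as follows. This is a simplified, hand-written sketch rather than the chart's actual rendered configuration; the paths and the jq expression are illustrative:

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/splunk-fluentd-containers.log.pos
  tag tail.containers.*
  <parse>
    @type json
  </parse>
</source>

<filter tail.containers.**>
  @type jq_transformer
  # illustrative transform: derive a sourcetype from the container name
  jq '.record | .sourcetype = "kube:container:" + (.container_name // "unknown")'
</filter>

<match tail.containers.**>
  @type splunk_hec
  hec_host "#{ENV['SPLUNK_HEC_HOST']}"
  hec_port 8088
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  source_key source
  sourcetype_key sourcetype
</match>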

Kubernetes Objects

Splunk Connect for Kubernetes collects Kubernetes objects that can help users understand cluster status. A single-pod Deployment runs Fluentd with the following plugins to collect object data and push it to Splunk (a values.yaml sketch follows the list):

  • in_kubernetes_objects collects object data by calling the Kubernetes API (via https://github.com/abonas/kubeclient). in_kubernetes_objects supports two modes:
    • watch mode: the Kubernetes API sends new changes to the plugin. In this mode, only the changed data is collected.
    • pull mode: the plugin queries the Kubernetes API periodically. In this mode, all data is collected.
  • filter_jq_transformer transforms the raw data into a Splunk-friendly format and generates sources and sourcetypes.
  • out_splunk_hec sends the data to Splunk via HTTP Event Collector input (HEC).
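A hedged values.yaml sketch for the objects chart, showing one object collected in pull mode and one in watch mode; the object names and interval are illustrative, and the full structure is documented in the chart's default values.yaml:

splunk-kubernetes-objects:
  objects:
    core:
      v1:
        - name: pods        # pull mode: query the Kubernetes API periodically
          mode: pull
          interval: 60m
        - name: events      # watch mode: receive only changed data as it happens
          mode: watch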

Metrics

Splunk Connect for Kubernetes deploys daemonsets on the Kubernetes cluster. These daemonsets have exactly one pod, which runs one container:

  • The Fluentd metrics plugin collects the metrics, formats them for Splunk ingestion by ensuring they have a proper metric_name, dimensions, and so on, and then sends them to Splunk through out_splunk_hec using the Fluentd engine.

Make sure your Splunk configuration has a metrics index that is able to receive the data. See Get started with metrics in the Splunk Enterprise documentation.

If you want to learn more about how metrics are monitored in a Kubernetes cluster, see Tools for Monitoring Compute, Storage, and Network Resources.

If you want to learn more about which metrics are collected and metric names used with Splunk Connect for Kubernetes, view the metrics schema.
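Once metrics are flowing, you can query the metrics index with mstats. A hedged example; the index name and metric name below are assumptions, not guaranteed to match your deployment:

| mstats avg(_value) AS avg_cpu WHERE index=k8s_metrics AND metric_name="kube.node.cpu.usage_rate" span=1m BY node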

Performance

Some parameters used with Splunk Connect for Kubernetes can have an impact on overall performance of log ingestion, objects, or metrics. In general, the more filters that are added to one of the streams, the greater the performance impact.

Splunk Connect for Kubernetes can exceed the default throughput of HEC. To best address capacity needs, Splunk recommends that you monitor the HEC throughput and back pressure on Splunk Connect for Kubernetes deployments and be prepared to add additional nodes as needed.

Processing multiline Logs

One possible filter option is to enable the processing of multiline events. This feature is currently experimental and considered to be community supported.

Configuring multiline Fluentd filters to line-break multiline logs

Configure Apache Tomcat multiline logs using the following steps:

  1. Develop a multiline filter with the proper regex and test the regex using a site such as https://rubular.com/
<filter tail.containers.var.log.containers.toolbox*toolbox*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}|^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}\s-\s-/
        multiline_end_regexp /\\n$/
        separator ""
        flush_interval 5s
</filter>
  2. Add the multiline filter to your deployment's logging configmap, using the customFilters parameter (see the sketch after this list).

  3. Update the separator config if required. "" is the default separator.

  4. Save your changes.
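For reference, attaching a filter like the one above through the customFilters parameter might look roughly like this in values.yaml. This is a hedged sketch: the key layout (tag/type/body) follows the chart's customFilters example, but whether a multi-line body renders correctly depends on your chart version:

splunk-kubernetes-logging:
  customFilters:
    TomcatMultilineFilter:
      tag: tail.containers.var.log.containers.toolbox*toolbox*.log
      type: concat
      body: |
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}/
        separator ""
        flush_interval 5s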

Managing SCK Log Ingestion by Using Annotations

Manage Splunk Connect for Kubernetes Logging with these supported annotations.

  • Use the splunk.com/index annotation on a pod and/or namespace to specify which Splunk platform index to ingest into. The pod annotation takes precedence over the namespace annotation when both are set. For example: kubectl annotate namespace kube-system splunk.com/index=k8s_events
  • Set the splunk.com/exclude annotation to true on a pod and/or namespace to exclude its logs from being ingested into your Splunk platform deployment.
  • Use the splunk.com/sourcetype annotation on a pod to overwrite the sourcetype field. If not set, it is dynamically generated as container:CONTAINER_NAME. Note that the sourcetype will be prefixed with .Values.sourcetypePrefix (default: kube:).

Regarding excluding container logs: where possible, it is more efficient to exclude them using the fluentd.exclude_path option.
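For example (the pod and namespace names below are placeholders):

# Route a pod's logs to a specific index
kubectl annotate pod my-app-pod splunk.com/index=my_app_index

# Exclude a noisy namespace from ingestion
kubectl annotate namespace dev-scratch splunk.com/exclude=true

# Override the sourcetype for a pod (prefixed with kube: by default)
kubectl annotate pod my-app-pod splunk.com/sourcetype=my_app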

Searching for SCK metadata in Splunk

Splunk Connect for Kubernetes sends events to Splunk that can carry extra metadata attached to each event. Metadata values such as "pod", "namespace", "container_name", "container_id", and "cluster_name" appear as fields when viewing the event data inside Splunk. There are two options for searching on this metadata in Splunk.

  • Modify your search to use fieldname::value instead of fieldname=value.
  • Configure fields.conf on your downstream Splunk system so your metadata fields are available to be searched using fieldname=value. Example: fields.conf.example

For more information on index-time field extraction, see this guide.
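A minimal fields.conf sketch for the second option, mirroring the fields.conf.example approach; add one stanza per metadata field you want searchable with fieldname=value:

[pod]
INDEXED = true

[namespace]
INDEXED = true

[container_name]
INDEXED = true

[container_id]
INDEXED = true

[cluster_name]
INDEXED = true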

Sending logs to ingest API

Splunk Connect for Kubernetes can be used to send events to the Splunk Ingest API. In the ingest_api section of the YAML file you are using to deploy, the following configuration options must be set (a sketch follows the list):

  • serviceClientIdentifier - Splunk Connect for Kubernetes uses the client identifier to make authorized requests to the ingest API.
  • serviceClientSecretKey - Splunk Connect for Kubernetes uses the client secret key to make authorized requests to the ingest API.
  • tokenEndpoint - This value indicates which endpoint Splunk Connect for Kubernetes should look to for the authorization token necessary for making requests to the ingest API.
  • ingestAPIHost - Indicates which url/hostname to use for requests to the ingest API.
  • tenant - Indicates which tenant Splunk Connect for Kubernetes should use for requests to the ingest API.
  • eventsEndpoint - Indicates which endpoint to use for requests to the ingest API.
  • debugIngestAPI - Set to True if you want to debug requests and responses to ingest API.
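Put together, the ingest_api section might look like the following sketch. All values are placeholders, and the exact nesting inside your values.yaml may differ by chart version:

ingest_api:
  serviceClientIdentifier: your-client-id       # placeholder
  serviceClientSecretKey: your-client-secret    # placeholder
  tokenEndpoint: /token                         # placeholder; use your environment's token endpoint
  ingestAPIHost: api.scp.example.com            # placeholder
  tenant: your-tenant                           # placeholder
  eventsEndpoint: /events                       # placeholder; use your environment's events endpoint
  debugIngestAPI: false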

Maintenance And Support

Splunk Connect For Kubernetes is supported through Splunk Support assuming the customer has a current Splunk support entitlement (Splunk Support). For customers that do not have a current Splunk support entitlement, please search open and closed issues and create a new issue if not already there. The current maintainers of this project are the DataEdge team at Splunk.

License

See LICENSE.

splunk-connect-for-kubernetes's People

Contributors

abbas-splunk, bartekzm, bootswithdefer, chaitanyaphalak, davidrobertsorbis, dbaldwin-splunk, djschaap, foram-splunk, gp510, hvaghani221, jarleborsheim, jenworthington, kousik93, luckyj5, mehstg, mikeglauser, nhalstead, pszkamruk-splunk, retraut, rhockenbury, rockb1017, rohlik, sigbjornt, stevegszabo, svrc, tonyswu, toscott, vihasmakwana, vinzent, vzabawski


splunk-connect-for-kubernetes's Issues

Performance issues with Metrics and Metrics Aggregator - OOM and slow ingest

Issue:
We've identified an issue with the Metrics and Metrics Aggregator fluentd plugins not buffering chunks correctly with the settings provided inside helm charts.
https://github.com/splunk/splunk-connect-for-kubernetes/blob/master/helm-chart/splunk-kubernetes-metrics/values.yaml#L103
This is causing the plugins to get overwhelmed.

Cause
Further testing has determined that the value provided as the default configuration for
chunk_limit_records: 1000000
is incorrect. This is causing Metric and Metrics Aggregator events to not buffer at the expected interval.

Fix

Inside the values.yaml used to deploy Splunk Connect for Kubernetes, include the buffer and aggregatorBuffer sections with the updated chunk_limit_records value. This is only applicable to Splunk Connect for Kubernetes 1.1.0.

splunk-kubernetes-metrics:
  kubernetes:
    ...
  splunk:
    ...
  image:
    ...
  buffer:
    chunk_limit_records: 10000
  aggregatorBuffer:
    chunk_limit_records: 10000

How to disable kube:container:redirector?

Hi,

I am using the logging component to connect K8 with Splunk and in general everything works fine.

I'm using Azure AKS and here are the generated sourcetypes:

kube:container:redirector 209,236 75.796%
kube:container:main 19,837 7.186%
fluentd:monitor-agent 18,504 6.703%
kube:container:tunnel-front 14,700 5.325%
kube:container:azureproxy 11,913 4.316%
kube:container:pinkcloud-server 593 0.215%
kube:container:tiller 557 0.202%
kube:container:heapster 331 0.12%
kube:container:splunk-fluentd-k8s-logs 309 0.112%
kube:container:heapster-nanny

The first one is generating tons of logs so my main question is how to disable the logging forwarder for a specific component?

Thanks,
Greg

Source typing container logs

Right now, source types are derived from the container or pod name. Are there any plans to provide the option for users to set the source type? This would allow users to create the same containers with a common source type rather than different ones. I've seen over 1000 sourcetypes created on a single cluster.

splunk-kubernetes-objects configmap.yaml format error

Behavior:

  • Define multiple configs for an object, e.g.
objects:
  core:
    v1:
      - name: pods
        namespace: default
        mode: pull
        interval: 60m

Expected

  • Objects are pulled/watched with the defined properties

Actual

  • Get config error

Wildcards needed to search for pod, namespace, container_name, and container_id

I have splunk-connect installed on my OpenShift cluster and logs showing up in Splunk. Events do have the fields pod, namespace, container_name, and container_id, with values populating them. The issue is searching by any of these fields: searching namespace="openshift-node" returns no results, but namespace="*openshift-*node" with wildcards returns all the expected results. I'm not sure if this is something with Splunk itself or how splunk-connect is using jq to parse the filename in the fluentd config:

jq "def find_sourcetype(pod; container_name): container_name + \"/\" + pod | if startswith(\"dns-controller/dns-controller\") then \"kube:dns-controller\" elif startswith(\"sidecar/kube-dns\") then \"kube:kubedns-sidecar\" elif startswith(\"dnsmasq/kube-dns\") then \"kube:dnsmasq\" elif startswith(\"etcd-container/etcd-server\") then \"kube:etcd\" elif startswith(\"etcd-container/etcd-server-events\") then \"kube:etcd-events\" elif startswith(\"kube-apiserver/kube-apiserver\") then \"kube:kube-apiserver\" elif startswith(\"kube-controller-manager/kube-controller-manager\") then \"kube:kube-controller-manager\" elif startswith(\"autoscaler/kube-dns-autoscaler\") then \"kube:kube-dns-autoscaler\" elif startswith(\"kube-proxy/kube-proxy\") then \"kube:kube-proxy\" elif startswith(\"kube-scheduler/kube-scheduler\") then \"kube:kube-scheduler\" elif startswith(\"kubedns/kube-dns\") then \"kube:kubedns\" else empty end; def extract_container_info: (.source | ltrimstr(\"/var/log/containers/\") | split(\"_\")) as $parts | ($parts[-1] | split(\"-\")) as $cparts | .pod = $parts[0] | .namespace = $parts[1] | .container_name = ($cparts[:-1] | join(\"-\")) | .container_id = ($cparts[-1] | rtrimstr(\".log\")) | .; .record | extract_container_info | .sourcetype = (find_sourcetype(.pod; .container_name) // \"kube:container:\\(.container_name)\")"

"400 Bad Request Event field cannot be blank" when empty line is logged

I am running latest version of logging module of splunk connect for k8s (1.0.1).

Recently I stumbled on a strange issue: whole batches of logs (fluentd tries to send logs in batches) were discarded by HEC with a 400 response and the message Bad Request Event field cannot be blank. It looks like all the processing done with jq transforms newline-only log messages (\n) into empty events, and HEC drops the whole batch even if only one message is empty. As a result, we were missing huge chunks of the logs. I know it is partially due to bad logging on our side; however, I think fluentd/Splunk should do a better job of protecting the operator against such issues. As a workaround I have added:

      # ensure we do not have empty line logs, they cannot be ingested by Splunk and result in 400 response from
      # the Splunk HEC
      <filter tail.containers.**>
        @type jq_transformer
        jq 'if .record.log == "\n" then .record.log = "E" else .record.log = .record.log end | .record'
      </filter>

right before <filter tail.containers.**>. It does the job: we do not lose logs anymore, and developers can see their application is pushing garbage (empty logs are shown as E in Splunk).

Is there a better way of solving this?

add configuration option in splunk-kubernetes-objects values.yml to specify fields to index

Currently, object metadata is sent to Splunk in JSON format and none of the fields are index-time fields, which makes it difficult to use commands like | tstats to summarize the data efficiently.
One workaround we are using is setting INDEXED_EXTRACTION = true in props.conf, but that will eat up disk space pretty quickly considering the cardinality of the fields to index.
Therefore, it would be great if we could specify a list of fields as index-time fields in the values.yml file.
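For reference, the props.conf workaround mentioned above might look like this hedged sketch on the indexers (the Splunk setting is spelled INDEXED_EXTRACTIONS, and the sourcetype name here is assumed):

[kube:objects:pods]
INDEXED_EXTRACTIONS = json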

References:
https://github.com/splunk/splunk-connect-for-kubernetes/blob/master/manifests/splunk-kubernetes-logging/configMap.yaml#L217-L222
https://github.com/splunk/fluent-plugin-splunk-hec#fields-section-optional-single

Multiline events in ECS

Hello.
I do know that Splunk Connect for Kubernetes applies to EKS, yet this is the first app that directly targets the multiline issue for log ingestion from containers, and maybe you can help / give advice on how to deal with the issue below.

We are actively using ECS and have trouble with multiline log ingestion via the Splunk logging driver. After Matthew's presentation "IT1502 - Splunking the DevOps Pipeline: A Buttercup Tale" at .conf18, I had hopes that this problem was already being addressed. But, alas, as far as I can see this is true only for EKS for now.

Is Splunk doing anything to help with multiline log ingestion in ECS? Do you have any suggestions/advice/examples on how to make it work?

Thank you in advance,
-Alex

splunk-heapster cannot connect to splunk-fluentd-heapster

Hi,

I am getting an error in the metrics pod; it looks like one container is unable to connect to the other inside the same pod.
I have installed the connector from the Helm chart, version 1.0.1.

splunk-heapster container logs:
I1122 11:53:58.124170 1 heapster.go:78] /heapster --source=kubernetes --sink=statsd:udp://127.0.0.1:9001
I1122 11:53:58.124207 1 heapster.go:79] Heapster version v1.5.1
I1122 11:53:58.124437 1 configs.go:61] Using Kubernetes client with master "https://10.96.0.1:443" and version v1
I1122 11:53:58.124449 1 configs.go:62] Using kubelet port 10255
I1122 11:53:58.140256 1 driver.go:104] statsd metrics sink using configuration : {host:127.0.0.1:9001 prefix: numMetricsPerMsg:5 protocolType:etsystatsd renameLabels:map[] allowedLabels:map[] customizeLabel:0x15f2560}
I1122 11:53:58.140281 1 driver.go:104] statsd metrics sink using configuration : {host:127.0.0.1:9001 prefix: numMetricsPerMsg:5 protocolType:etsystatsd renameLabels:map[] allowedLabels:map[] customizeLabel:0x15f2560}
I1122 11:53:58.140301 1 heapster.go:202] Starting with StatsD Sink
I1122 11:53:58.140305 1 heapster.go:202] Starting with Metric Sink
I1122 11:53:58.328875 1 heapster.go:112] Starting heapster on port 8082
E1122 11:54:06.526301 1 driver.go:159] statsd metrics sink - failed to send some metrics : write udp 127.0.0.1:38104->127.0.0.1:9001: write: connection refused

splunk-fluentd-heapster container logs, looks like it is listening:
2018-11-22 11:54:02 +0000 [info]: starting fluentd-1.2.0 pid=8 ruby="2.5.1"
2018-11-22 11:54:02 +0000 [info]: spawn command to main: cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--under-supervisor"]
2018-11-22 11:54:06 +0000 [info]: gem 'fluent-plugin-concat' version '2.2.2'
2018-11-22 11:54:06 +0000 [info]: gem 'fluent-plugin-jq' version '0.5.1'
2018-11-22 11:54:06 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.0.1'
2018-11-22 11:54:06 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.0.1'
2018-11-22 11:54:06 +0000 [info]: gem 'fluent-plugin-systemd' version '0.3.1'
2018-11-22 11:54:06 +0000 [info]: gem 'fluentd' version '1.2.0'
2018-11-22 11:54:06 +0000 [info]: adding match pattern="raw.metrics.udp" type="jq"
2018-11-22 11:54:06 +0000 [info]: adding match pattern="metrics.udp" type="splunk_hec"
2018-11-22 11:54:07 +0000 [info]: adding source type="udp"
2018-11-22 11:54:07 +0000 [info]: #0 starting fluentd worker pid=16 ppid=8 worker=0
2018-11-22 11:54:07 +0000 [info]: #0 listening udp socket bind="0.0.0.0" port=9001
2018-11-22 11:54:07 +0000 [info]: #0 fluentd worker is now running worker=0

Any idea what might be wrong?

Thanks.

Forced to set insecure_ssl to true in fluent.conf when communicating with Splunk Cloud

The only pod that gives us issues is splunk-kubernetes-objects which responds with

2018-07-20 23:59:59 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API v1 endpoint https://100.64.0.1:443/api: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate)"

To resolve this we are forced to set insecure_ssl to true in ConfigMap.

What causes the issue and how can we fix it?

multiline example request for Tomcat logs

Below is my exception outline

2018-09-24 11:21:00.428 WARN 1 --- [rter-1-thread-1] c.c.metrics.graphite.GraphiteReporter : Unable to report to Graphite

java.net.ConnectException: Connection timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.(Socket.java:434)
at java.net.Socket.(Socket.java:244)
at javax.net.DefaultSocketFactory.createSocket(SocketFactory.java:277)
at com.codahale.metrics.graphite.Graphite.connect(Graphite.java:128)
at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166)
at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

What should be the multiline settings?

CR/LF in the logs --> header field value cannot include CR/LF

Version of the components

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:17:28Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

2018-09-25 08:20:34 +0000 [info]: starting fluentd-1.2.0 pid=5 ruby="2.5.1"
2018-09-25 08:20:35 +0000 [info]: gem 'fluent-plugin-concat' version '2.2.2'
2018-09-25 08:20:35 +0000 [info]: gem 'fluent-plugin-jq' version '0.5.1'
2018-09-25 08:20:35 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.0.1'
2018-09-25 08:20:35 +0000 [info]: gem 'fluent-plugin-splunk-hec' version '1.0.1'
2018-09-25 08:20:35 +0000 [info]: gem 'fluent-plugin-systemd' version '0.3.1'
2018-09-25 08:20:35 +0000 [info]: gem 'fluentd' version '1.2.0'

I'm getting CR/LF in the logs everywhere, which prevents pushing the logs to the Splunk instance.

Error I'm getting

2018-09-25 08:20:41 +0000 [warn]: #0 bad chunk is moved to /tmp/fluentd/backup/worker0/object_3fca9345192c/576adc7bf8bbd67733d4d9a92e8bbde2.log
2018-09-25 08:20:46 +0000 [warn]: #0 got unrecoverable error in primary and no secondary error_class=ArgumentError error="header field value cannot include CR/LF"

Have you already dealt with this error?

Thanks.

How to only send logs from certain namespaces or pod

We would like to control what goes into Splunk. Some namespaces do not need to be in Splunk. Ideally we could also exclude some pods in a namespace and not others.

How can we achieve this, please?
It's important so we can control the risk of data volume blowing up our license.

How do you enable SSL connections to the indexer?

I've looked at the helm docs, but I don't see any instructions on where to specify the certs to connect to the indexer after setting insecureSSL: false in the helm values.yml

2018-10-10 22:12:48 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2018-10-10 22:12:48 +0000 chunk="577e7273b2f87eac439ed0708b15ea3c" error_class=OpenSSL::SSL::SSLError error="SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate)"

How to get everything from journal?

The system works fine with unit filters, but how do I get everything from the journal, e.g. what plain journalctl would return?

I tried with empty filters [], assuming that _TRANSPORT would always return something and thus not block the jq transformer from changing record.source.

<source>
  @id journald-all
  @type systemd
  @label @SPLUNK
  tag journald.all:all
  path "/run/log/journal"
  filters []
  read_from_head true
  <storage>
    @type local
    persistent true
  </storage>
  <entry>
    field_map {"MESSAGE": "log", "_TRANSPORT": "source"}
    field_map_strict true
  </entry>
</source>

The fluentd log looks perfectly fine and the conf is read OK, but nothing from systemd comes out to Splunk (all other source types continue working). If I switch back to systemd sources with unit filters, it starts working.

Any help is appreciated..

Thanks in advance,
Jan

Allow to use file buffer

Currently, the charts use memory as the Fluentd output buffer, and it cannot be customized. We want to make the buffer section customizable so that people can switch to another buffer system, e.g. file, if they want.

Plans for compatibility with Splunk Cloud?

We'd love to use this app in our Splunk instance, but Splunk support has informed us that this app is currently incompatible with Splunk Cloud. Are there any plans to add that compatibility?

Allow logging to multiple separate indexes

My organization needs the ability to send kubernetes logs to multiple separate indexes. This could be accomplished by using a custom values.yaml which specifies the index to be used for each splunk-kubernetes-logging.logs.<name>.index, and building a filter to set the index to which the fluent_plugin_splunk_hec will send the event.
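A hedged sketch of what the proposed configuration might look like in a custom values.yaml; the per-source index key follows the proposal above and is not guaranteed to be supported by the chart as-is:

splunk-kubernetes-logging:
  logs:
    frontend:                 # hypothetical log source
      from:
        pod: frontend
      index: k8s_frontend     # proposed per-source index
    backend:                  # hypothetical log source
      from:
        pod: backend
      index: k8s_backend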

Logging pod to reflush buffered files after restart

Is it possible for the logging pod to clear backlogged buffer files after a restart? This is a known problem with the EFK stack and possibly a constraint on fluentd. Sometimes it's not possible to avoid a pod restart, and we'll end up with orphaned buffer files that we'd need to delete manually.

One option that we’ve considered is to configure a readiness probe to delete buffered files on restart.

missing argument chart name

Hi,
While installing with Helm as per the documentation, I get an error:
Error: This command needs 1 argument: chart name

Even the attempt to add this repo fails with a missing index.yaml:

helm repo add splunk-connect-for-kubernetes https://github.com/splunk/splunk-connect-for-kubernetes/tree/master/helm-chart/splunk-connect-for-kubernetes

any hint for this?

How can I pass value from the Pod labels to the jq_filter?

Hi,
I need to be able to select an index per service. I am thinking of adding a label with the index name to each pod.
Another option is to create a custom function in the splunk-kubernetes-logging.container_jq_filter with a lot of ifs for each service. The tradeoff is that I need to kill all the logging pods, because they do not pick up the configuration change when I do a helm update.

Any suggestions? Am I on the right track?
Thanks,
Cristian.

Feature: Reuse / Rename existing K8s secret

We need the option to reference an existing secret in K8s instead of creating a new one when the Splunk Helm chart is deployed. In our K8s instances, all Secrets are created through a separate process / additional tooling.

Goal:

  • Can reference an existing secret
  • Able to change the secret name to fit our global naming schema

Can't add log source from file.

Behavior:

  • In values.yaml, add some log source from file like
logs:
  audit-logs:
    from:
      file:
        path: /var/log/audit/*.log
  • Install the logging chart with that values.yaml

Expect:
Data in files matching the pattern /var/log/audit/*.log should be ingested.

Actual:
Error raised:

2018-06-14 19:36:48 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error_class=Fluent::ConfigError error="Could not parse jq filter: .record.sourcetype = (.tag | ltrimstr(\"tail.file.\") | .record, error: Could not parse jq filter: .record.sourcetype = (.tag | ltrimstr(\"tail.file.\") | .record, error: jq: error: syntax error, unexpected $end (Unix shell quoting issues?) at <top-level>, line 1:\n.record.sourcetype = (.tag | ltrimstr(\"tail.file.\") | .record                                                      \njq: 1 compile error\n"

Provide examples/guidance on adding custom fields to the logging configmap

We should provide documentation/guidance for users on how to add their own custom fields to the logging configmap and document the jq_transformer

For example, we have used filters like this to add an index field based on the namespace:

 # new filter: set index
      <filter tail.containers.**>
        @type jq_transformer
        jq '.record | .index = "some_"+.namespace+"_string"'
      </filter>

# = output =
      <match **>
        @type splunk_hec
        protocol https
        hec_host host
        hec_port 8088
        hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
        host "#{ENV['SPLUNK_HEC_HOST']}"
        source_key source
        sourcetype_key sourcetype
        index_key index
        insecure_ssl true
        <fields>
          pod
          namespace
          container_name
          container_id
        </fields>


Metrics Pod Error in scraping containers from kubelet & 403 Forbidden Errors

With the deprecation of Heapster in recent versions of Kubernetes, I have seen the Splunk Connect for Kubernetes metrics pod throw errors about not being able to scrape the kubelet. Here are two issues I have seen and the workarounds that got me around them:

splunk-kubernetes-metrics pod shows error scraping kubelet

kubectl -n splunk logs -f kubecon-demo-2018-splunk-kubernetes-metrics-6b65bfdc48-4wg99 -c splunk-heapster
E1220 21:36:05.021124       1 manager.go:101] Error in scraping containers from kubelet:10.0.14.200:10255: failed to get all container stats from Kubelet URL "http://10.0.14.200:10255/stats/container/": Post http://10.0.14.200:10255/stats/container/: dial tcp 10.0.14.200:10255: getsockopt: connection refused

Workaround:

Having seen this in the past with OpenShift deployments, I edited the metrics deployment and updated the heapster command to point to the secure kubelet port 10250.

kubectl -n splunk edit deployment kubecon-demo-2018-splunk-kubernetes-metrics
spec:
      serviceAccountName: splunk-kubernetes-metrics
      containers:
      - image: k8s.gcr.io/heapster-amd64:v1.5.1
        imagePullPolicy: IfNotPresent
        name: splunk-heapster
        command:
        - "/heapster"
        #--source=kubernetes"
        - "--source=kubernetes:https://kubernetes.default.svc?useServiceAccount=true&kubeletHttps=true&kubeletPort=10250&insecure=true"
        - "--sink=statsd:udp://127.0.0.1:9001"

Once the update to the deployment rolls the pods, you will see that the connection refused error has turned to a 403 forbidden error.

After updating heapster command splunk-kubernetes-metrics pod shows 403 forbidden errors

kubectl -n splunk logs -f kubecon-demo-2018-splunk-kubernetes-metrics-697d7f6578-gjvp2 -c splunk-heapster
E1220 21:41:05.139272       1 manager.go:101] Error in scraping containers from kubelet:10.0.16.160:10255: failed to get all container stats from Kubelet URL "http://10.0.16.160:10255/stats/container/": Post http://10.0.16.160:10255/stats/container/: dial tcp 10.0.16.160:10255: getsockopt: connection refused

These appear to be caused by the fact that we assume the system:heapster role is available. Because it is absent in many deployments now, we will update the binding and give the splunk-kubernetes-metrics service account the cluster-admin role.

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: splunk-kubernetes-metrics
  namespace: splunk
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: splunk-kubernetes-metrics
    namespace: splunk
kubectl -n splunk apply -f patch-splunk-kubernetes-metrics-roleBinding.yaml

Once applied, you should see that you are now receiving metrics.

Routing kubernetes-object data to different indexes based on namespace

For the kubernetes-logging data, I am routing data for each namespace to separate indexes based on the value for | .namespace = $parts[1] that is defined in extract index fields and sourcetype for container logs

I use an additional filter to set the index key:

<filter tail.containers.**>
  @type jq_transformer
  jq '.record | .index = "prefix_name_"+.namespace+"_cluster_name_suffix"'
</filter>

How can I use the same process to route the objects data by namespace to the same index? The difference is that the namespace is found in the metadata rather than in a filename, and for each object type, e.g. Kubernetes events, the metadata field that references the namespace is different.

Do you have any examples for defining an extraction from metadata fields?

Missing template file _helpers.tpl

Behavior:

  • clone the repo
  • install the charts using the code from the cloned repo, e.g.
$ helm install --name my-release splunk-connect-for-kubernetes/helm-chart/splunk-connect-for-kubernetes

Expected:
Chart(s) will be installed.

Actual:
Found error:

Error: render error in "splunk-kubernetes-logging/templates/secret.yaml": template: splunk-kubernetes-logging/templates/secret.yaml:4:20: executing "splunk-kubernetes-logging/templates/secret.yaml" at <{{template "splunk-k...>: template "splunk-kubernetes-logging.fullname" not defined

Root Cause:
The _helpers.tpl files are missing in every chart.

Events in watch mode stop being detected

I am deploying the objects chart on a 1.11.2 K8S cluster with a watch on events only, and after roughly 45-60 min, new events are no longer detected and therefore not sent to Splunk. Bouncing the pod makes the events available again for the same 45-60 min, and the cycle repeats.

I had trace logging enabled and there is absolutely no notification or change in pattern to hint something is off. Searching around, it seems related to ManageIQ/kubeclient#273.

Can you confirm if you see this issue?

I can provide the full trace log if needed, but I'd rather do it via a new ticket in the Splunk Support Portal. Let me know if you'd like to have the log.

Thank you

Errors from splunk-fluentd-k8s-logs container

With query: index=eks_logs sourcetype="kube:container:splunk-fluentd-k8s-logs"

I am frequently seeing the following error in the logs:

[warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2018-12-03 17:42:26 +0000 chunk="57c21abb713ca798a76f45298b755dcb" error_class=SocketError error="Failed to open TCP connection to ..com:443 (getaddrinfo: Name or service not known)"

I'm using port 443 as my Splunk hosts are sitting behind an F5 vip (vip:443 -> splunk hosts:8088)

Everything else (logging, metrics, objects) seems to be working, so what do these errors mean? Can they be ignored?

Splunk 7.1.3

making multi line work? Dead lock recursive locking

I have some events that span multiple lines:

2019-03-18 06:06:48.859  INFO [manage-xxx-service,,,] [10.2.7.19] 1 --- [-15276-thread-1] o.a.k.clients.consumer.ConsumerConfig    : ConsumerConfig values:
        auto.commit.interval.ms = 5000
        auto.offset.reset = latest
        bootstrap.servers = [my-kafka-service:9092]
        check.crcs = true

So I added this line:

  manage-xxx-service:
    <<: *glog
    from:
      pod: my-xxx-service
    multiline:
      firstline: /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
    sourcetype: kube:my-xxx-service

Which produces this config:

      <filter tail.containers.var.log.containers.my-xxx-service*my-xxx-service*.log>
        @type concat
        key log
        timeout_label @SPLUNK
        stream_identity_key stream
        multiline_start_regexp /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
        flush_interval 5s
      </filter>

I'm getting an error "deadlock; recursive locking"

2019-03-18 05:13:54 +0000 [warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/local/bundle/gems/fluent-plugin-concat-2.3.0/lib/fluent/plugin/filter_concat.rb:144:in `synchronize'" tag="tail.containers.var.log.containers.my-xxx-service-85855985fc-pgl6g_yyy_my-incident-service-0ee1814dcd3596c96e0bf6c0a2e65a9437cf1b282a95daf41fbd6e8933df1f8f.log" time=

What am I doing wrong?

issue with splunk connect metrics pod

I have splunk-connect installed on my OpenShift cluster. The splunk-kubernetes-metrics pod is currently running fine, but all I see is the string "metric" in the Splunk index search results.
Can someone tell me if I'm doing anything wrong?

oc get pod
splunk-kubernetes-metrics-4-7hzd2 1/2 OOMKilled 1 15m

oc describe pod error

Normal Scheduled 16m default-scheduler Successfully assigned splunk-connect/splunk-kubernetes-metrics-4-7hzd2 to node009
Normal Pulled 16m kubelet, node009 Container image "splunk/fluentd-hec:1.0.1" already present on machine
Normal Created 16m kubelet, node009 Created container
Normal Started 16m kubelet, node009 Started container
Warning Unhealthy 7m kubelet, node009 Liveness probe failed: Get http://172.20.19.45:8082/healthz: read tcp 172.20.19.1:21952->172.20.19.45:8082: read: connection reset by peer
Warning BackOff 38s (x2 over 47s) kubelet, node009 Back-off restarting failed container
Normal Started 25s (x3 over 16m) kubelet, node009 Started container
Normal Pulled 25s (x3 over 16m) kubelet, node009 Container image "k8s.gcr.io/heapster-amd64:v1.5.1" already present on machine
Normal Created 25s (x3 over 16m) kubelet, node009 Created container

unable to install metrics chart

Trying to install the metrics sub-chart. Unfortunately, the pod is failing with the following message.

Error from server (BadRequest): a container name must be specified for pod test-metric-splunk-kubernetes-metrics-b4f946587-f4gxg, choose one of: [splunk-heapster splunk-fluentd-heapster]

The Fluentd container is failing, and the following are the container logs.

2018-07-11 11:49:22 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2018-07-11 11:49:23 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error_class=Fluent::ConfigError error="Could not parse jq filter: def extract_labels:\r\n . as $labels | reduce range(length) as $n ({}; if $n % 2 == 0 then .["label." + $label
bels[$n + 1] else . end);\r\n\r\ndef extract_metric:\r\n if length % 2 == 0\r\n then (.[:-2] | extract_labels) + {metric: (.[-2] | gsub("/"; ".")), resource_id: .[-1]}\r\n else (.[:-1] | extract_labels) + {metric: (.[-1] | gsub("/"; "."))}\r\n end
ef extract_container:\r\n split(".") | {container_type: "pod", node: .[1], namespace: .[3], pod: .[5], container: .[7]} + (.[8:] | extract_metric) | .metric = "kube.container." + .metric | . ;\r\n \r\ndef extract_syscontainer:\r\n split(".") | {con
"sys", node: .[1], container: .[3]} + (.[4:] | extract_metric) | .metric = "kube.container." + .metric | . ;\r\n \r\ndef extract_pod:\r\n split(".") | {node: .[1], namespace: .[3], pod: .[5]} + (.[6:] | extract_metric) | .metric = "kube.pod." + .me
\n \r\ndef extract_namespace:\r\n split(".") | {namespace: .[1]} + (.[2:] | extract_metric) | .metric = "kube.namespace." + .metric | . ;\r\n \r\ndef extract_node:\r\n split(".") | {node: .[1]} + (.[2:] | extract_metric) | .metric = "kube.node." +
;\r\n \r\ndef extract_cluster:\r\n split(".") | .[1:] | extract_metric | .metric = "kube.cluster." + .metric | . ;\r\n\r\ndef extract:\r\n if contains(".container.")\r\n then extract_container\r\n elif contains(".sys-container.")\r\n then extra
ner\r\n elif contains(".pod.")\r\n then extract_pod\r\n elif startswith("namespace.")\r\n then extract_namespace\r\n elif startswith("node.")\r\n then extract_node\r\n elif startswith("cluster.")\r\n then extract_cluster\r\n else {}\r\n end;
eapster/namespace:\(env.MY_NAMESPACE)/pod:\(env.MY_POD_NAME)" as $source | .record | to_entries | map({value, source: $source} + (.key | extract)) | ., error: Could not parse jq filter: def extract_labels:\r\n . as $labels | reduce range(length) as $n ({}
== 0 then .["label." + $labels[$n]] = $labels[$n + 1] else . end);\r\n\r\ndef extract_metric:\r\n if length % 2 == 0\r\n then (.[:-2] | extract_labels) + {metric: (.[-2] | gsub("/"; ".")), resource_id: .[-1]}\r\n else (.[:-1] | extract_labels) + {met
| gsub("/"; "."))}\r\n end;\r\n \r\ndef extract_container:\r\n split(".") | {container_type: "pod", node: .[1], namespace: .[3], pod: .[5], container: .[7]} + (.[8:] | extract_metric) | .metric = "kube.container." + .metric | . ;\r\n \r\ndef ext
ainer:\r\n split(".") | {container_type: "sys", node: .[1], container: .[3]} + (.[4:] | extract_metric) | .metric = "kube.container." + .metric | . ;\r\n \r\ndef extract_pod:\r\n split(".") | {node: .[1], namespace: .[3], pod: .[5]} + (.[6:] | extr
| .metric = "kube.pod." + .metric | . ;\r\n \r\ndef extract_namespace:\r\n split(".") | {namespace: .[1]} + (.[2:] | extract_metric) | .metric = "kube.namespace." + .metric | . ;\r\n \r\ndef extract_node:\r\n split(".") | {node: .[1]} + (.[2:] | e
c) | .metric = "kube.node." + .metric | . ;\r\n \r\ndef extract_cluster:\r\n split(".") | .[1:] | extract_metric | .metric = "kube.cluster." + .metric | . ;\r\n\r\ndef extract:\r\n if contains(".container.")\r\n then extract_container\r\n elif co
s-container.")\r\n then extract_syscontainer\r\n elif contains(".pod.")\r\n then extract_pod\r\n elif startswith("namespace.")\r\n then extract_namespace\r\n elif startswith("node.")\r\n then extract_node\r\n elif startswith("cluster.")\r\n
_cluster\r\n else {}\r\n end;\r\n\r\n "heapster/namespace:\(env.MY_NAMESPACE)/pod:\(env.MY_POD_NAME)" as $source | .record | to_entries | map({value, source: $source} + (.key | extract)) | ., error: jq: error: syntax error, unexpected INVALID_CHARACTER
quoting issues?) at , line 1:\ndef extract_labels:\r \njq: 1 compile error\n"
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluent-plugin-jq-0.5.1/lib/fluent/plugin/jq_mixin.rb:17:in `rescue in configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluent-plugin-jq-0.5.1/lib/fluent/plugin/jq_mixin.rb:12:in `configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/plugin.rb:164:in `configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/agent.rb:130:in `add_match'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/agent.rb:72:in `block in configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/agent.rb:64:in `each'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/agent.rb:64:in `configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/root_agent.rb:112:in `configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/engine.rb:131:in `configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/engine.rb:96:in `run_configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/supervisor.rb:795:in `run_configure'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/supervisor.rb:579:in `dry_run'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/supervisor.rb:597:in `supervise'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/supervisor.rb:502:in `run_supervisor'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/lib/fluent/command/fluentd.rb:310:in `<top (required)>'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/gems/fluentd-1.2.0/bin/fluentd:8:in `<top (required)>'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/bin/fluentd:23:in `load'
2018-07-11 11:49:23 +0000 [debug]: /usr/local/bundle/bin/fluentd:23:in `<main>'

kubectl logs test-metric-splunk-kubernetes-metrics-b4f946587-znqnm -c splunk-heapster
I0711 11:48:54.169991 1 heapster.go:78] /heapster --source=kubernetes --sink=statsd:udp://127.0.0.1:9001
I0711 11:48:54.170153 1 heapster.go:79] Heapster version v1.5.1
I0711 11:48:54.170319 1 configs.go:61] Using Kubernetes client with master "https://10.0.0.1:443" and version v1
I0711 11:48:54.170333 1 configs.go:62] Using kubelet port 10255
I0711 11:48:54.390631 1 driver.go:104] statsd metrics sink using configuration : {host:127.0.0.1:9001 prefix: numMetricsPerMsg:5 protocolType:etsystatsd renameLabels:map[] allowedLabels:map[] customizeLabel:0x15f2560}
I0711 11:48:54.390682 1 driver.go:104] statsd metrics sink using configuration : {host:127.0.0.1:9001 prefix: numMetricsPerMsg:5 protocolType:etsystatsd renameLabels:map[] allowedLabels:map[] customizeLabel:0x15f2560}
I0711 11:48:54.390711 1 heapster.go:202] Starting with StatsD Sink
I0711 11:48:54.390714 1 heapster.go:202] Starting with Metric Sink
I0711 11:48:54.569104 1 heapster.go:112] Starting heapster on port 8082
E0711 11:49:06.370532 1 driver.go:159] statsd metrics sink - failed to send some metrics : write udp 127.0.0.1:41485->127.0.0.1:9001: write: connection refused

==================================================

configmap.yaml file

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "=" . }}
  labels:
    app: {{ template "splunk-kubernetes-metrics.name" . }}
    chart: {{ template "splunk-kubernetes-metrics.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
data:
fluent.conf: |
# system wide configurations

log_level {{ or .Values.logLevel .Values.global.logLevel | default "info" }}

<source>
  @type udp
  tag raw.metrics.udp
  port 9001
  message_length_limit 100m
  <parse>
    @type jq
    jq 'split("\n") | reduce .[] as $item ({}; ($item | rindex(":")) as $i | .[$item[:$i]] = ($item[$i+1:-2] | tonumber))'
  </parse>
</source>

<match raw.metrics.udp>
  @type jq
  jq {{ include "splunk-kubernetes-metrics.jq_filter" . | quote }}
  remove_tag_prefix raw
</match>

<match metrics.udp>
  @type splunk_hec
  data_type metric
  metric_name_key metric
  metric_value_key value
  protocol {{ or .Values.splunk.hec.protocol .Values.global.splunk.hec.protocol }}
  {{- with or .Values.splunk.hec.host .Values.global.splunk.hec.host }}
  hec_host {{ . }}
  {{- end }}
  {{- with or .Values.splunk.hec.port .Values.global.splunk.hec.port }}
  hec_port {{ . }}
  {{- end }}
  hec_token "#{ENV['SPLUNK_HEC_TOKEN']}"
  host "#{ENV['SPLUNK_HEC_HOST']}"
  {{- with or .Values.splunk.hec.indexName .Values.global.splunk.hec.indexName }}
  index {{ . }}
  {{- end }}
  source ${tag}
  insecure_ssl {{ or .Values.splunk.hec.insecureSSL .Values.global.splunk.hec.insecureSSL | default false }}
  {{- if or .Values.splunk.hec.clientCert .Values.global.splunk.hec.clientCert }}
  client_cert /fluentd/etc/splunk/hec_client_cert
  {{- end }}
  {{- if  or .Values.splunk.hec.clientKey .Values.global.splunk.hec.clientKey }}
  client_key /fluentd/etc/splunk/hec_client_key
  {{- end }}
  {{- if or .Values.splunk.hec.caFile .Values.global.splunk.hec.caFile }}
  ca_file /fluentd/etc/splunk/hec_ca_file
  {{- end }}
  <buffer>
    @type memory
    {{- $limit := .Values.resources.sidecar.limit }}
    chunk_limit_size {{ if $limit.memory }}{{ template "splunk-kubernetes-logging.convert-memory" $limit.memory }}{{ else }}{{ "500m" }}{{ end }}
    chunk_limit_records 100000
    flush_interval 5s
    flush_thread_count 1
    overflow_action block
    retry_max_times 3
  </buffer>
</match>

How to include a custom log file within a pod?

Hi,

I imagine this is relatively simple for experienced fluentd users.

Most of the logs picked up by the splunk-connect-for-kubernetes sidecar work out of the box.

I've added an additional log file for java's garbage collector process... let's say it's a simple /gc.log

How should I go about adding it to splunk?

Thanks!
Greg

Splunk Connect pods consuming too much memory

We are seeing our pods are consuming a lot of memory over time.

NAME CPU(cores) MEMORY(bytes)
splunk-dev-eks-splunk-kubernetes-logging-hkdkh 5m 206Mi
splunk-dev-eks-splunk-kubernetes-logging-hnzj2 7m 497Mi
splunk-dev-eks-splunk-kubernetes-logging-jf766 23m 350Mi
splunk-dev-eks-splunk-kubernetes-logging-k9fp5 9m 238Mi
splunk-dev-eks-splunk-kubernetes-logging-qr4dg 27m 827Mi
splunk-dev-eks-splunk-kubernetes-logging-rl4ct 73m 489Mi
splunk-dev-eks-splunk-kubernetes-metrics-6c469697f8-sr926 0m 76Mi
splunk-dev-eks-splunk-kubernetes-objects-bbccdd967-tp26s 30m 329Mi

For example splunk-dev-eks-splunk-kubernetes-logging-qr4dg is already at 828MB and eventually goes up to at least 2GB.

API Server SSL_connect Error

I am having trouble with my pods connecting to my API server. All of them are failing to connect, but to focus on one of them, I'm getting the "certificate verify failed" error I've pasted below when I spin up the splunk-fluentd-k8s-objects pod.

2018-08-09 12:51:07 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
2018-08-09 12:51:08 +0000 [warn]: both of Plugin @id and path for <storage> are not specified. Using on-memory store.
2018-08-09 12:51:08 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error_class=Fluent::ConfigError error="Invalid Kubernetes API apps/v1 endpoint https://172.20.0.1:443/apis: SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate)"

I'm pretty sure this isn't an RBAC issue, but here is the access that the ClusterRole created from the Helm chart has.

C02T797WGTFM:logging meadcx$ kubectl describe clusterroles splunk-connector-splunk-kubernetes-objects
Name:         splunk-connector-splunk-kubernetes-objects
Labels:       app=splunk-kubernetes-objects
              chart=splunk-kubernetes-objects-1.0.1
              heritage=Tiller
              release=splunk-connector
Annotations:  <none>
PolicyRule:
  Resources        Non-Resource URLs  Resource Names  Verbs
  ---------        -----------------  --------------  -----
  events           []                 []              [watch]
  namespaces       []                 []              [get list]
  nodes            []                 []              [get list]
  pods             []                 []              [get list]
  events.apps      []                 []              [watch]
  namespaces.apps  []                 []              [get list]
  nodes.apps       []                 []              [get list]
  pods.apps        []                 []              [get list]

Since I think the ClusterRole looks fine, I assume I need a way to disable certificate validation when the pods connect to the API server. Is it possible to do that, and I've just overlooked where? Or am I thinking about this wrong and perhaps it's another issue?
