helm-charts's Introduction

Datadog Helm Charts

Artifact Hub

Official Helm charts for Datadog products. Currently supported:

  • datadog
  • synthetics-private-location

How to use Datadog Helm repository

You need to add this repository to your Helm repositories:

helm repo add datadog https://helm.datadoghq.com
helm repo update
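
Once the repository is added, the main chart can be installed with a command along these lines (a minimal sketch; the release name, API key, and targetSystem value are placeholders to adapt):

helm install datadog --set datadog.apiKey=<DATADOG_API_KEY> --set targetSystem=linux datadog/datadog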

helm-charts's Issues

Service "staging-datadog-kube-state-metrics" is invalid: spec.clusterIP: Invalid value: "": field is immutable

Describe what happened:
Upgrading the chart causes a change to the Service's clusterIP:

Error: UPGRADE FAILED: failed to replace object: Service "staging-datadog-kube-state-metrics" is invalid: spec.clusterIP: Invalid value: "": field is immutable

Describe what you expected:
The release should have upgraded successfully.

Steps to reproduce the issue:
Deploy then upgrade the chart

Additional environment details (Operating System, Cloud provider, etc):
Helm 3.4.1

version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"clean", GoVersion:"go1.14.11"}

Chart version 2.4.2

Kubernetes

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.4", GitCommit:"224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba", GitTreeState:"clean", BuildDate:"2019-12-11T12:47:40Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.13-gke.401", GitCommit:"eb94c181eea5290e9da1238db02cfef263542f5f", GitTreeState:"clean", BuildDate:"2020-09-09T00:57:35Z", GoVersion:"go1.13.9b4", Compiler:"gc", Platform:"linux/amd64"}

Not sure of how to use apiKeyExistingSecret

It looks like I can create a k8s Secret myself and then set apiKeyExistingSecret to the name of that Secret so it is used for the API key, right?

Do I have to create the k8s secret in a specific namespace for the datadog helm release to see it?
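
For reference, the usual pattern looks like the following sketch (assuming the chart's default expectation of an api-key key inside the Secret, created in the same namespace as the release):

kubectl create secret generic datadog-secret -n <RELEASE_NAMESPACE> --from-literal api-key=<DATADOG_API_KEY>

datadog:
  apiKeyExistingSecret: datadog-secret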

Updating to 2.4.5 Windows chart causes invalid mount type error

Describe what happened:
I upgraded my targetSystem: "windows" chart from 2.3.18 to 2.4.5, and the rollout failed with an error:

Error: Error response from daemon: invalid volume specification: 'c:\var\lib\kubelet\pods\7a36de82-4820-44f6-ae71-fddeb172118c\volumes\kubernetes.io~configmap\installinfo\install_info:C:/ProgramData/Datadog/install_info:ro': invalid mount config for type "bind": source path must be a directory

Describe what you expected:
The rollout of the chart to succeed

Steps to reproduce the issue:

Additional environment details (Operating System, Cloud provider, etc):
AWS EKS, Kubernetes 1.17

Missing configuration for DD_CHECKS_TAG_CARDINALITY env variable

Describe what happened:

While attempting to get the pod_name tag added to pods, I realised that setting the DD_CHECKS_TAG_CARDINALITY env variable to a value of orchestrator is what allows this. There isn't currently a helm chart option to configure this value.

Describe what you expected:

A helm chart option to configure this value.

Steps to reproduce the issue:

Not really relevant, other than the fact that I tried using the datadog.dogstatsd.tagCardinality flag to enable this, but it configures something else.

Additional environment details (Operating System, Cloud provider, etc):

Latest helm chart version
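
In the meantime, the variable can be set through the chart's generic environment passthrough; a minimal sketch (the orchestrator value is taken from the report above):

datadog:
  env:
    - name: DD_CHECKS_TAG_CARDINALITY
      value: orchestrator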

Unified service tagging is not possible.

Describe what happened:

Cannot add labels for unified service tagging to the metadata of the DaemonSet without them also being added to spec.selector.matchLabels.
It's also inconvenient because matchLabels is immutable, so I have to redeploy the DaemonSet whenever a label value changes.

Describe what you expected:

A variable should be added to values.yaml that applies unified service tagging labels to the DaemonSet metadata and the pod template metadata only, without touching spec.selector.matchLabels.

https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/?tab=kubernetes

Steps to reproduce the issue:

Run the helm template command using this values.yaml.

datadog:
  agent:
    podLabels:
      tags.datadoghq.com/env: "<ENV>"
      tags.datadoghq.com/service: "<SERVICE>"
      tags.datadoghq.com/version: "<VERSION>"

output:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog
  labels:
    helm.sh/chart: "datadog-2.4.14"
    app.kubernetes.io/name: "datadog"
    app.kubernetes.io/instance: "datadog"
    app.kubernetes.io/managed-by: "Helm"
    app.kubernetes.io/version: "7"
spec:
  selector:
    matchLabels:
      app: datadog
      tags.datadoghq.com/env: <ENV>
      tags.datadoghq.com/service: <SERVICE>
      tags.datadoghq.com/version: <VERSION>
  template:
    metadata:
      labels:
        app: datadog
        tags.datadoghq.com/env: <ENV>
        tags.datadoghq.com/service: <SERVICE>
        tags.datadoghq.com/version: <VERSION>
      name: datadog

Additional environment details (Operating System, Cloud provider, etc):

Chart version 2.4.14

Add support for configuring DD_DOGSTATSD_TAGS

Describe what happened:

In order to add tags to custom DogStatsD metrics, users currently have to manually configure the DD_DOGSTATSD_TAGS environment variable.

This currently requires users to pass the following configuration:

agents:
  containers:
    agent:
      env:
        - name: DD_DOGSTATSD_TAGS
          value: '["env:foo","service:bar","version:0.0.1","team:quux"]'

Describe what you expected:

It would be fantastic to be able to configure DD_DOGSTATSD_TAGS in a similar way to DD_TAGS. In particular I would have expected the following to work:

datadog:
  dogstatsd:
    tags:
      - "env:foo"
      - "service:bar"
      - "version:0.0.1"
      - "team:quux"

Steps to reproduce the issue:

Configure the DD_DOGSTATSD_TAGS environment variable.

Inconsistent with systemProbe.enabled deprecation

The message is confusing given that systemProbe.enabled was deprecated.

{{- if (and .Values.datadog.securityAgent.runtime.enabled (not .Values.datadog.systemProbe.enabled)) }}
##############################################################################
#### NOTE: Configuration notice. ####
##############################################################################
You enabled runtime security features in the SecurityAgent but the System Probe is not explicitly enabled.
The System Probe will be enabled as required by these features. Please make sure this is expected for your configuration.
{{- end }}
{{- else }}

Missing s6-run volumeMount in process-agent container

Describe what happened:
Network Performance Monitoring was unable to report to datadog because of a missing volumeMount configuration (s6-run) in the process-agent container.

Describe what you expected:
I expect that the process-agent container will have all required configuration needed to facilitate Network Performance Monitoring

Steps to reproduce the issue:
Install the helm chart and enable the system-probe container

Additional environment details (Operating System, Cloud provider, etc):
We're running k8s-1.15-debian-stretch-amd64-hvm-ebs-2019-09-26 (ami-06e67726ce5e65ca7).
Here is the output of uname -a on one of our hosts in the cluster:

Linux ip-192-168-2-159 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64 GNU/Linux

DataDog chart version 2.4.34 is broken: illegal number syntax: "-"

Describe what happened:
When trying to install DataDog chart (version 2.4.34), the installation fails with the following error:

Error: UPGRADE FAILED: parse error in "datadog/templates/_helpers.tpl": template: datadog/templates/_helpers.tpl:194: illegal number syntax: "-"

Installation of the previous version (2.4.33) on the exact same infrastructure finished as expected.
It looks related to this PR.

Describe what you expected:

Successful installation of DataDog.

Steps to reproduce the issue:
helm upgrade --install datadog datadog/datadog --version 2.4.34

Additional environment details (Operating System, Cloud provider, etc):
kubernetes: 1.16
helm: 2.14

Add DD agent container securityContext field

Original issue by @billyshambrook (helm/charts#16353) with fix PR by @coreypobrien (helm/charts#17274 - stale)

The new Network Performance feature requires setting some container securityContext.capabilities; they are shown under the "kubernetes" tab: https://docs.datadoghq.com/graphing/infrastructure/network_performance_monitor/?tab=kubernetes

Currently only pod securityContext is exposed through values, not container securityContext.

Could we expose container securityContext as an optional value?

Cheers!

Multiple hostPort option

Running the most up-to-date helm chart, I'm trying to get a Jenkins EKS pod to communicate with the DaemonSet running on the same node. I can get this to work for DogStatsD by opening its hostPort, and that works for sending metrics. But logs have to be shipped to a custom Jenkins integration that accepts TCP logs, and I'm not sure I can open multiple hostPorts on the DaemonSet using the current helm chart. Basically, I need to be able to open both 8125 (for DogStatsD) and 8126 for the Jenkins log receiver.

datadog:
  apiKeyExistingSecret: datadog-api
  log_format_json: true
  logLevel: warn

  confd:
    jenkins.yaml: |-
      logs:
        - type: tcp
          port: 8126
          service: ops
          source: jenkins
          tags:
            - jenkins_job

  leaderElection: true
  collectEvents: true

  processAgent:
    enabled: true
    processCollection: true

  logs:
    enabled: true
    containerCollectAll: true
    containerCollectUsingFiles: true

  networkMonitoring:
    enabled: true

  dogstatsd:
    originDetection: true
    useHostPort: true
    useHostPID: true
    nonLocalTraffic: true

unknown field "+allowHostPorts"

Describe what happened:
Cannot apply the helm chart on our OpenShift cluster when SCC is enabled:

  Error: Failed to render chart: exit status 1: Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(SecurityContextConstraints): unknown field "+allowHostPorts" in com.github.openshift.api.security.v1.SecurityContextConstraints, ValidationError(SecurityContextConstraints.seLinuxContext): unknown field "rule" in com.github.openshift.api.security.v1.SELinuxContextStrategyOptions, ValidationError(SecurityContextConstraints): missing required field "allowHostPorts" in com.github.openshift.api.security.v1.SecurityContextConstraints]
  Error: plugin "diff" exited with error

There is a typo in the allowHostPorts field.

Steps to reproduce the issue:
apply the helm chart on OpenShift

Additional environment details (Operating System, Cloud provider, etc):
OpenShift 4.4 + Helm 3

Question: AD and missing tags from Unified Service Tagging

Describe what happened:

I am having some trouble getting metrics from a pod to be tagged with service: (as defined in the Kubernetes pod labels). I followed the procedure for Unified Service Tagging; however, I had to improvise due to some complications with our setup:

  • env: is set as a global tag on the agent configuration (in the helm chart values)
  • service: is set as label on the kubernetes Deployment and Pod
  • version: is set programmatically in our application (we are unable to get this value from our Deployment)

The metrics emitted from the pod do not show the service: tag; however, both env: and version: are present. Weirdly, the logs from the pod do contain the service: tag, and so does APM.

This is an extract from the Deployment manifest; as explained above, only the service label is applied here. Could this be the problem?

metadata:
  labels:
    tags.datadoghq.com/service: bit-api
spec:
  template:
    metadata:
      labels:
        tags.datadoghq.com/service: bit-api
    spec:
      containers:
        - env:
            - name: DD_SERVICE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/service']

The label is properly applied to the Pod (screenshot from GCP omitted).

Describe what you expected:

Metrics sent from the relevant pods should have their service: tag available.

I have checked this is not the case using Datadog Metrics Summary on several unique metrics exported by our application. However, logs sent from the Pod are properly tagged with service:

Additional environment details (Operating System, Cloud provider, etc):

GKE. Kubernetes 1.17.14.
Datadog agent + cluster agents installed using this Helm chart.

The metrics are emitted by https://github.com/brightcove/hot-shots using the DogStatsD Unix socket (https://docs.datadoghq.com/developers/dogstatsd/unix_socket?tab=kubernetes).


Thanks

PSP doesn't include required apparmor profile by default

Describe what happened:

When deploying the chart with the agent PSP enabled, the PSP doesn't allow for the unconfined apparmor profile used by default by the system probe container

Describe what you expected:

This PSP should allow the unconfined apparmor profile

Steps to reproduce the issue:

  1. Deploy the Helm chart with:
values:
  agents:
    enabled: true
    podSecurity:
      podSecurityPolicy:
        create: true

Additional environment details (Operating System, Cloud provider, etc):

Chart version 2.4.14

unknown field "rule" in io.k8s.api.core.v1.PodSecurityContext

Describe what happened:
When using 2.4.10 to install with Helm 3, the install fails due to a validation error:

error validating data: ValidationError(DaemonSet.spec.template.spec.securityContext): unknown field "rule" in io.k8s.api.core.v1.PodSecurityContext

It looks like this issue was originally fixed in stable/datadog a few releases back: helm/charts#23146
but then got reverted in helm/charts@ae5f7d1

Describe what you expected:
Should install normally

Steps to reproduce the issue:
helm upgrade -i datadog datadog/datadog --version 2.4.10 -f values.yaml --namespace datadog

Additional environment details (Operating System, Cloud provider, etc):
kubernetes 1.15
helm version.BuildInfo{Version:"v3.3.0", GitCommit:"8a4aeec08d67a7b84472007529e8097ec3742105", GitTreeState:"dirty", GoVersion:"go1.14.7"}

Chart conflicts with existing v1beta1.external.metrics.k8s.io resource

Describe what happened:

helm install  datadog-monitoring \                                                                                                                                                                   
    --set datadog.apiKey=MY_KEY \
    --set datadog.appKey=MY_KEY\
    --set clusterAgent.enabled=true \
    --set clusterAgent.metricsProvider.enabled=true \
    datadog/datadog

Error: rendered manifests contain a resource that already exists. Unable to continue with install: APIService "v1beta1.external.metrics.k8s.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "datadog-monitoring"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "datadog"

Describe what you expected:
Chart would have been installed properly

Steps to reproduce the issue:
Run the commands from the README on a cluster that already has an external metrics provider, such as https://github.com/awslabs/k8s-cloudwatch-adapter

Additional environment details (Operating System, Cloud provider, etc):
EKS, with K8s 1.17
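
One workaround is to let Helm adopt the pre-existing APIService by adding exactly the labels and annotations the error asks for (a sketch, assuming Helm 3.2+ adoption semantics; adjust the release name and namespace to your install):

kubectl label apiservice v1beta1.external.metrics.k8s.io app.kubernetes.io/managed-by=Helm
kubectl annotate apiservice v1beta1.external.metrics.k8s.io meta.helm.sh/release-name=datadog-monitoring meta.helm.sh/release-namespace=datadog

Note that Kubernetes allows only one provider to register v1beta1.external.metrics.k8s.io, so adopting it hands external metrics over to the Datadog release.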

podLabelsAsTags support for custom metrics

Describe what happened:
I've configured podLabelsAsTags value when installing the datadog helm chart
https://github.com/DataDog/helm-charts/blob/master/charts/datadog/values.yaml#L87

I can see the pod labels appended as tags on the standard metrics like kubernetes.memory.request, kubernetes.pods.running, etc.

I have an app that pushes stats using a custom metric, envoy.http.downstream_rq_xx, via UDP to the hostPort of the Datadog agent pods (deployed as a DaemonSet), but there are no pod labels as tags on these metrics when I check my dashboard.

Describe what you expected:
I want to know if podLabelsAsTags supports custom metrics and if there are any special steps I need to achieve this.

Additional environment details (Operating System, Cloud provider, etc):
Cloud Provider: AWS
Platform: AWS EKS
Kubernetes version: v1.18.9
Host OS: RHEL

Helm chart doesn't install the agent anymore?

I've been using Datadog to monitor my Kubernetes cluster. I uninstalled a working agent, deleted the namespace, and performed a re-install; now it only installs kube-state-metrics and NOT the DaemonSet. I can't figure out why it's not installing the DaemonSet, as my config is unchanged. I also tried using the default values from the source repo, still no luck. I don't get any errors; it just doesn't even try to install the DaemonSet. The only pod that runs is kube-state-metrics.

Steps to reproduce the issue:
datadog-values.yaml

datadog:
  agents:
    enabled: true
  apm:
    enabled: true
  logs:
    enabled: false
$ helm install datadog-agent datadog/datadog -f datadog_new_values.yaml --dependency-update --set targetSystem=linux --set datadog.apiKey=$DD_API_KEY

NAME: datadog-agent
LAST DEPLOYED: Thu Jan  7 12:40:11 2021
NAMESPACE: datadog
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Datadog agents are spinning up on each node in your cluster. After a few
minutes, you should see your agents starting in your event stream:
    https://app.datadoghq.com/event/stream

The Datadog Agent is listening on port 8126 for APM service.

$ kubectl get pods
NAME                                                READY   STATUS    RESTARTS   AGE
datadog-agent-kube-state-metrics-784796cf46-xvsms   1/1     Running   0          2m25s


Additional environment details (Operating System, Cloud provider, etc):

helm version
version.BuildInfo{Version:"v3.4.2", GitCommit:"23dd3af5e19a02d4f4baa5b2f242645a1a3af629", GitTreeState:"clean", GoVersion:"go1.14.13"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.15-gke.4300", GitCommit:"7ed5ddc0e67cb68296994f0b754cec45450d6a64", GitTreeState:"clean", BuildDate:"2020-10-28T09:23:22Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}


[Q] How to specify which pods to collect logs from?

Current datadog-values.yaml logs setting:

  ## Enable logs agent and provide custom configs
  logs:
    # datadog.logs.enabled -- Enables this to activate Datadog Agent log collection
    ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
    enabled: true

    # datadog.logs.containerCollectAll -- Enable this to allow log collection for all containers
    ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
    containerCollectAll: true

    # datadog.logs.containerCollectUsingFiles -- Collect logs from files in /var/log/pods instead of using container runtime API
    ## It's usually the most efficient way of collecting logs.
    ## ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
    containerCollectUsingFiles: true

Desired result:

  • To include/exclude namespace/pods to collect logs from
  • To specify which level of logs to collect from a pod
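
A sketch of one direction, assuming a chart version that exposes the datadog.containerIncludeLogs and datadog.containerExcludeLogs values (include rules take precedence over exclude rules, so everything can be excluded and then namespaces added back):

datadog:
  logs:
    enabled: true
    containerCollectAll: true
  containerExcludeLogs: "name:.*"
  containerIncludeLogs: "kube_namespace:my-app"

Per-pod log level filtering is usually done with log_processing_rules in autodiscovery annotations rather than through a chart value.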

Values.datadog.env is a list so can't be merge across multiple values files

We have the datadog chart installed across many clusters with varying configuration. We use a shared values file for common settings and then some specific ones for each cluster or env.

The problem for us is that .Values.datadog.env is an array so can't be merged across values files. This means we have to copy common env vars across files leading to a lot of duplicated YAML.

Can I make a PR to accept both a map and an array for this value in the chart?

datadog chart version incompatibility in master

Describe what happened:

The file datadog/helm-charts/charts/datadog/Chart.yaml specifies apiVersion: v1, which Helm says means "use Helm v2", but this chart cannot be built with Helm v2; I receive the error:

charts/datadog $ helm2 version -c
Client: &version.Version{SemVer:"v2.16.12", GitCommit:"47f0b88409e71fd9ca272abc7cd762a56a1c613e", GitTreeState:"clean"}

charts/datadog$ helm2 dependency build .               
Error: requirements.lock is out of sync with requirements.yaml

This works properly when built with v3:

charts/datadog$ helm version -c     
version.BuildInfo{Version:"v3.3.1", GitCommit:"249e5215cde0c3fa72e27eb7a30e8d55c9696144", GitTreeState:"dirty", GoVersion:"go1.15"}

charts/datadog$ helm dependency build .
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "banzaicloud-stable" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading kube-state-metrics from repo https://kubernetes-charts.storage.googleapis.com/
Deleting outdated charts

Describe what you expected:

I expect this chart to be correctly configured for a Helm version that works. I think that means the apiVersion in Chart.yaml should be v2, indicating that Helm v3 ought to be used to build it.

Steps to reproduce the issue:

Perform a helm dependency build on this chart using Helm V2.

Additional environment details (Operating System, Cloud provider, etc):

Under manual circumstances this isn't a big deal: I can just use v3 of the Helm binary, despite what the Chart.yaml says. But this completely breaks down when automating the process (for example, deploying the datadog chart with ArgoCD). Automation means we must assume that the Chart.yaml is correct.

Add ability to disable apparmor profiles

Describe what happened:

AppArmor is not supported on EKS. When deploying the helm chart with systemProbe enabled and PSPs enabled, even after fixing the issue described in #35 (by adding unconfined as an allowed profile), the pods fail to start on EKS because AppArmor profiles aren't supported. My pods show a state of Blocked and I see

Reason:               AppArmor
Message:              Cannot enforce AppArmor: AppArmor is not enabled on the host

in kubectl describe

I can't see an easy way of disabling them in the chart. Happy to provide a pull request to fix this. Is an apparmorEnabled flag possible?

Describe what you expected:

I should be able to deploy the Helm chart without modifying resources manually.

Steps to reproduce the issue:

Deploy the Helm chart with datadog.systemProbe.enabled: true and agents.podSecurity.podSecurityPolicy.create: true and agents.podSecurity.apparmorProfiles: ["runtime/default", "unconfined"]

Additional environment details (Operating System, Cloud provider, etc):

EKS, k8s 1.17

Setting datadog.logLevel to DEBUG removes the DD_LOG_LEVEL env variable

Describe what happened:
Setting datadog.logLevel to DEBUG removed the DD_LOG_LEVEL env variable

Describe what you expected:
For DD_LOG_LEVEL to be set to DEBUG

Steps to reproduce the issue:
Using chart version 2.8.5 and the values:

datadog:
  apiKeyExistingSecret: ${kubernetes_secret.datadog_secret.metadata.0.name}
  appKeyExistingSecret: ${kubernetes_secret.datadog_secret.metadata.0.name}
  site: datadoghq.eu
  logLevel: DEBUG
  env:
    - name: DD_CONTAINER_EXCLUDE
      value: |-
        kube_namespace:kube-system
        kube_namespace:datadog
  kubeStateMetricsEnabled: true
  clusterChecks:
    enabled: true
  collectEvents: true
  logs:
    enabled: true
    containerCollectAll: true
  apm:
    enabled: true
  processAgent:
    enabled: true
    processCollection: true
  orchestratorExplorer:
    enabled: true
    container_scrubbing:
      enabled: false
  dogstatsd:
    port: 8125
    useHostPort: true
    nonLocalTraffic: true
clusterAgent:
  enabled: true
  token: ${random_password.datadog_cluster_agent_token.result}

This produces a DaemonSet without any reference to DD_LOG_LEVEL.

Additional environment details (Operating System, Cloud provider, etc):
Deploying to GKE using Terraform.

Not able to run custom metric check on a single agent on k8s

Describe what happened:

We are installing Datadog through this helm chart on our Kubernetes clusters. Recently we got a requirement to add a custom metric, for which we added a Python script in the checksd section of values.yaml and a corresponding confd entry for scheduling that check.

Everything works fine and we are able to export the custom metric, but the check runs on all the agents running on the cluster.

Describe what you expected:

We expected the check to run only once per cluster, on a single agent.

Is there a way to configure below section of confd for a custom check through this chart?

instances: [{}]

Steps to reproduce the issue:

We followed the official documentation to create the check and the override values.yaml looks like below:

https://docs.datadoghq.com/developers/write_agent_check/?tab=agentv6v7

Values.yaml

datadog:
    checksd:
      my_custom_check.py: |-

    confd:
      my_custom_check.yaml: |-
        init_config:
        instances: [{}]

Additional environment details (Operating System, Cloud provider, etc):

OS: Linux

Cloud Provider: AWS.
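
For reference, a sketch of one way to run a check in exactly one place, assuming the Cluster Agent and cluster checks are enabled: marking the check as a cluster check dispatches it to a single runner.

datadog:
  clusterChecks:
    enabled: true
  confd:
    my_custom_check.yaml: |-
      cluster_check: true
      init_config:
      instances: [{}]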

Custom environment values not being double quoted

Describe what happened:
Values for custom environment variables passed to the main container-agent are not being quoted.

          - name: DD_LOGS_CONFIG_USE_HTTP
            value: true
          - name: DD_LOGS_CONFIG_LOGS_DD_URL
            value: pvtlink.logs.datadoghq.com:443

Describe what you expected:
These values to be double quoted as they were when sourcing the chart from the Helm Stable repo.

          - name: DD_LOGS_CONFIG_USE_HTTP
            value: "true"
          - name: DD_LOGS_CONFIG_LOGS_DD_URL
            value: "pvtlink.logs.datadoghq.com:443"

Steps to reproduce the issue:
helm template

Additional environment details (Operating System, Cloud provider, etc):
Helm 2.14.x

I looked through the commits on this repo and even tried the first release, and none of them appear to quote correctly. I'm not really sure what's going on, but this works fine with the chart sourced from the legacy Helm stable repo.

A container name must be specified for pod

Describe what happened:
Followed the documentation at https://docs.datadoghq.com/agent/kubernetes/?tab=helm and just changed:

 logs: 
    enabled: true
    containerCollectAll: true

I expected some logs to be forwarded to Datadog, but nothing happens.

Looking at the logs, I get:

$ kubectl logs datadog-9z7z6
error: a container name must be specified for pod datadog-9z7z6, choose one of: [agent process-agent] or one of the init containers: [init-volume init-config]

Describe what you expected:
Logs to be forwarded.

Steps to reproduce the issue:
$helm upgrade --install datadog -f datadog-new.yaml --set datadog.site='datadoghq.com' --set datadog.apiKey=*** --set targetSystem=linux datadog/datadog

Additional environment details (Operating System, Cloud provider, etc):
Azure AKS
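
As an aside, the kubectl error above just means a container has to be selected when a pod runs several of them; the main agent's logs, for example, can be read with:

kubectl logs datadog-9z7z6 -c agent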

[Private-Synthetics] Conditionally create config map

While installing the private synthetics chart, I noticed that I was forced to create a ConfigMap with the configuration. Since this contains secret information, I want to create it out of band (or using another tool that handles secrets). It would be nice if the chart allowed me to conditionally create the ConfigMap (or potentially used a Secret, so I could use external-secrets for this).

Set hostname

Curious how we are supposed to set the hostname value for the agent pods? I'm getting errors like this in my logs because it's appending the cluster name or something to the instance IP:

ValidHostname: ip-182-18-195-41.ec2.internal-prod_k8s_test is not RFC1123 compliant
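
A sketch of one workaround, assuming the chart passes datadog.env entries through to the container verbatim (DD_HOSTNAME is the Agent's hostname override; here the node name is used as the value):

datadog:
  env:
    - name: DD_HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName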

Datadog not using created serviceaccount for horizontal-pod-autoscaler

Problem:

Creating a horizontal-pod-autoscaler object using a custom external metric yields an error when doing a kubectl describe hpa that references a different service account than the one listed in the chart. Something is possibly not getting applied, and it's defaulting to the user system:kube-proxy. All the installed YAMLs look correct, which is why this error is so confusing.

Expected:

Any authentication error should reference the service account which is listed in the rbac-hpa.yaml as
system:horizontal-pod-autoscaler

Steps:

Install Datadog Helm chart 2.4.10
Create an HPA using an external metric provider
run kubectl describe hpa

HPA Logs and Helm Chart Installed Objects:

kubectl describe hpa -n default

Name:                                                                nginxext
Namespace:                                                           default
Labels:                                                              <none>
Annotations:                                                         CreationTimestamp:  Tue, 18 Aug 2020 11:43:19 -0400
Reference:                                                           Deployment/nginx
Metrics:                                                             ( current / target )
  "datadogmetric@nginx-demo:nginx-requests" (target average value):  <unknown> / 9
Min replicas:                                                        1
Max replicas:                                                        3
Deployment pods:                                                     1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetExternalMetric  the HPA was unable to compute the replica count: unable to get external metric 
  default/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:kube-proxy" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"
Events:
  Type     Reason                   Age                       From                       Message
  ----     ------                   ----                      ----                       -------
  Warning  FailedGetExternalMetric  2m12s (x1338 over 5h57m)  horizontal-pod-autoscaler  unable to get external metric default/datadogmetric@nginx-demo:nginx-requests/nil: unable to fetch metrics from external metrics API: datadogmetric@nginx-demo:nginx-requests.external.metrics.k8s.io is forbidden: User "system:kube-proxy" cannot list resource "datadogmetric@nginx-demo:nginx-requests" in API group "external.metrics.k8s.io" in the namespace "default"

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    helm.fluxcd.io/antecedent: monitoring:helmrelease/datadog
  creationTimestamp: "2020-04-21T18:07:18Z"
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.4.10
  name: datadog-cluster-agent-external-metrics-reader
  resourceVersion: "138600462"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/datadog-cluster-agent-external-metrics-reader
  uid: bdc70805-1905-4e69-bf12-42c1ee2cffd5
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-cluster-agent-external-metrics-reader
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system

  ---

kubectl get datadogmetric -n default
NAME             ACTIVE   VALID   VALUE   REFERENCES   UPDATE TIME
nginx-requests   False    False   0                    17h

index.yaml generated field not being updated

Describe what happened:

The generated field of this Helm repository's index.yaml is not being updated. Flux v2 expects this field to change. When it stays the same, Flux assumes there are no changes to the index and so doesn't pull a new index file with new releases.

Describe what you expected:

The generated field of the index should match the latest timestamp that the Helm repository was updated.

Steps to reproduce the issue:

$ curl -s https://helm.datadoghq.com/index.yaml | grep generated
generated: "2020-07-29T11:03:41.153796856Z"

Additional environment details (Operating System, Cloud provider, etc):

This is a bug in helm/chart-releaser. I've added a fix in this PR: helm/chart-releaser#99. Once that is merged, we will need to update to the latest release of the chart-releaser GitHub Action when it is released.

Helm stable chart repository will be removed in the future

Hi,

Describe what happened:
The Datadog chart depends on the stable/kube-state-metrics chart. According to the deprecation announcement, the stable charts repository will be removed on Nov 13, 2020. I think this will cause deployment failures.

Please let me know of a workaround, for example, deploying kube-state-metrics from here or an alternative chart.

Additional environment details (Operating System, Cloud provider, etc):
We are using Datadog chart with the following value:

datadog:
  kubeStateMetricsEnabled: true
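
A sketch of one possible workaround, assuming kube-state-metrics' new home in the prometheus-community repository: disable the bundled dependency and deploy kube-state-metrics separately.

datadog:
  kubeStateMetricsEnabled: false

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install kube-state-metrics prometheus-community/kube-state-metrics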

Kubelet CA certificate location changes on AKS 1.16+

Describe what happened:
The Datadog/datadog agent helm chart deploys successfully into an Azure Kubernetes Service instance but does not return data due to a certificate validation error:

Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically: cannot set a valid kubelet host: cannot connect to kubelet using any of the given hosts: [XX.XXX.X.X] [aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net. aks-agentpool-XXXXXXXX-vmssXXXXX], Errors: [Get "https://XX.XXX.X.X:XXXXX/pods": x509: cannot validate certificate for XX.XXX.X.X because it doesn't contain any IP SANs Get "https://aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net.:XXXXX/pods": x509: certificate is valid for aks-agentpool-XXXXXXXX-vmssXXXXX, not aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net. Get "https://aks-agentpool-XXXXXXXX-vmssXXXXX:XXXXX/pods": x509: certificate signed by unknown authority cannot connect: http: "Get \"http://XX.XXX.X.X:XXXXX/\": dial tcp XX.XXX.X.X:XXXXX: connect: connection refused" cannot connect: http: "Get \"http://aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net.:XXXXX/\": dial tcp XX.XXX.X.X:XXXXX: connect: connection refused" cannot connect: http: "Get \"http://aks-agentpool-XXXXXX-XXXXX:XXXX/\": dial tcp XX.XXX.X.X:XXXXX: connect: connection refused"]", "traceback": "Traceback (most recent call last):\n File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 828, in run\n self.check(instance)\n File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 295, in check\n raise CheckException("Unable to detect the kubelet URL automatically: " + kubelet_conn_info.get('err', ''))\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically: cannot set a valid kubelet host: cannot connect to kubelet using any of the given hosts: [XX.XXX.X.X] [aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net. aks-agentpool-XXXXXXXX-vmssXXXXX], Errors: [Get "https://XX.XXX.X.X:XXXXX/pods": x509: cannot validate certificate for XX.XXX.X.X because it doesn't contain any IP SANs Get "https://aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net.:XXXXX/pods": x509: certificate is valid for aks-agentpool-XXXXXXXX-vmssXXXXX, not aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net. Get "https://aks-agentpool-XXXXXXXX-vmssXXXXX:XXXXX/pods": x509: certificate signed by unknown authority cannot connect: http: "Get \"http://XX.XXX.X.X:XXXXX/\": dial tcp XX.XXX.X.X:XXXXX: connect: connection refused" cannot connect: http: "Get \"http://aks-agentpool-XXXXXXXX-vmssXXXXX.internal.cloudapp.net.:XXXXX/\": dial tcp XX.XXX.X.X:XXXXX: connect: connection refused" cannot connect: http: "Get \"http://aks-agentpool-XXXXXX-XXXXX:XXXX/\": dial tcp XX.XXX.X.X:XXXXX: connect: connection refused"]\n"}]

Describe what you expected:
Would have hoped to not need to provide as many values.yaml overrides to support this environment. Does the helm deployment already have a conditional that checks for an AKS environment? If not, documenting here in case anyone else runs into it.

  # agents.volumes -- Specify additional volumes to mount in the dd-agent container
  volumes: 
     - name: k8s-certs 
       hostPath: 
         path: /etc/kubernetes/certs
         type: ''

  # agents.volumeMounts -- Specify additional volumes to mount in all containers of the agent pod
  volumeMounts: 
     - name: k8s-certs
       mountPath: /etc/kubernetes/certs
       readOnly: true

  # datadog.env -- Set environment variables for all Agents
  ## The Datadog Agent supports many environment variables.
  ## ref: https://docs.datadoghq.com/agent/docker/?tab=standard#environment-variables
  env: 
     - name: DD_KUBELET_CLIENT_CA
       value: /etc/kubernetes/certs/kubeletserver.crt

Additional environment details (Operating System, Cloud provider, etc):
Microsoft Azure AKS. Related Bug with solution - DataDog/datadog-agent#5942

Linting issue: object name does not conform to Kubernetes naming requirements

Describe what happened:

Currently, when we helm lint the chart, we observe errors like, for instance:
[ERROR] templates/containers-common-env.yaml: object name does not conform to Kubernetes naming requirements: "": invalid metadata name, must match regex ^(([A-Za-z0-9][-A-Za-z0-9.]*)?[A-Za-z0-9])+$ and the length must not longer than 253

This is due to some files in the templates/ folder not having the _ prefix in their name.

For instance, container-agent.yaml should be renamed to _container-agent.yaml.

Describe what you expected:

1 chart(s) linted, 0 chart(s) failed

Additional environment details (Operating System, Cloud provider, etc):

Tested with Helm version: v3.3.4

I created this Pull Request to address it: #135

Cluster Agent PDB blocks scaledown

Describe what happened:

I flipped the flag to create a PDB for the Cluster Agent.

It created a PDB with minAvailable: 1.

The cluster agent runs a single pod.

Cluster Autoscaler won't ever downscale a node where Cluster Agent lives, and I'll have to delete the pod manually to drain nodes for maintenance.

I saw no flags to allow running Cluster Agent with more than a single Pod.

Describe what you expected:

One of:

  • The ability to run more than one cluster agent, with leader election or something, so I can respect the bundled PDB and still downscale/drain
  • Some documentation on why the Cluster Agent's PDB prevents scaledowns
  • No PDB, if Cluster Agent scaledowns aren't the end of the world

Steps to reproduce the issue:

clusterAgent.createPodDisruptionBudget = true

Try to drain a node where Cluster Agent is installed.
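
For reference, a sketch of the first option, assuming a chart version that exposes clusterAgent.replicas (the Cluster Agent supports leader election between replicas):

clusterAgent:
  replicas: 2
  createPodDisruptionBudget: true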

Additional environment details (Operating System, Cloud provider, etc):

Ability to disable and configure autodiscovery in helm charts

Describe what happened:
I am running the agent in a k8s cluster using the cluster agent, deployed via the latest helm chart. The kube-state-metrics integration is autodiscovered on the masters but is improperly configured. I cannot see a way, using the helm chart, to disable/reconfigure the autodiscovery configuration for the masters. I have tried the datadog.confd configuration, but this just places files into /etc/datadog-agent/conf.d/. The autodiscovery directory still appears to have precedence for the configuration. IMO, datadog.confd should work by taking over the /etc/datadog-agent/conf.d/ directory. I have some changes I am working on to the helm template which work for us; I'd be happy to clean them up and submit them via PR if it's the correct direction to take.

The one issue I am finding, which has prevented me from submitting thus far, is that I have changed the design such that the older datadog.autoconf support can no longer be merged with datadog.confd. My design is that each key of datadog.confd creates a distinct ConfigMap, which gets mounted into the agent container at /etc/datadog-agent/conf.d//conf.yaml. This allows overwriting the existing autoconf files/directories. Due to this design, the existing ConfigMap, which appears to merge the older datadog.autoconf values with the newer datadog.confd values, will no longer work.

datadog chart: incompatibility with initContainers and dogstatsd image

Describe what happened:

  • Pods failed to run when using datadog/dogstatsd as the image repo (agents.image.repository=datadog/dogstatsd)
  • The problem is in the initContainers:
    • The initContainers (for linux) are hardcoded to use the image set in agents.image.repository and bash.
    • The datadog/dogstatsd image does not include bash.

Describe what you expected:

Pods would run as is if we use the datadog/dogstatsd image, or there would be a way to bypass initContainers or alter them for a release using dogstatsd

Steps to reproduce the issue:

Set datadog/dogstatsd as the agents image repository

Additional environment details (Operating System, Cloud provider, etc):

OS: linux

datadog-cluster-id configmap missing

Describe what happened:
container-process-agent.yaml contains a ConfigMap reference, datadog-cluster-id, that appears to never get created.

Describe what you expected:
The configmap to be created by the chart or an init container (or similar)

Steps to reproduce the issue:
Enable the "orchestrator" variable and attempt to run the agent; it will have configuration errors regarding the missing ConfigMap.

Additional environment details (Operating System, Cloud provider, etc):

How to run as a deployment instead of a daemonset?

Describe what happened:
On the README it is said:

By default, the Datadog Agent runs in a DaemonSet. It can alternatively run inside a Deployment for special use cases.

Describe what you expected:
I expected some kind of value to be able to run the agent as a Deployment instead of a DaemonSet.

Steps to reproduce the issue:
I'll have to fork the whole chart to be able to run the agent as a deployment. Is there another way?

Additional environment details (Operating System, Cloud provider, etc):
AWS, EKS

rbac deprecation notice when installing helm chart

Describe what happened:
When I install the helm chart for the Datadog agent, I get a warning about the deprecation of the RBAC beta api.

W0126 17:35:02.320670   42468 warnings.go:67] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W0126 17:35:02.358430   42468 warnings.go:67] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W0126 17:35:02.396070   42468 warnings.go:67] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W0126 17:35:02.660601   42468 warnings.go:67] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
W0126 17:35:02.698056   42468 warnings.go:67] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
W0126 17:35:02.736238   42468 warnings.go:67] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding

I had previously installed the Datadog agents via the daemonset. I think I got it all removed before trying to install the agents via helm. It installs fine other than the warning.

When I look at the source, it looks like Capabilities.KubeVersion.GitVersion is used to determine the API version; per this issue, that field is deprecated, and I'm wondering if our cluster is new enough to no longer have that field.

Describe what you expected:
No warnings when I install/upgrade the helm chart

Steps to reproduce the issue:
helm install datadog-agent -f values.yaml --set datadog.apiKey= datadog/datadog

Additional environment details (Operating System, Cloud provider, etc):
Azure AKS 1.19.3 (Linux)

Missing network policies

Describe what happened:
Deployed to default namespace but could not communicate with datadog over the network

Describe what you expected:
Network policies to be created alongside the other infrastructure to allow communication where necessary between pods and with the outside world.

Steps to reproduce the issue:
Deploy with default deny all networking policy in same namespace and with some kubernetes networking set up such as calico.

Additional environment details (Operating System, Cloud provider, etc):
AWS EKS with Calico installed and set up.
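
For anyone hitting this under a default-deny policy, a minimal sketch of the kind of egress rule involved (assuming the chart's app: datadog pod label shown in the DaemonSet output earlier; port 443 covers the agent's outbound intake traffic, and DNS egress still needs its own rule):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: datadog-agent-egress
spec:
  podSelector:
    matchLabels:
      app: datadog
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443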

Cluster agent pod spec should include container's runAsNonRoot security context

Describe what happened:

I'm trying to create a restrictive PodSecurityPolicy for the Datadog cluster agent. When I create the following policy:

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    kubernetes.io/description: 'Custom PSP for the datadog cluster agent'
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
  name: 001-datadog-cluster-agent
spec:
  allowPrivilegeEscalation: false
  allowedProcMountTypes:
    - Default
  fsGroup:
    rule: MustRunAs
    ranges:
      - max: 65535
        min: 1
  hostIPC: false
  hostNetwork: false
  hostPID: false
  hostPorts:
    - max: 65535
      min: 1025
  privileged: false
  readOnlyRootFilesystem: true
  requiredDropCapabilities:
    - AUDIT_CONTROL
    - AUDIT_READ
    - AUDIT_WRITE
    - BLOCK_SUSPEND
    - CHOWN
    - DAC_OVERRIDE
    - DAC_READ_SEARCH
    - FOWNER
    - FSETID
    - IPC_LOCK
    - IPC_OWNER
    - KILL
    - LEASE
    - LINUX_IMMUTABLE
    - MAC_ADMIN
    - MAC_OVERRIDE
    - MKNOD
    - NET_ADMIN
    - NET_BIND_SERVICE
    - NET_BROADCAST
    - NET_RAW
    - SETGID
    - SETFCAP
    - SETPCAP
    - SETUID
    - SYS_ADMIN
    - SYS_BOOT
    - SYS_CHROOT
    - SYS_MODULE
    - SYS_NICE
    - SYS_PACCT
    - SYS_PTRACE
    - SYS_RAWIO
    - SYS_RESOURCE
    - SYS_TIME
    - SYS_TTY_CONFIG
    - SYSLOG
    - WAKE_ALARM
  runAsGroup:
    rule: MustRunAs
    ranges:
      - max: 65535
        min: 1
  runAsUser:
    rule: RunAsAny # EXCEPTION: Pod needs to run as root
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: MustRunAs
    ranges:
      - max: 65535
        min: 1
  volumes:
    - configMap
    - downwardAPI
    - emptyDir
    - projected
    - secret
    - persistentVolumeClaim
---

I would expect this policy to be used by the cluster agent pod.

Describe what you expected:

Instead of using the above PSP, the cluster agent is assigned a more restrictive PSP. The pod then fails to be created with the error:

state:
        waiting:
          message: container has runAsNonRoot and image will run as root
          reason: CreateContainerConfigError

Steps to reproduce the issue:

  1. Create the above PSP
  2. Create a PSP with the line with the # EXCEPTION comment above set to MustRunAsNonRoot, using a name that sorts lexicographically before 001, e.g., 000-restricted.

Additional environment details (Operating System, Cloud provider, etc):

I believe this is this issue: kubernetes/kubernetes#71787

The Pod securityContext is:

    securityContext:
      fsGroup: 1
      supplementalGroups:
      - 1

Whereas the container securityContext is:

      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - AUDIT_CONTROL
          - AUDIT_READ
          - AUDIT_WRITE
          - BLOCK_SUSPEND
          - CHOWN
          - DAC_OVERRIDE
          - DAC_READ_SEARCH
          - FOWNER
          - FSETID
          - IPC_LOCK
          - IPC_OWNER
          - KILL
          - LEASE
          - LINUX_IMMUTABLE
          - MAC_ADMIN
          - MAC_OVERRIDE
          - MKNOD
          - NET_ADMIN
          - NET_BIND_SERVICE
          - NET_BROADCAST
          - NET_RAW
          - SETFCAP
          - SETGID
          - SETPCAP
          - SETUID
          - SYSLOG
          - SYS_ADMIN
          - SYS_BOOT
          - SYS_CHROOT
          - SYS_MODULE
          - SYS_NICE
          - SYS_PACCT
          - SYS_PTRACE
          - SYS_RAWIO
          - SYS_RESOURCE
          - SYS_TIME
          - SYS_TTY_CONFIG
          - WAKE_ALARM
        readOnlyRootFilesystem: true
        runAsGroup: 1
        runAsNonRoot: true

Since the pod spec is what is used to choose a PSP, the wrong PSP is chosen.

I don't see any reason why the Pod spec shouldn't match the container spec.

DD_CONTAINER_INCLUDE not working

Hey guys,

I deployed the datadog helm chart, version 2.4.2.
And I just overrode the values file with these values:

datadog:
    apiKeyExistingSecret: datadog

    clusterName: kinto-dev

    logLevel: INFO

    kubeStateMetricsEnabled: false

    logs:
      enabled: true
      containerCollectAll: true

    containerInclude: "kube_namespace:kinto-*."

    processAgent:
      enabled: false

agents:
    enabled: true

    containers:
      agent:
        logLevel: INFO

However, it still pushes logs from all the namespaces to Datadog.

Am I doing something wrong?
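
For comparison, the container discovery documentation describes an exclude-then-include pattern, where include rules take precedence over exclude rules; a sketch (the kinto-.* regex is an assumption about the intended namespaces):

datadog:
  containerExclude: "name:.*"
  containerInclude: "kube_namespace:kinto-.*"

Note also that these values are regular expressions, so the kinto-*. in the values above is likely not matching as intended; kinto-.* is probably what was meant.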

metricsProvider service port does not get registered with v1beta1.external.metrics.k8s.io inside cluster

Describe what happened:
When installing the helm chart with values.yaml and changing metricsProvider.service.port to something other than the default (443), the external metrics provider registration of the cluster agent does not get updated with the correct port.

Describe what you expected:
the APIService v1beta1.external.metrics.k8s.io will be updated accordingly

Steps to reproduce the issue:
change metricsProvider.service.port to something other than default e.g 1555
install the helm chart on the cluster
no external metrics are getting received from cluster-agent
kubectl get APIService results in:
v1beta1.external.metrics.k8s.io default/datadog-cluster-agent-metrics-api False (ServicePortError)

Additional environment details (Operating System, Cloud provider, etc):
Using chart version 2.3.10, but I have noticed this since at least chart version 2.2.9. Kubernetes versions 1.15-1.17.
This results in us having to manually edit the APIService on each new installation, and having to remember to do that, or else our HPA based on custom metrics does not work.

kubectl get APIService v1beta1.external.metrics.k8s.io -oyaml

kind: APIService
metadata:
  creationTimestamp: "2020-08-20T07:52:23Z"
  labels:
    app.kubernetes.io/instance: datadog
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datadog
    app.kubernetes.io/version: "7"
    helm.sh/chart: datadog-2.3.10
  name: v1beta1.external.metrics.k8s.io
  resourceVersion: "141754"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.external.metrics.k8s.io
  uid: 4bfab8c3-567b-46e8-bc1f-ecf472b72eb9
spec:
  group: external.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: datadog-cluster-agent-metrics-api
    namespace: default
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2020-08-20T07:52:23Z"
    message: service/datadog-cluster-agent-metrics-api in "default" is not listening
      on port 443
    reason: ServicePortError
    status: "False"
    type: Available

synthetics-private-location - How to pass in configFile

Describe what happened:

Generated the configuration from the DD website. Attempted to pass the config in directly using

configFile: "{}"

I pasted it exactly as it was presented; I then swapped the " to ', and then I totally removed the ". All of them result in some form of:

Unexpected token d in JSON at position 2

/home/dog/node_modules/yargs/yargs.js:1178
if (parseErrors) throw new YError(parseErrors.message || parseErrors)
^
YError: Unexpected token d in JSON at position 2
at Object.runValidation [as _runValidation] (/home/dog/node_modules/yargs/yargs.js:1178:28)
at Object.runCommand (/home/dog/node_modules/yargs/lib/command.js:226:36)
at Object.parseArgs [as _parseArgs] (/home/dog/node_modules/yargs/yargs.js:1095:28)
at Object.parse (/home/dog/node_modules/yargs/yargs.js:573:25)
at getConfigFromArgs (/home/dog/dist/config/config.js:15:38)
at Object. (/home/dog/dist/config/config.js:28:27)
at Module._compile (internal/modules/cjs/loader.js:1137:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1157:10)
at Module.load (internal/modules/cjs/loader.js:985:32)
at Function.Module._load (internal/modules/cjs/loader.js:878:14)
[dumb-init] Received signal 17.
[dumb-init] A child with PID 10 exited with exit status 1.
[dumb-init] Forwarded signal 15 to children.
[dumb-init] Child exited with status 1. Goodbye.

What should this look like?
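
One approach that sidesteps shell quoting entirely is Helm's --set-file flag (a sketch, assuming the configFile value accepts the raw JSON content and the generated configuration was saved to worker-config.json):

helm install <RELEASE_NAME> datadog/synthetics-private-location --set-file configFile=worker-config.json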

cluster-agent fails with initialisation errors (You have to define a namespace for each prometheus check)

Describe what happened:
I upgraded my cluster-agent-deployment.yaml deployment to work with k8s v1.16.
Basically, I have a copy of https://www.datadoghq.com/blog/monitoring-kubernetes-with-datadog/ but with a different, custom namespace. All other parameters which were there didn't change.

The pods are restarting and I get the following error:

kubectl exec -it datadog-datadog-8mv9w  -n monitoring -- agent status

      prometheus (3.3.0)
      ------------------

      instance 0:

        could not invoke 'prometheus' python check constructor. New constructor API returned:
__init__() takes at least 4 arguments (4 given)Deprecated constructor API returned:
Traceback (most recent call last):
  File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/prometheus/base_check.py", line 97, in __init__
    self.get_scraper(instance)
  File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/prometheus/base_check.py", line 126, in get_scraper
    raise CheckException("You have to define a namespace for each prometheus check")
CheckException: You have to define a namespace for each prometheus check

  Config Errors
  ==============
    statsd_receiver
    ---------------
      Configuration file contains no valid instances

Describe what you expected:

As this part didn't change and there's nothing in the docs related to it, I expected it to work.

  • DaemonSet datadog is already upgraded to work with v1.16.
  • kube-state-metrics is already upgraded to work with v1.16.

Steps to reproduce the issue:

Changed apiVersion: extensions/v1beta1 to apiVersion: apps/v1, and added:


  selector:
    matchLabels:
      app: kube-state-metrics
      release: datadog
  strategy:
    type: RollingUpdate

Additional environment details (Operating System, Cloud provider, etc):
AWS, EKS

I tried upgrading datadog-agent from 6.20.0 to 7.17.0 to 7.23.1; the error is still the same.
