
terraform-datadog-monitors's Introduction

DataDog Monitors

Changelog Notice Apache V2 License

This repository provides a base of generic, preconfigured Datadog monitors, templated with Terraform and the Datadog provider.

Important notes

  • This repository provides multiple Terraform modules that can be imported; choose the one(s) you need.
  • Each of these modules contains the most common monitors, but they probably do not cover all your needs.
  • You can still create specific Datadog monitors after importing a module, and doing so is even advisable to cover your remaining needs.
  • Each module has its own complete README.md explaining how to use it and any specificities it has.
  • The alerting-message module can be used to easily generate a reusable message template, and it can be declared multiple times to suit different use cases.
  • Some monitors are disabled by default because they are not generic or "plug and play" enough. If you enable them, you will need to tweak them or, in some cases, disable another monitor that could "duplicate" the check.

Getting started

Versions

Here are the minimum versions required to use these integration modules:

terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
      version = ">= 3.1.2"
    }
  }
  required_version = ">= 0.12.31"
}

Note: if you want to use Datadog provider v2, you need to use version 3 of the modules in this repository.
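For reference, a configuration that stays on Datadog provider v2 with version 3 of these modules could look like the sketch below. It assumes Terraform 0.13 or later (for the `source` attribute in `required_providers`), reuses the `var.environment` and `module.datadog-message-alerting` names from the examples in this README, and uses illustrative `~>` constraints; pin the exact tags you need.

```hcl
terraform {
  required_providers {
    datadog = {
      source = "DataDog/datadog"
      # Illustrative constraint: stay on the 2.x provider line.
      version = "~> 2.0"
    }
  }
}

# Version 3 of the modules is the one that targets Datadog provider v2.
module "datadog-monitors-system-generic" {
  source  = "claranet/monitors/datadog//system/generic"
  version = "~> 3.0" # illustrative; any 3.x tag, never 4.x

  environment = var.environment
  message     = module.datadog-message-alerting.alerting-message
}
```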

DataDog provider

Here is the Datadog provider configuration. Provider versions newer than the last tested one should work too.

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}

Both datadog_api_key and datadog_app_key are unique to each Datadog account. You can define them in a terraform.tfvars file:

datadog_api_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
datadog_app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
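If your Datadog organization is hosted on a site other than the default US one, the provider's `api_url` argument points it at the matching API endpoint. A minimal sketch, assuming an EU-hosted account and the same key variables as above:

```hcl
provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key

  # Assumption: EU-hosted account. The provider defaults to https://api.datadoghq.com/
  api_url = "https://api.datadoghq.eu/"
}
```

The provider can also read the keys from the `DD_API_KEY` and `DD_APP_KEY` environment variables, which avoids storing them in a file.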

Variables

The following variables need to be declared:

variable "environment" {
  type    = string
  default = "dev"
}

variable "datadog_api_key" {
  type = string
}

variable "datadog_app_key" {
  type = string
}
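Since both keys are secrets, it can help to mark them as sensitive so Terraform redacts them in plan and apply output. A sketch of the same two declarations, assuming Terraform 0.14 or later (the `sensitive` argument does not exist in the 0.12 minimum listed above):

```hcl
variable "datadog_api_key" {
  description = "Datadog API key"
  type        = string
  # Terraform >= 0.14: the value is redacted in plan/apply output.
  sensitive = true
}

variable "datadog_app_key" {
  description = "Datadog application key"
  type        = string
  sensitive   = true
}
```

The values can then come from `TF_VAR_datadog_api_key` and `TF_VAR_datadog_app_key` environment variables instead of a committed terraform.tfvars file.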

Modules declaration example

A quick example declaring two alerting-message modules and a monitors module that uses them:

locals {
  oncall_24x7         = "@pagerduty-MyPagerService_NBH"
  oncall_office_hours = "@pagerduty-MyPagerService_BH"
}

module "datadog-message-alerting" {
  source = "claranet/monitors/datadog//common/alerting-message"
  version = "{revision}"

  message_alert   = local.oncall_24x7
  message_warning = local.oncall_office_hours
  message_nodata  = local.oncall_24x7
}

module "datadog-message-alerting-bh-only" {
  source = "claranet/monitors/datadog//common/alerting-message"
  version = "{revision}"

  message_alert   = local.oncall_office_hours
  message_warning = local.oncall_office_hours
  message_nodata  = local.oncall_office_hours
}

module "datadog-monitors-system-generic" {
  source = "claranet/monitors/datadog//system/generic"
  version = "{revision}"

  environment = var.environment
  message     = module.datadog-message-alerting.alerting-message

  memory_message = module.datadog-message-alerting-bh-only.alerting-message
  # Use variables to customize monitors configuration
}

# Other monitors modules to declare ...
#module "datadog-monitors-my-monitors-set" {
#  source = "claranet/monitors/datadog//my/monitors/set"
#  version = "{revision}"
#
#  environment = var.environment
#  message     = module.datadog-message-alerting.alerting-message
#}

  • Replace {revision} with the latest git tag available on this repository.
  • The // is very important: it is Terraform-specific syntax used to separate the repository address from the module's sub directory path.
  • my/monitors/set represents the path to one of the monitor set sub directories listed below.
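The same `//` separator applies when sourcing a module directly from git rather than from the Terraform Registry. A sketch, assuming the repository is hosted at github.com/claranet/terraform-datadog-monitors; the `version` argument only works with registry sources, so the git tag goes in `?ref=` instead ({revision} remains a placeholder).

```hcl
module "datadog-monitors-system-generic" {
  # "//" separates the repository URL from the module sub directory,
  # "?ref=" selects the git tag to check out.
  source = "git::https://github.com/claranet/terraform-datadog-monitors.git//system/generic?ref={revision}"

  environment = var.environment
  message     = module.datadog-message-alerting.alerting-message
}
```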

Contributions

Contributions are always welcome.

The easiest way is to fork the repository, duplicate an existing module as a "template", and work on it.

An internal CI runs the auto_update.sh script against the proposed changes to check that everything is up to date.

So, when your PR is ready, you need to run this script and push the resulting changes to pass the CI; see the scripts repository for more information.

For example, this script regenerates every README using terraform-docs (currently v0.9.1).

Monitors summary

terraform-datadog-monitors's People

Contributors

abrefort, adrienne-cln, aohzan, bzspi, clickboxer, ddrugeon, djagoudel-claranet, francovp, gati0, hugueslepesant, ignacio-rivas, j-mx, jmapro, jmleblanc, jnancel, jphilaine, nsenaud, pdecat, polremy, pygillier, rafael-romero-carmona-claranet, rossifumax, shr3ps, xp-1000, zfiel


terraform-datadog-monitors's Issues

Update to DD 3.1 and TF 1.0

It would be great if you could update the modules to both Datadog provider 3.1 and Terraform 1.0.

P.S. Issue templates would be a big addition on GitHub.

400 Bad Request: "Legacy event alert monitors (type: 'event alert') are no longer supported"

Hello again, I'm getting this error in the latest version 4.2.x using the caas/kubernetes/node monitor:

│ Error: error validating monitor from https://api.datadoghq.com/api/v1/monitor/validate: 400 Bad Request: {"errors": ["Legacy event alert monitors (type: 'event alert') are no longer supported, please create a new monitor with 'event-v2 alert' type or use the UI."]}
│ 
│   with module.k8s-datadog-monitor.module.datadog-monitors-caas-kubernetes-node.datadog_monitor.unregister_net_device[0],
│   on .terraform/modules/k8s-datadog-monitor.datadog-monitors-caas-kubernetes-node/caas/kubernetes/node/monitors-k8s-node.tf line 164, in resource "datadog_monitor" "unregister_net_device":
│  164: resource "datadog_monitor" "unregister_net_device" {

Example code:

module "datadog-monitors-caas-kubernetes-node" {
  source  = "claranet/monitors/datadog//caas/kubernetes/node"
  version = "4.2.2"

  environment = "prod"
  message     = module.datadog-message-alerting.alerting-message
}

Apparently, the unregister_net_device Kubernetes node monitor now needs to be migrated to the event-v2 alert type.
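For reference, a monitor of the `event-v2 alert` type could look like the sketch below. It follows the `events(query).rollup(method).last(window) > #` query form Datadog documents for event-v2 monitors. The search string, window, threshold and default message are assumptions, not the module's actual `unregister_net_device` settings.

```hcl
variable "environment" {
  type    = string
  default = "prod"
}

variable "message" {
  type    = string
  default = "@pagerduty-MyPagerService_NBH" # hypothetical notification handle
}

resource "datadog_monitor" "unregister_net_device" {
  name    = "[${var.environment}] Kubernetes node UnregisterNetDevice events"
  message = var.message
  # The legacy "event alert" type is rejected by the API.
  type = "event-v2 alert"

  # Assumed search: Kubernetes events mentioning UnregisterNetDevice,
  # counted over the last 15 minutes.
  query = "events(\"source:kubernetes UnregisterNetDevice\").rollup(\"count\").last(\"15m\") > 3"

  monitor_thresholds {
    critical = 3 # must match the threshold at the end of the query
  }

  notify_no_data = false
  tags           = ["env:${var.environment}", "created-by:terraform"]
}
```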

Provider registry.terraform.io/hashicorp/template v2.2.0 does not have a package available for your current platform, darwin_arm64.

$ terraform init
Initializing modules...
Downloading registry.terraform.io/claranet/monitors/datadog 4.1.0 for datadog-message-alerting...
- datadog-message-alerting in .terraform/modules/datadog-message-alerting/common/alerting-message

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of datadog/datadog from the dependency lock file
- Reusing previous version of hashicorp/vault from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Installing datadog/datadog v3.12.0...
- Installed datadog/datadog v3.12.0 (signed by a HashiCorp partner, key ID FB70BE941301C3EA)
- Installing hashicorp/vault v3.7.0...
- Installed hashicorp/vault v3.7.0 (signed by HashiCorp)

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html
╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/template v2.2.0 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms
│ supported.
╵

The template provider has been deprecated since Terraform 0.12, so it cannot be installed on ARM64 (darwin_arm64) builds of recent Terraform versions.
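The usual way off the template provider is Terraform's built-in `templatefile()` function, available since 0.12. The generic before/after sketch below is not the alerting-message module's actual code: `message.tpl`, its variables and the output name are hypothetical placeholders, and the template file must exist next to the configuration.

```hcl
variable "message_alert" {
  type = string
}

variable "message_warning" {
  type = string
}

# Before: a data source from the deprecated hashicorp/template provider,
# which has no darwin_arm64 package.
# data "template_file" "message" {
#   template = file("${path.module}/message.tpl")
#   vars = {
#     message_alert   = var.message_alert
#     message_warning = var.message_warning
#   }
# }

# After: the built-in templatefile() function, so no extra provider
# has to be downloaded during terraform init.
locals {
  message = templatefile("${path.module}/message.tpl", {
    message_alert   = var.message_alert
    message_warning = var.message_warning
  })
}

output "alerting-message" {
  value = local.message
}
```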

Remove default team tag (or overwrite it)

Hi,

First of all, thank you for this great, well-done and complete TF module to create Datadog monitors based on resource tags! I'm really happy to have found it :)

My issue:
I'm trying to use your module to create different monitors. I noticed that we can add tags with the same name (using the "X_extra_tags" parameters), but there seems to be no way to remove or overwrite the default "team:claranet" tag, or I didn't find how to do it. Can someone help me, or tell me what I missed? :/

Example code to reproduce:

module "datadog-monitors-cloud-aws-rds-common-xxx" {
  source  = "claranet/monitors/datadog//cloud/aws/rds/common" // AWS RDS PostgreSQL DB
  version = "4.6.0"

  environment = "PRD"
  prefix_slug = "ABC"
  message     = "test - message RDS alert"

  // filter_tags_custom: tags used for custom filtering when filter_tags_use_defaults is false (string, default "*", optional)
  // filter_tags_custom_excluded: tags excluded from custom filtering when filter_tags_use_defaults is false (string, default "", optional)
  // filter_tags_separator: set the filter tags separator ("," or "AND") (string, default ",", optional)
  filter_tags_use_defaults = false
  filter_tags_custom       = "name:xxx" // "team:ABC" OR "*"

  # CPU
  cpu_enabled            = true
  cpu_extra_tags         = ["team:ABC-ops", "service:subteam01"]
  cpu_message            = "cpu_message2"
  cpu_threshold_critical = "90" // CPU usage in percent (critical threshold)
  cpu_threshold_warning  = "80" // CPU usage in percent (warning threshold)
  # Diskspace
  diskspace_enabled            = true
  diskspace_extra_tags         = ["team:ABC-ops", "service:subteam01"]
  diskspace_message            = "diskspace_message"
  diskspace_threshold_critical = 10 // Disk free space in percent (critical threshold)
  diskspace_threshold_warning  = 20 // Disk free space in percent (warning threshold)
  diskspace_time_aggregator    = "min"
  diskspace_timeframe          = "last_5m"
  evaluation_delay             = 300
  # Replicalag
  replicalag_enabled    = false
  replicalag_extra_tags = ["team:ABC-ops", "service:subteam01"]
  # Connection Variance
  connection_variance_enabled    = false
  connection_variance_extra_tags = ["team:ABC-ops", "service:subteam01"]
}

With this code example, here is what I see on the Datadog platform:
[screenshot]

Is there a simple way to get rid of this "team:claranet" tag and put our own team tag instead? :)

Thank you!
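One possible change to the module's tag logic (for example in a fork) is sketched below: when the caller passes their own `team:` tag through the extra tags, the default `team:claranet` tag is dropped. The `cpu_tags` local and its output are hypothetical, not existing inputs or outputs of the claranet modules; the default tag list mirrors the one the RDS monitors build with `concat()`. It relies on `can(regex(...))`, which needs Terraform 0.12.20 or later.

```hcl
variable "environment" {
  type    = string
  default = "PRD"
}

variable "cpu_extra_tags" {
  type    = list(string)
  default = ["team:ABC-ops", "service:subteam01"]
}

locals {
  default_tags = ["env:${var.environment}", "type:cloud", "provider:aws", "resource:rds", "team:claranet", "created-by:terraform"]

  # True when the caller supplies at least one "team:" tag.
  caller_sets_team = length([for t in var.cpu_extra_tags : t if can(regex("^team:", t))]) > 0

  # Drop the default team tag in that case, then append the extra tags.
  cpu_tags = distinct(concat(
    [for t in local.default_tags : t if !(local.caller_sets_team && can(regex("^team:", t)))],
    var.cpu_extra_tags,
  ))
}

output "cpu_tags" {
  # With the defaults above: env:PRD, type:cloud, provider:aws, resource:rds,
  # created-by:terraform, team:ABC-ops, service:subteam01
  value = local.cpu_tags
}
```

The monitor resource would then use `tags = local.cpu_tags` instead of concatenating the hard-coded list with the extra tags.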

disk_queue and connection monitors

I ended up using your template style to create a few extra monitors, in case you're interested in adding them to your standard set. The PostgreSQL one is technically a generic RDS monitor.

resource "datadog_monitor" "postgresql_disk_queue_depth" {
  count   = var.postgresql_disk_queue_enabled ? 1 : 0
  name    = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] PostgreSQL disk queue depth {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
  message = coalesce(var.postgresql_disk_queue_message, var.message)
  type    = "query alert"

  query = <<EOQ
    ${var.postgresql_disk_queue_aggregator}(${var.postgresql_disk_queue_timeframe}):
      default(avg:aws.rds.disk_queue_depth${module.filter-tags.query_alert} by {dbinstanceidentifier}, 0)
    > ${var.postgresql_disk_queue_threshold_critical}
EOQ

  monitor_thresholds {
    warning  = var.postgresql_disk_queue_threshold_warning
    critical = var.postgresql_disk_queue_threshold_critical
  }

  evaluation_delay    = var.evaluation_delay
  new_host_delay      = var.new_host_delay
  notify_no_data      = false
  renotify_interval   = 0
  require_full_window = true
  timeout_h           = 0
  include_tags        = true

  tags = concat(["env:${var.environment}", "type:database", "provider:postgres", "resource:postgresql", "team:claranet", "created-by:terraform"], var.postgresql_disk_queue_extra_tags)
}


resource "datadog_monitor" "rds_connection_variance" {
  count   = var.connection_variance_enabled ? 1 : 0
  name    = "${var.prefix_slug == "" ? "" : "[${var.prefix_slug}]"}[${var.environment}] RDS connection variance {{#is_alert}}{{{comparator}}} {{threshold}} ({{value}}){{/is_alert}}{{#is_warning}}{{{comparator}}} {{warn_threshold}} ({{value}}){{/is_warning}}"
  message = coalesce(var.connection_variance_message, var.message)
  type    = "query alert"

  query = <<EOQ
  ${var.connection_variance_time_aggregator}(${var.connection_variance_timeframe}): (
    anomalies(avg:aws.rds.database_connections${module.filter-tags.query_alert} by {dbinstanceidentifier}, 'agile', 1, 
      direction='both', 
      alert_window='last_15m', 
      interval=60, 
      count_default_zero='true',
      seasonality='weekly')
  ) > ${var.connection_variance_threshold_critical}
EOQ

  monitor_thresholds {
    warning  = var.connection_variance_threshold_warning
    critical = var.connection_variance_threshold_critical
  }

  evaluation_delay    = var.evaluation_delay
  new_host_delay      = var.new_host_delay
  notify_no_data      = false
  notify_audit        = false
  timeout_h           = 0
  include_tags        = true
  locked              = false
  require_full_window = false

  tags = concat(["env:${var.environment}", "type:cloud", "provider:aws", "resource:rds", "team:claranet", "created-by:terraform"], var.connection_variance_extra_tags)
}
#################################
###   RDS connection variance ###
#################################

variable "connection_variance_enabled" {
  description = "Flag to enable RDS connection variance monitor"
  type        = bool
  default     = true
}

variable "connection_variance_timeframe" {
  description = "Monitor timeframe for RDS connection variance monitor [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
  type        = string
  default     = "last_4h"
}

variable "connection_variance_threshold_warning" {
  description = "RDS connection variance (warning threshold)"
  default     = "0"
}

variable "connection_variance_threshold_critical" {
  description = "RDS connection variance (critical threshold)"
  default     = "1"
}

variable "connection_variance_extra_tags" {
  description = "Extra tags for RDS connection variance monitor"
  type        = list(string)
  default     = []
}

variable "connection_variance_message" {
  description = "Custom message for RDS connection variance monitor"
  type        = string
  default     = ""
}

variable "connection_variance_time_aggregator" {
  description = "Monitor aggregator for connection variance [available values: min, max or avg]"
  type        = string
  default     = "avg"
}


#################################
###   PostgreSQL disk queue   ###
#################################

variable "postgresql_disk_queue_aggregator" {
  description = "Monitor time aggregator for PostgreSQL disk queue depth [available values: min, max or avg]"
  type        = string
  default     = "avg"
}

variable "postgresql_disk_queue_timeframe" {
  description = "Monitor timeframe for PostgreSQL disk queue depth [available values: `last_#m` (1, 5, 10, 15, or 30), `last_#h` (1, 2, or 4), or `last_1d`]"
  type        = string
  default     = "last_15m"
}

variable "postgresql_disk_queue_threshold_critical" {
  default     = 64
  description = "Maximum critical acceptable disk queue depth"
}

variable "postgresql_disk_queue_threshold_warning" {
  default     = 48
  description = "Maximum warning acceptable disk queue depth"
}

variable "postgresql_disk_queue_enabled" {
  description = "Flag to enable PostgreSQL disk queue"
  type        = bool
  default     = true
}

variable "postgresql_disk_queue_extra_tags" {
  description = "Extra tags for PostgreSQL disk queue depth monitor"
  type        = list(string)
  default     = []
}

variable "postgresql_disk_queue_message" {
  description = "Custom message for PostgreSQL disk queue monitor"
  type        = string
  default     = ""
}
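If these two monitors were added to a local copy of the RDS module, calling them could look like the sketch below. The `./modules/aws-rds-common` path is hypothetical, and the inputs only exist in such a modified copy, not in the published module; the threshold values are the defaults declared above.

```hcl
module "datadog-monitors-rds-extra" {
  # Hypothetical local copy of the RDS module with the two monitors added.
  # Local paths take no "version" argument.
  source = "./modules/aws-rds-common"

  environment = var.environment
  message     = module.datadog-message-alerting.alerting-message

  postgresql_disk_queue_enabled            = true
  postgresql_disk_queue_threshold_warning  = 48
  postgresql_disk_queue_threshold_critical = 64
  postgresql_disk_queue_extra_tags         = ["service:subteam01"]

  connection_variance_enabled   = true
  connection_variance_timeframe = "last_4h"
}
```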

400 Bad Request error: The value provided for parameter 'query' is invalid

Hello, I'm getting these errors in version 4.2.x; they don't happen in version 4.1.2.

Error: error validating monitor from https://api.datadoghq.com/api/v1/monitor/79946009/validate: 400 Bad Request: {"errors": ["The value provided for parameter 'query' is invalid"]}
│ 
│   with module.k8s-datadog-monitor.module.datadog-monitors-caas-kubernetes-pod.datadog_monitor.pod_phase_status[0],
│   on .terraform/modules/k8s-datadog-monitor.datadog-monitors-caas-kubernetes-pod/caas/kubernetes/pod/monitors-k8s-pod.tf line 1, in resource "datadog_monitor" "pod_phase_status":
│    1: resource "datadog_monitor" "pod_phase_status" {
│ 
╵
╷
│ Error: error validating monitor from https://api.datadoghq.com/api/v1/monitor/79946012/validate: 400 Bad Request: {"errors": ["The value provided for parameter 'query' is invalid"]}
│ 
│   with module.k8s-datadog-monitor.module.datadog-monitors-caas-kubernetes-pod.datadog_monitor.error[0],
│   on .terraform/modules/k8s-datadog-monitor.datadog-monitors-caas-kubernetes-pod/caas/kubernetes/pod/monitors-k8s-pod.tf line 30, in resource "datadog_monitor" "error":
│   30: resource "datadog_monitor" "error" {
│ 
╵
╷
│ Error: error validating monitor from https://api.datadoghq.com/api/v1/monitor/79946008/validate: 400 Bad Request: {"errors": ["The value provided for parameter 'query' is invalid"]}
│ 
│   with module.k8s-datadog-monitor.module.datadog-monitors-caas-kubernetes-pod.datadog_monitor.terminated[0],
│   on .terraform/modules/k8s-datadog-monitor.datadog-monitors-caas-kubernetes-pod/caas/kubernetes/pod/monitors-k8s-pod.tf line 60, in resource "datadog_monitor" "terminated":
│   60: resource "datadog_monitor" "terminated" {

My code:

module "datadog-monitors-caas-kubernetes-pod" {
  source = "claranet/monitors/datadog//caas/kubernetes/pod"
  version = "4.2.1"

  environment = "prod"
  message     = module.datadog-message-alerting.alerting-message

  prefix_slug              = "FIF"
  filter_tags_use_defaults = false
  filter_tags_custom       = "cluster_name:${var.tag_cluster_name}"
}

Maybe commit 11816cf5 is the cause?
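Until the cause is confirmed, a workaround consistent with this report is to constrain the module to the 4.1.x line, where the reporter saw no error. In the sketch below, `~> 4.1.2` allows 4.1.2 and later 4.1.x patches but excludes 4.2.0 and above; `var.tag_cluster_name` comes from the reporter's own configuration.

```hcl
module "datadog-monitors-caas-kubernetes-pod" {
  source = "claranet/monitors/datadog//caas/kubernetes/pod"
  # ">= 4.1.2, < 4.2.0": stays below the 4.2.x releases reported as failing.
  version = "~> 4.1.2"

  environment = "prod"
  message     = module.datadog-message-alerting.alerting-message

  prefix_slug              = "FIF"
  filter_tags_use_defaults = false
  filter_tags_custom       = "cluster_name:${var.tag_cluster_name}"
}
```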
