datadog / terraform-provider-datadog
Terraform Datadog provider
Home Page: https://www.terraform.io/docs/providers/datadog/
License: Mozilla Public License 2.0
The API (https://docs.datadoghq.com/api/#monitor-create) allows you to create a composite monitor; the plugin should support that too.
This issue was originally opened by @teeebs as hashicorp/terraform#13920. It was migrated here as part of the provider split. The original body of the issue is below.
Datadog now supports read-only user accounts. Adding them via the API is done by passing access_role with the following options:
API docs for reference: http://docs.datadoghq.com/api/#users
Thanks!
Wiring datadog and pagerduty together can be somewhat tedious: I wish terraform could help.
It looks like @rlhh has done some work at https://github.com/terraform-providers/terraform-provider-datadog/pull/54 but I'm not sure how mature it is.
https://github.com/terraform-providers/terraform-provider-datadog/issues/41 is affecting my team right now, causing our plans to fill up with things that are never applied.
The root cause of this is that our oncalls may mute monitors from the Datadog UI, and then the next time we generate a plan, terraform intends to remove these mutes, but can't.
Even if #41 is fixed, we'd like muting in the Datadog UI to just work without getting overwritten by terraform apply. So my team's idea is to skip applying the silenced argument for chosen scopes, using an ignore_silences argument.
Silencing a Datadog monitor has a couple of parameters: what you're silencing, and for how long (ref the resource docs). What I'd want to ignore is mutes of a specific scope.
resource "datadog_monitor" "my_service_monitor" {
silenced {
"service:A" = "0"
}
ignore_silences = ["service:B", "*"]
}
What this example would do is:
service:A is muted forever. Terraform would get the monitor into that state (pending #41).
service:B or * (the generic scope) is muted, either permanently or temporarily. Terraform would never try to apply changes to these scopes, or show them in the plan.
This would enable an oncall to mute the entire monitor for a brief time (e.g. if they're doing planned downtime) or mute service:B for a long time, while it's under development.
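The proposed behaviour can be sketched in Go roughly as follows. This only illustrates the idea and is not provider code: ignore_silences exists only as a proposal in this issue, and filterIgnoredScopes is a hypothetical helper name.

```go
package main

import "fmt"

// filterIgnoredScopes drops every silenced scope listed in ignored, so that
// mutes on those scopes never take part in a diff.
// Hypothetical helper: ignore_silences is only a proposal in this issue.
func filterIgnoredScopes(silenced map[string]int, ignored []string) map[string]int {
	skip := make(map[string]bool, len(ignored))
	for _, scope := range ignored {
		skip[scope] = true
	}
	kept := make(map[string]int)
	for scope, until := range silenced {
		if !skip[scope] {
			kept[scope] = until
		}
	}
	return kept
}

func main() {
	// State as the API might report it: service:A is managed by Terraform,
	// the other two scopes were muted by an oncall in the UI.
	remote := map[string]int{"service:A": 0, "service:B": 0, "*": 1525843800}
	fmt.Println(filterIgnoredScopes(remote, []string{"service:B", "*"}))
}
```

Applying the same filter to both the configured and the remote map before comparing them would keep UI mutes on the ignored scopes out of the plan.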
current
Please list the resources as a list, for example:
datadog_provider
This issue was originally opened by @eredi93 as hashicorp/terraform#14205. It was migrated here as part of the provider split. The original body of the issue is below.
There is no way to validate the query in a datadog_monitor.
It would be great if there were a data source that generates the query, avoiding missing parentheses, parameters, etc.
Resource datadog_monitor doesn't support composite monitors: https://help.datadoghq.com/hc/en-us/articles/207529246-Creating-Composite-Monitors
Terraform v0.9.8
# composite_monitors.tf
provider "datadog" {
api_key = "..."
app_key = "..."
}
resource "datadog_monitor" "monitor_a" {
...
}
resource "datadog_monitor" "monitor_b" {
...
}
resource "datadog_monitor" "monitor_c" {
query = "${datadog_monitor.monitor_a.id} && ${datadog_monitor.monitor_b.id}"
...
}
$ terraform apply
datadog_monitor.monitor_a: Refreshing state... (ID: 2219429)
datadog_monitor.monitor_b: Refreshing state... (ID: 2219420)
datadog_monitor.monitor_c: Creating...
evaluation_delay: "" => "<computed>"
include_tags: "" => "true"
message: "" => "Monitor C triggered"
name: "" => "Monitor C"
new_host_delay: "" => "<computed>"
notify_no_data: "" => "false"
query: "" => "2219420 && 2219429"
renotify_interval: "" => "60"
require_full_window: "" => "true"
silenced.%: "0" => "1"
silenced.*: "" => "0"
type: "" => "query alert"
Error applying plan:
1 error(s) occurred:
* datadog_monitor.monitor_c: 1 error(s) occurred:
* datadog_monitor.monitor_c: error updating monitor: API error 400 Bad Request: {"errors":["The value provided for parameter 'query' is invalid"]}
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
Monitor C should be created the same way the web console creates it (https://app.datadoghq.com/monitors#create/composite). The JSON exported from the console is:
{
"name": "Monitor C",
"type": "composite",
"query": "2219420 && 2219429",
"message": " @webhook-TEST",
"tags": [],
"options": {
"renotify_interval": 0,
"escalation_message": "",
"locked": false,
"notify_audit": false,
"include_tags": false,
"notify_no_data": false
}
}
* datadog_monitor.monitor_c: error updating monitor: API error 400 Bad Request: {"errors":["The value provided for parameter 'query' is invalid"]}
I'm attempting to create DataDog timeboards from terraform.
I'm exporting the definitions using the DataDog API and parsing the output to generate terraform config.
It seems that the markers section in the graph definition block supports some field names that are not included in terraform, eg:
"markers": [
{
"max": null,
"min": 1,
"type": "error solid",
"value": "y > 1.0",
"dim": "y"
},
{
"max": 1,
"min": 0.8,
"type": "warning solid",
"value": "0.8 < y < 1.0",
"dim": "y"
},
{
"max": 0.8,
"min": 0,
"type": "ok dashed",
"value": "0 < y < 0.8",
"dim": "y"
}
]
Is this just a case of the provider not keeping up with changes in DataDog capabilities?
datadog_monitor
Start with something like this:
resource "datadog_monitor" "foo" {
name = "..."
type = "metric alert"
...
# The significant bit.
silenced {
"*" = 0
}
}
then remove the silenced argument:
resource "datadog_monitor" "foo" {
name = "..."
type = "metric alert"
...
}
Terraform appears to remove the silence but this never actually happens:
module.redacted.datadog_monitor.redacted_a: Modifying... (ID: 3724916)
silenced.%: "1" => "0"
silenced.*: "0" => ""
module.redacted.datadog_monitor.redacted_b: Modifying... (ID: 3724915)
silenced.%: "1" => "0"
silenced.*: "0" => ""
module.redacted.datadog_monitor.redacted_c: Modifying... (ID: 3724914)
silenced.%: "1" => "0"
silenced.*: "0" => ""
module.redacted.datadog_monitor.redacted_d: Modifying... (ID: 3724917)
silenced.%: "1" => "0"
silenced.*: "0" => ""
module.redacted.datadog_monitor.redacted_c: Modifications complete after 1s (ID: 3724914)
module.redacted.datadog_monitor.redacted_d: Modifications complete after 1s (ID: 3724917)
module.redacted.datadog_monitor.redacted_a: Modifications complete after 1s (ID: 3724916)
module.redacted.datadog_monitor.redacted_b: Modifications complete after 1s (ID: 3724915)
Apply complete! Resources: 0 added, 4 changed, 0 destroyed.
Subsequent terraform plan invocations show the resources as out-of-sync, and subsequent terraform apply invocations end in the same result.
Workaround:
I've been experimenting with a PR that adds the ability to use count on datadog_timeboard graphs, in the same way you do on terraform resources, to make the example below work. I later found that the majority of the count logic seems to live in the main terraform executable rather than in each provider, so I found myself stuck.
I feel this would be a useful feature. Is there another way to accomplish this? Or could someone point me in the right direction to make the addition myself, please?
Terraform v0.11.2
What I'd like to accomplish.
variable "my_list" {
default = ["First", "Second", "Third"]
}
resource "datadog_timeboard" "my_timeboard" {
title = "My Timeboard"
description = "My Description"
read_only = true
graph {
count = "${length(var.my_list)}"
title = "${element(var.my_list, count.index)}"
viz = "timeseries"
request {
q = "anomalies(sum:mycount{adapter:${element(var.my_list, count.index)}}.as_count().rollup(sum, 3600), 'robust', 4, direction='below')"
}
}
}
This issue was originally opened by @hryamzik as hashicorp/terraform#18226. It was migrated here as a result of the provider split. The original body of the issue is below.
Terraform v0.11.5
+ provider.datadog v1.0.3
resource "datadog_monitor" "disk" {
name = "Disk"
type = "metric alert"
query = "avg(last_5m):avg:system.disk.in_use{device:/dev/ad1s1} > ${datadog_monitor.disk.thresholds.critical}"
message = "test"
thresholds {
warning = 0.1
critical = 0.7
}
}
Error: datadog_monitor.disk: datadog_monitor.disk: self reference not allowed: "datadog_monitor.disk.thresholds.critical"
Working without errors... The problem here is that critical must match the value in the query, so currently it has to be updated in two places for the same resource.
terraform init
terraform apply
0.11.1
DD provider 1.0.2
datadog_timeboard
graph.0.style.palette_flip: "false" => "0"
Should not apply a change.
Detected a diff and applied a change. Appears like #3
Terraform v0.10.7
Failed new testcase for setting palette_flip = true
Failed testcase after attempting to fix the issue by using a map[string]interface{} instead of a map[string]string: https://gist.github.com/dustinlindquist/f93cea9d87e74b3f763dcd43575cdbf6
If I include a style.palette_flip of true or false in a call to update a datadog timeboard:
style {
palette_flip = true
}
I would expect the call to datadog to include this field and change it respectively.
The palette_flip field is not modified and remains defaulted to false.
Please list the steps required to reproduce the issue, for example:
add the graph.style.palette_flip = true to "config2" in the resource_datadog_timeboard_test.go
add a valid expectation to the step2.Checks
run the tests and observe the failure.
One can also reproduce this by attempting to set the palette_flip field to true on any hostmap for any timeboard they're trying to update.
The cause of this appears to be that the graph.style field is a map[string]string{}, forcing the boolean field to be passed as a string. Upon changing it to a map[string]interface{}{}, the issue then appears to be related to the 'WeaklyTypedInput' config option from mitchellh/mapstructure.
That field defaults to false when the decoder is initialized.
Locally, if I set that decoder config to:
config := &DecoderConfig{
Metadata: nil,
Result: rawVal,
WeaklyTypedInput: true,
}
then my fix of the graph.style being a map of interfaces instead of a map of strings works.
I've thrown up a branch with my work: https://github.com/dustinlindquist/terraform-provider-datadog/tree/issue29-palette_flip-as-bool
cc @mitchellh
Hello!
Terraform v0.11.8
terraform {
required_version = "0.11.8"
}
resource "datadog_monitor" "monitor" {
type = "metric alert"
name = "nikita-test"
message = "nikita-test"
query = "avg(last_30m):avg:aws.ec2.cpuutilization{*} by {host} > 90"
notify_no_data = false
}
https://gist.github.com/artyushov/886685baaa1d7b6dc92ecd8df4ceba9b
The plan diff should include the line new_host_delay: "1800" => "300", because 300 is the default value for that field.
The new_host_delay field is ignored. If this field is set explicitly to 300 in the configuration file, the plan will show the required diff.
The new_host_delay field is not updated.
Terraform v0.11.1
datadog_monitor
Please see the "silenced" parameter. That is where I accidentally passed in a list where it should have been a map. The actual variable var.silenced is a map.
variable "silenced" {
type = "map"
default = {}
}
resource "datadog_monitor" "monitor" {
name = "${data.template_file.env_label.rendered}${var.name}"
type = "${var.type}"
message = "${var.message}"
query = "${var.query}"
include_tags = "${var.include_tags}"
require_full_window = "${var.require_full_window}"
locked = "${var.locked}"
no_data_timeframe = "${var.no_data_timeframe}"
notify_audit = "${var.notify_audit}"
notify_no_data = "${var.notify_no_data}"
renotify_interval = "${var.renotify_interval}"
timeout_h = "${var.timeout_h}"
thresholds = "${var.thresholds}"
silenced = ["${var.silenced}"]
tags = ["${var.tags}"]
}
https://gist.github.com/rafe-delphix/b849d062840d458d975df30d8d4dc173
https://gist.github.com/rafe-delphix/d0132ecf18ef07a7668bbe8d55cd0f7c
Generate error message for user
Crash with message on how to make crash report.
terraform plan
Terraform v0.11.7
+ provider.aws v1.8.0
+ provider.datadog v1.0.3
Trying to take the JSON config from a dashboard I created in Datadog, and port it to HCL so that I can manage it with Terraform. It appears that the metadata key is not supported yet by this provider.
The JSON config for Datadog is:
{
"requests": [
{
"q": "sum:aws.trustedadvisor.green_checks, sum:aws.trustedadvisor.yellow_checks, sum:aws.trustedadvisor.red_checks",
"type": "bars",
"style": {
"palette": "dog_classic",
"type": "solid",
"width": "normal"
},
"metadata": {
"sum:aws.trustedadvisor.green_checks": {
"alias": "OK"
},
"sum:aws.trustedadvisor.yellow_checks": {
"alias": "WARN"
},
"sum:aws.trustedadvisor.red_checks": {
"alias": "ERROR"
}
},
"conditional_formats": []
}
],
"viz": "timeseries",
"autoscale": true
}
$ terraform version
Terraform v0.11.3
+ provider.datadog v1.0.3
+ provider.template v1.0.0
When the query parameter in a monitor / timeboard / etc isn't well formed the terraform validate step should fail.
When there are unbalanced parentheses in the query parameter of a monitor, terraform validate and terraform plan do not fail; the failure only appears on terraform apply, when the DD API is hit.
Please list the steps required to reproduce the issue, for example:
resource "datadog_monitor" "my-monitor-monitor" {
name = "My Monitor Name"
type = "query alert"
message = "${data.template_file.my-monitor.rendered}"
query = "min(last_5m):sum:counter.error{mytag:mytagvalue} by {environment}.as_count() / ( sum:counter.error{mytag:mytagvalue} by {environment}.as_count() + sum:counter.success{mytag:mytagvalue} by {environment}.as_count() ) ) * 100 > ${var.my-monitor-critical}"
thresholds {
warning = "${var.warning}"
critical = "${var.critical}"
}
require_full_window = false
notify_no_data = false
renotify_interval = 15
notify_audit = false
timeout_h = 0
include_tags = true
tags = ["terraform"]
}
data "template_file" "my-monitor" {
template = "${file("${path.module}/templates/my-monitor.tpl")}"
vars {
notification_channel = "${var.pager_duty}"
}
}
terraform validate (passes)
terraform plan (passes)
terraform apply (fails with below)
* module.datadog_monitor.my-monitor-monitor: 1 error(s) occurred:
* datadog_monitor.my-monitor-monitor: error updating monitor: API error 400 Bad Request: {"errors":["The value provided for parameter 'query' is invalid"]}
The problem in the query is the unbalanced parenthesis, so a working version of the monitor should use the query below instead. I believe there should be some syntax and well-formedness checks in the validate stage. Below is the corrected monitor query.
query = "min(last_5m):sum:counter.error{mytag:mytagvalue} by {environment}.as_count() / ( sum:counter.error{mytag:mytagvalue} by {environment}.as_count() + sum:counter.success{mytag:mytagvalue} by {environment}.as_count() ) * 100 > ${var.my-monitor-critical}"
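A check of the kind requested could look roughly like the Go sketch below. It is an assumption about what such a validation might do, not something the provider implements: checkBalanced is a hypothetical helper, it only looks at parentheses and braces, and it knows nothing about the rest of Datadog's query grammar.

```go
package main

import "fmt"

// checkBalanced returns the byte offset at which the parentheses or braces
// in a monitor query stop being balanced, or -1 if they balance.
// Hypothetical helper: a purely syntactic check, not Datadog's own parser.
func checkBalanced(query string) int {
	var stack []rune
	pairs := map[rune]rune{')': '(', '}': '{'}
	for i, r := range query {
		switch r {
		case '(', '{':
			stack = append(stack, r)
		case ')', '}':
			if len(stack) == 0 || stack[len(stack)-1] != pairs[r] {
				return i // closing delimiter with no matching opener
			}
			stack = stack[:len(stack)-1]
		}
	}
	if len(stack) > 0 {
		return len(query) // something was opened and never closed
	}
	return -1
}

func main() {
	// Shortened versions of the broken and corrected queries above.
	broken := "min(last_5m):sum:a{*}.as_count() / ( sum:a{*}.as_count() + sum:b{*}.as_count() ) ) * 100 > 5"
	fixed := "min(last_5m):sum:a{*}.as_count() / ( sum:a{*}.as_count() + sum:b{*}.as_count() ) * 100 > 5"
	fmt.Println(checkBalanced(broken) >= 0, checkBalanced(fixed) >= 0)
}
```

Running such a check at validate or plan time would catch this class of error before the API call, at the cost of rejecting nothing beyond mismatched delimiters.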
It would be great to support type = "log alert" in datadog_monitor.
0.11.5
resource "datadog_downtime" "DOWNTIME_CPU_myhost" {
scope = ["name:myhost"]
start = 1525840200 # 4:30 AM UTC
end = 1525843800 # 5:30 AM UTC
monitor_id = XXXXXXX # MONITOR_HOSTGROUP_CPU
recurrence {
type = "days"
period = 1
}
}
Configuring a datadog_downtime resource with arguments for start and end time as well as recurrence should transparently be handled when applying any configuration changes on future dates. Even though the POSIX timestamps reference a start and end time on a specific date, in Datadog the times get automatically updated based on the recurrence schedule.
Is there a different combination of arguments or resources to use to accomplish scheduling recurring monitor suppression for a subset of hosts under a Datadog monitor?
The initial creation of the datadog_downtime resource works as expected. When applying any unrelated resource change that includes this configuration on subsequent days, Datadog returns an error because the original timestamp values in the configuration for this resource are now in the past:
error updating downtime: API error 400 Bad Request: {"errors":["Scheduled downtime start cannot be in the past"]}
This issue was originally opened by @kmshultz as hashicorp/terraform#13784. It was migrated here as part of the provider split. The original body of the issue is below.
$ terraform version
Terraform v0.9.2
resource "datadog_monitor" "outliermonitor" {
name = "test outlier monitor"
type = "metric alert"
query = "avg(last_5m):outliers(avg:chef.resources.updated{env:prod} by {host}, 'DBSCAN', 3.0) > 0"
message = "chef resources outlier alert"
}
https://gist.github.com/kmshultz/23bf92aea315a1596d21920092da1c11
Terraform ought not to update the datadog_monitor on every apply.
Terraform updated the datadog_monitor because, while it passed 'metric alert' to the Datadog API on creation, the API categorized it (because of its more complex query) as 'query alert'.
This is a quirk on Datadog's side, not a bug in Terraform. When a client POSTs or PUTs a Datadog monitor with type 'metric alert', the Datadog API inspects the query and sometimes recategorizes it as 'query alert' (e.g. for APM monitors, Outlier monitors, monitors that do arithmetic on two or more metrics). However, clients are NOT advised to pass 'query alert' in POST/PUTs; it's up to Datadog to decide when a 'metric alert' should be recategorized. 'query alert' is only important for the Datadog monitoring backend to know about, but the type is sort of exposed to clients right now, and this isn't likely to change any time soon.
So, it's hacky, but the datadog_monitor resource should probably look for 'query alert' in its Read function and override it to 'metric alert'. Without this override, terraform apply will always try to PUT a 'query alert' monitor back to 'metric alert', fail to do so (though Datadog will still return 2xx), and keep trying to set it to 'metric alert' every time.
See Debugging output gist above.
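The override suggested above might be sketched in Go as follows. This is a slightly narrower variant than the one described: it only applies the override when the configuration itself says 'metric alert', so a monitor deliberately configured as 'query alert' is left alone. reconcileMonitorType is a hypothetical name and not part of the provider's Read function.

```go
package main

import "fmt"

// reconcileMonitorType decides which type to store in state after a read.
// If the user configured "metric alert" and the API reports "query alert",
// the configured value wins so that no perpetual diff is produced.
// Hypothetical helper, shown only to illustrate the suggestion above.
func reconcileMonitorType(configured, remote string) string {
	if configured == "metric alert" && remote == "query alert" {
		return configured
	}
	return remote
}

func main() {
	fmt.Println(reconcileMonitorType("metric alert", "query alert"))
	fmt.Println(reconcileMonitorType("service check", "service check"))
}
```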
Hi there,
Our Terraform configuration, which contains a bunch of AWS resources, also includes Datadog resources (monitors and dashboards). The Terraform Datadog provider works just fine, i.e. we can create, update or delete Datadog resources without any issues. However, when the Datadog service is down, we're not able to apply any changes to our AWS resources, which are defined in the same config as the Datadog resources. In this case terraform just fails on initialization of the datadog provider.
This is very important for us to be able to deploy changes into our AWS resources even if Datadog is down or suffers from partial degradation.
We're using S3 terraform remote backend, so if I comment out all datadog resources and datadog provider in terraform, terraform init and plan/apply will be still failing trying to init datadog provider (it fails on missing DD keys).
What do you think about a feature to disable the datadog provider? E.g. if the DD provider is disabled, terraform will ignore all DD resources during plan/apply.
Looking forward to receiving reply from you.
Thank you!
Terraform v0.9.11
After a monitor was manually updated in the datadog web UI, its parameters should have been overwritten by any terraform apply.
It failed at the plan stage on a null reference.
Please list the steps required to reproduce the issue, for example:
* module.monitoring.datadog_monitor.frontend_error_rate_percent: 1 error(s) occurred:
* module.monitoring.datadog_monitor.frontend_error_rate_percent: datadog_monitor.frontend_error_rate_percent: strconv.ParseInt: parsing "null": invalid syntax
* module.monitoring.datadog_monitor.frontend_response_time_2xx_p99: 1 error(s) occurred:
* module.monitoring.datadog_monitor.frontend_response_time_2xx_p99: datadog_monitor.frontend_response_time_2xx_p99: strconv.ParseInt: parsing "null": invalid syntax
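The error text above is exactly what Go's strconv.ParseInt returns for the input "null". Where that string originates is not stated in the report; assuming it is an option that the web UI cleared, a tolerant parse could look like the sketch below. parseOptionalInt is a hypothetical helper, not code from the provider.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOptionalInt parses s as a base-10 integer but treats "" and "null"
// as "not set" instead of failing with a syntax error.
// Hypothetical helper illustrating one way to tolerate cleared options.
func parseOptionalInt(s string) (value int64, set bool, err error) {
	if s == "" || s == "null" {
		return 0, false, nil
	}
	value, err = strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, false, err
	}
	return value, true, nil
}

func main() {
	// The strict parse reproduces the failure from the report.
	_, err := strconv.ParseInt("null", 10, 64)
	fmt.Println(err)

	// The tolerant parse reports "not set" for null and a value otherwise.
	fmt.Println(parseOptionalInt("null"))
	fmt.Println(parseOptionalInt("300"))
}
```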
I would like to create a few modules so backend engineers can easily create dashboards for services they own. As discussed in #47, graphs are inline objects, so they don't support things like count or conditionals. One solution is to provide graphs as a "companion resource", similar to how aws_security_group and aws_security_group_rule work together. Is that possible? Has it been explored before?
I don't know golang and have never written a TF plugin, but I think I want this badly enough to learn if someone from the team can give some guidance on whether it is even possible here and what needs to change.
Terraform v0.10.8
Provider v1.0.3
My goal is to end up being able to do something like this:
module "myservice" {
source = "modules/service-timeboard"
name = "myservice"
environment = "${var.environment}"
include_system_metrics = true
include_ebs_metrics = true
egress_services = ["some_other_service1", "some_other_service_2"]
mongo_clusters = ["some_mongo_cluster_name"]
redis_clusters = []
rds_instances = ["${aws_db_instance.myservicerds.id}"]
}
To do that, the provider needs to support something like this
resource "datadog_timeboard" "this" {
title = "${title(var.name)} Timeboard [${upper(var.environment)}]"
description = "A timeboard generated automatically by Terraform for the ${title(var.name)} service"
read_only = true
}
resource "datadog_timeboard_graph" "cpu" {
count = "${var.include_system_metrics ? 1 : 0}"
timeboard = "${datadog_timeboard.this.id}"
title = "Hostmap (CPU)"
viz = "hostmap"
request {
q = "avg:system.load.norm.5{...stuff...} by {host}"
type = "fill"
}
scope = ["environment:${var.environment}", "service:${var.name}"]
group = ["availability-zone"]
style {
palette = "green_to_orange"
}
include_no_metric_hosts = true
}
resource "datadog_timeboard_graph" "other_stuff" {
timeboard = "${datadog_timeboard.this.id}"
...
}
This issue was originally opened by @ssonne as hashicorp/terraform#18081. It was migrated here as a result of the provider split. The original body of the issue is below.
$ terraform -v
Terraform v0.11.3
+ provider.datadog v1.0.3
+ provider.local v1.1.0
resource "datadog_downtime" "my_downtime" {
scope = ["*"]
start = 1526684400
end = 1526731200
recurrence {
type = "days"
period = 1
}
monitor_id = "${datadog_monitor.my_monitor.id}"
}
~ datadog_downtime.my_downtime
active: "true" => "false"
...
* datadog_downtime.my_downtime: 1 error(s) occurred:
* datadog_downtime.my_downtime: error updating downtime: API error 400 Bad Request: {"errors":["Scheduled downtime start cannot be in the past"]}
active should not be considered changed: it changes daily from true to false as the downtime window passes, so it is getting updated all the time for no reason.
start and end should be modified by stepping forward in time until they are in the future, or else there should be a way to write the time of day in relative terms. Configuring the window using POSIX timestamps makes it really hard to modify the downtime, because any change requires looking up the POSIX timestamp of the next start and end times of the downtime (to avoid the "Scheduled downtime start cannot be in the past" error). Combined with active being considered a change, the same config will succeed during part of the day and fail during the downtime.
Unnecessary change to the downtime's status, and an error.
start and end.
terraform init
terraform apply
terraform apply
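The "stepping forward in time" idea from the expected behaviour can be sketched in Go as below. It assumes a recurrence of type "days" and ignores daylight-saving shifts; nextOccurrence is a hypothetical helper and not something the provider offers.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// nextOccurrence shifts a [start, end] window forward by whole recurrence
// periods (given in days) until start is strictly after now. All values are
// POSIX timestamps in seconds.
// Hypothetical helper illustrating the request above.
func nextOccurrence(start, end, now, periodDays int64) (int64, int64) {
	if periodDays <= 0 || start > now {
		return start, end // nothing to do, or no usable recurrence
	}
	step := periodDays * secondsPerDay
	// Number of whole periods needed to move start past now.
	n := (now-start)/step + 1
	return start + n*step, end + n*step
}

func main() {
	// start/end come from the configuration above; now is an arbitrary
	// later instant chosen for the example.
	start, end := nextOccurrence(1526684400, 1526731200, 1527000000, 1)
	fmt.Println(start, end)
}
```

The window length is preserved because start and end are shifted by the same amount.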
This issue was originally opened by @stephenchu as hashicorp/terraform#12468. It was migrated here as part of the provider split. The original body of the issue is below.
0.8.7
resource "datadog_monitor" "invalid_name_missing_a_closing_curly" {
name = "foo - {{#is_match "host.name" "bar"}}HERE{{/is_match}"
type = "service check"
query = "\"datadog.agent.up\".over(\"sometag:somevalue\").by(\"host\").last(5).count_by_status()"
message = "BODY"
}
On Datadog's UI, the error immediately results in an error message like this:
Invalid monitor name: Error at character 149 of line 1 near {{/is_match}
During terraform apply, it waited for about a minute (possibly a timeout value), then showed this unreadable error message:
Error applying plan:
1 error(s) occurred:
* datadog_monitor.datadog_agent_heartbeat: error updating monitor: Put https://app.datadoghq.com/api/v1/monitor/12345678?api_key=aaaaaaaaaaa&application_key=bbbbbbbbbbbb: http: ContentLength=1012 with Body length 0
Simply plan and apply with a syntactically invalid monitor name attribute like above.
Working on fully automating our monitoring setup.
It would be very helpful if Terraform could configure Datadog's Slack integration.
My plan is for monitoring to be setup with each service. That would include a Slack channel and that channel added to Datadog's Slack integration.
I'd expect 2 resources to be needed
Support in go-datadog: zorkian/go-datadog-api@ebe1e4a
Terraform v0.11.7
+ provider.aws v1.8.0
+ provider.datadog v1.0.3
Trying to take the JSON config from a dashboard I created in Datadog, and port it to HCL so that I can manage it with Terraform. It appears that the process visualization is not supported yet by this provider.
The HCL config for Datadog is:
resource "datadog_timeboard" "processes_nonprod" {
title = "Processes (Non-Prod)"
description = "A view of host processes."
read_only = true
graph {
title = "CPU (total) per process"
viz = "process"
autoscale = true
status = "done"
request {
type = "process"
metric = "process.stat.cpu.total_pct.norm"
text_filter = "sshd"
limit = 200
tag_filters = [
"$account",
"$cluster"
]
style {
palette = "classic"
type = "solid"
width = "normal"
}
notes = ""
}
}
template_variable {
name = "account"
prefix = "account"
default = "REDACTED"
}
template_variable {
name = "cluster"
prefix = "cluster"
default = "REDACTED"
}
}
Hi there,
When configuring anomaly monitors, Datadog has an option threshold_windows:
{
"name": "Transaction processing anomaly",
"type": "query alert",
"query": "avg(last_4h):anomalies(sum:transactions{*}.as_count(), 'robust', 2, direction='both', alert_window='last_30m', interval=60, count_default_zero='false', seasonality='weekly') > 0.2",
"message": "Anomaly detected",
"tags": [
"*",
"type:app"
],
"options": {
"notify_audit": false,
"locked": true,
"timeout_h": 0,
"silenced": {},
"include_tags": false,
"no_data_timeframe": null,
"new_host_delay": 300,
"require_full_window": false,
"notify_no_data": false,
"renotify_interval": 0,
"escalation_message": "",
"threshold_windows": {
"recovery_window": "last_5m",
"trigger_window": "last_30m"
},
"thresholds": {
"critical": 0.2
}
}
}
But there is no option to configure that with Terraform.
Terraform v0.11.7
+ provider.datadog v1.0.3
Terraform v0.10.5
resource "datadog_timeboard" "my_timeboard" {
title = "My timeboard"
description = "My timeboard description"
graph {
title = "My Graph Title"
viz = "timeseries"
request {
q = "sum:aws.elb.request_count{*}.as_count()"
aggregator = "avg"
type = "line"
}
}
}
Please list the steps required to reproduce the issue, for example:
terraform apply with the above config
terraform plan
Step 3 (terraform plan) should show that the timeboard needs to be updated (to remove the extra unexpected request on the graph).
Terraform plan output shows no changes expected. Terraform apply does not apply any changes.
I've noticed that terraform WILL correctly detect certain changes (such as title or description updates) but will not detect additional graphs. I'm not sure if there are OTHER things that aren't causing plan to report updates.
This issue was originally opened by @keymon as hashicorp/terraform#11106. It was migrated here as part of the provider split. The original body of the issue is below.
If the number defined in the query of a monitor does not match the critical threshold, the Datadog API refuses to change the monitor. But the datadog terraform provider does not handle that situation and fails in an obscure way, not telling the user what the problem is.
Additionally, it leaks the datadog credentials, which I will report in a different issue.
0.7.10
datadog_monitor.cc_job_queue_length: Modifying...
thresholds.critical: "10.0" => "25"
thresholds.warning: "7.0" => "20"
datadog_monitor.cc_job_queue_length: Still modifying... (10s elapsed)
datadog_monitor.cc_job_queue_length: Still modifying... (20s elapsed)
datadog_monitor.cc_job_queue_length: Still modifying... (30s elapsed)
datadog_monitor.cc_job_queue_length: Still modifying... (40s elapsed)
datadog_monitor.cc_job_queue_length: Still modifying... (50s elapsed)
datadog_monitor.cc_job_queue_length: Still modifying... (1m0s elapsed)
Error applying plan:
1 error(s) occurred:
* datadog_monitor.cc_job_queue_length: error updating monitor: Put https://app.datadoghq.com/api/v1/monitor/1402762?api_key=12345678901234567890234567890&application_key=12345678901234567890234567890: http: ContentLength=489 with Body length 0
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
Terraform should report the mismatch properly so the user can update the changes.
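A client-side check for this mismatch could be sketched in Go as follows. It assumes the query ends with a comparison against a literal number, which holds for the monitors shown on this page but is not guaranteed in general; queryThreshold and checkCritical are hypothetical names, and the query in main is an illustrative one rather than the reporter's.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// queryThreshold returns the number after the last comparison operator
// (>, >=, <, <=) in a monitor query.
// Hypothetical helper; it assumes the query ends in a literal number.
func queryThreshold(query string) (float64, error) {
	idx := strings.LastIndexAny(query, "<>")
	if idx < 0 {
		return 0, fmt.Errorf("no comparison operator in query %q", query)
	}
	rest := strings.TrimSpace(strings.TrimPrefix(query[idx+1:], "="))
	return strconv.ParseFloat(rest, 64)
}

// checkCritical returns a readable error when the number in the query and
// thresholds.critical disagree.
func checkCritical(query string, critical float64) error {
	got, err := queryThreshold(query)
	if err != nil {
		return err
	}
	if got != critical {
		return fmt.Errorf("query compares against %v but thresholds.critical is %v", got, critical)
	}
	return nil
}

func main() {
	// Illustrative query: it still compares against 10 while critical is 25.
	fmt.Println(checkCritical("avg(last_5m):avg:example.queue.length{*} > 10", 25))
}
```

Reporting such an error at plan time would tell the user what to fix instead of leaving them with the retry-and-timeout output above.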
It retried and gave an obscure error (see above).
It also prints the credentials of datadog 😱
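One way to keep the keys out of error output is to redact them from the request URL before it is printed. The Go sketch below uses only the standard net/url package; redactKeys is a hypothetical helper, and where it would be called inside the provider or its API client is left open.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactKeys returns rawURL with the api_key and application_key query
// parameters replaced by a placeholder, so the URL is safe to log.
// Hypothetical helper, not part of the provider or its API client.
func redactKeys(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for _, name := range []string{"api_key", "application_key"} {
		if _, ok := q[name]; ok {
			q.Set(name, "REDACTED")
		}
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Placeholder keys, not real credentials.
	safe, err := redactKeys("https://app.datadoghq.com/api/v1/monitor/1402762?api_key=aaaa&application_key=bbbb")
	if err != nil {
		fmt.Println("cannot parse URL:", err)
		return
	}
	fmt.Println(safe)
}
```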
Create a monitor with a query limit different than the critical threshold, like here: https://github.com/alphagov/paas-cf/blob/8b467afbc785c337e96f355253229b34ee9d321f/terraform/datadog/cloud_controller.tf#L110
Datadog recently announced a new feature, recovery thresholds.
Please support these. I believe it should be as simple as adding critical_recovery and warning_recovery to the thresholds block, and passing those values through to the API.
Hi,
I'd really like to be able to terraform a screenboard with my terraformed monitors in. I feel that's the missing part of the puzzle.
https://godoc.org/gopkg.in/zorkian/go-datadog-api.v2#Screenboard
This issue was originally opened by @epinault as hashicorp/terraform#18203. It was migrated here as a result of the provider split. The original body of the issue is below.
Hi
I am looking at the datadog provider and I am trying to add monitors. Do you support anomaly-type monitors? If not, how do I go about adding support for them?
Thanks
Hi there,
The Datadog API supports configuration of the AWS integration; it would be nice to have the ability to configure it here.
At least according to the current documentation, the datadog_timeboard resource has no attributes. It would be really useful if the "id" was exposed as an attribute, as this can be used to determine the timeboard URL (https://app.datadoghq.com/dash/<ID>).
Terraform v0.10.7
datadog "silenced" map shows
resource "datadog_monitor" "prod_mem_high" {
name = "Memory usage is high"
type = "metric alert"
message = "@pagerduty-SRE"
escalation_message = ""
query = "avg(last_5m):avg:docker.mem.in_use{ecs_cluster:prod,!empire.app.name:graphql,!empire.app.name:mobile,!empire.app.name:graphql-beta} by {container_name} > ${var.prod_mem_high}"
thresholds {
critical = "${var.prod_mem_high}"
}
notify_no_data = "${var.notify_no_data}"
renotify_interval = "${var.renotify_interval}"
new_host_delay = "${var.new_host_delay}"
require_full_window = "${var.require_full_window}"
notify_audit = "${var.notify_audit}"
timeout_h = "${var.timeout_h}"
locked = "${var.locked}"
no_data_timeframe = 2
silenced {
"*" = 1
}
tags = ["*"]
}
After a terraform apply, there should be no changes in terraform plan.
After a terraform apply, a terraform plan thinks there are changes to be applied:
~ datadog_monitor.prod_mem_high
silenced.%: "0" => "1"
silenced.*: "" => "1"
terraform apply
terraform plan
I see that this issue is closed:
https://github.com/terraform-providers/terraform-provider-datadog/issues/53
Which may be related to this.
Requesting support for managing Dashboard Lists.
Requires zorkian/go-datadog-api#155
"role" cannot be changed once a user is in the system. If the role has changed since the user was created, terraform plan will always suggest changing it. However, terraform apply won't change anything, and a subsequent terraform plan will want to change the role again. If it's not possible to change a user's role via the API, it would be better to make this field computed so it doesn't produce a permanent diff.
~ datadog_user.datadog_user_rush
role: "" => "Lead Channel Engineer"
v0.9.9
Please list the resources as a list, for example:
If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.
The state file should have been modified with the imported requests_not_being_served monitor.
datadog_monitor.requests_not_being_served: Importing from ID "2395212"...
Error importing: 1 error(s) occurred:
* datadog_monitor.requests_not_being_served (import id: 2395212): import datadog_monitor.requests_not_being_served (id: 2395212): json: cannot unmarshal string into Go struct field Options.evaluation_delay of type int
Please list the steps required to reproduce the issue, for example:
terraform import datadog_monitor.example <ID of monitor>
Run terraform -v to show the version. If you are not running the latest version of Terraform, please upgrade because your issue may have already been fixed.
Terraform v0.11.3
+ provider.datadog v1.0.3
Please list the resources as a list, for example:
resource "datadog_timeboard" "testboard" {
title = "testboard"
description = "testboard"
read_only = true
graph {
title = "Success Percentage"
viz = "query_value"
precision = 3
request {
q = "sum:int.request.successful{*}.as_count()/sum:int.request.count{*}.as_count()"
aggregator = "min"
}
}
}
https://gist.github.com/dustinblackman/639422d1baec7a8d5cdcf8c0ab4b6369
Requests aggregator should be set to min.
aggregator is completely ignored and the default of avg is set on the Datadog dashboard.
terraform apply
Running terraform plan or apply over existing Terraform-managed Datadog timeboards results in a json: cannot unmarshal string into Go struct field Yaxis.min of type float64 error.
Note that although the error occurs with terraform v0.10.3, I was able to push changes to Datadog with this version of terraform until 31/08. Since early September, it has been failing with that error.
Deleting the existing terraform.state and re-running plan or apply with the same timeboard configuration works. But doing this duplicates information in Datadog, as it recreates all resources.
The following is how the graph looks atm:
graph {
title = "graph title"
viz = "timeseries"
autoscale = true
precision = "3"
request {
q = "the-query"
type = "area"
style {
palette = "grey"
}
aggregator = "sum"
}
marker {
type = "error dashed"
value = "y = 99.95"
label = "99.95"
}
yaxis {
min = "99"
max = "100"
}
}
And in terraform.state it is shown as
"graph.1.yaxis.%": "2",
"graph.1.yaxis.max": "100",
"graph.1.yaxis.min": "99",
v0.10.3
datadog_timeboard
The state file should have been modified with the new added configuration in the datadog timeboard
Terraform plan or apply is failing. The error reads json: cannot unmarshal string into Go struct field Yaxis.min of type float64
According to the docs for the Datadog API downtime, you can provide a monitor_id in order to silence only a specific monitor, or monitor+scope. This would be extremely helpful in silencing monitors that are like little yippy dogs overnight without having to silence any other monitors for a specific host, instance, tag, container, etc.
This issue was originally opened by @s4mur4i as hashicorp/terraform#16337. It was migrated here as a result of the provider split. The original body of the issue is below.
Hi there,
Datadog Read-only user is not created, but a standard user is created
Terraform version:
~/Downloads/terraform -v
Terraform v0.10.7
resource "datadog_user" "test" {
email = "[email protected]"
handle = "[email protected]"
name = "Test User"
is_admin = false
role = "ro"
}
Since we have a large number of other resources, only the relevant part of the log is included:
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalCompareDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalGetProvider
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadState
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApplyPre
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApply
2017/10/13 11:17:52 [DEBUG] apply: datadog_user.test: executing Apply
datadog_user.test: Creating...
disabled: "" => "false"
email: "" => "[email protected]"
handle: "" => "[email protected]"
is_admin: "" => "false"
name: "" => "Test User"
role: "" => "ro"
verified: "" => "<computed>"
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalCompareDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalGetProvider
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalCompareDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalGetProvider
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadState
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalCompareDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalGetProvider
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadState
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApplyPre
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadState
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApplyPre
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApply
After running again I see the following:
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
datadog_user.test: Modifying... (ID: [email protected])
role: "" => "ro"
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalCompareDiff
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalGetProvider
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalReadState
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApplyPre
2017/10/13 11:17:52 [TRACE] root: eval: *terraform.EvalApply
On the Datadog dashboard I see the user as a standard user, not a read-only user.
If I manually change the user to read-only, the next terraform run converts it back to a standard user.
A read-only user is created.
A standard user is created, and subsequent modifications do not change it to a read-only user.
Please list the full steps required to reproduce the issue, for example:
terraform init
terraform apply
This issue was originally opened by @fvillain as hashicorp/terraform#14657. It was migrated here as part of the provider split. The original body of the issue is below.
Hello
It seems that there is an issue with the silenced part of a datadog_monitor resource.
At least the following versions :
Given the following module :
variable "deploy" { default = 0 }
variable "active" { default = 0 }
resource "datadog_monitor" "asg_in_service_instances" {
count = "${var.deploy ? 1 : 0}"
name = "asg-no-in-service-host"
type = "metric alert"
query = "avg(last_5m):sum:aws.autoscaling.group_in_service_instances < 1"
thresholds {
critical = 1
}
renotify_interval = 90
require_full_window = false
notify_no_data = false
message = "some message"
escalation_message = "some escalation"
silenced {
"*" = "${var.active}" # or ${var.active ? 1 : 0} - it's the same behaviour
}
}
And the given terraform file:
module "monitoring" {
source = "/modules/monitoring"
deploy = "${terraform.env == "dev" ? 0 : 1}"
active = "${data.consul_keys.consul.var.live == "blue" ? 1 : 0}"
}
NB this is a simplified version of course
This is the full output of my actual modules
https://gist.github.com/fvillain/ea17cbcf10e1ec00ea6a259cf1dcd905
Well... no error. It should tell me it will create the monitor defined in my module
The plan step outputs this error:
* module.monitoring.asg_in_service_instances: silenced (*): cannot parse '' as int: strconv.ParseInt: parsing "${var.active ? 1 : 0 }": invalid syntax
terraform env new staging
terraform get
terraform plan
datadog_monitor
If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.
resource "datadog_monitor" "foo" {
name = "..."
type = "metric alert"
...
# The significant bit.
tags = [ "foo:bar"]
}
Then remove the tags block:
resource "datadog_monitor" "foo" {
name = "..."
type = "metric alert"
...
}
Terraform says it is removing the tag, but it doesn't happen:
datadog_monitor.foo: Modifying... (ID: 3734291)
tags.#: "1" => "0"
tags.0: "foo:bar" => ""
datadog_monitor.foo: Modifications complete after 1s (ID: 3734291)
Re-run terraform plan and it still detects the tag:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
~ datadog_monitor.script_runtime
tags.#: "1" => "0"
tags.0: "foo:bar" => ""
Plan: 0 to add, 1 to change, 0 to destroy.
Subsequent terraform plan invocations show the resources as out-of-sync, and subsequent terraform apply invocations end in the same result.
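The report does not identify the cause. One common way this class of bug arises in Go API clients, offered here only as an assumption, is a field tagged `omitempty`: an empty slice is dropped from the request body, so the API never sees the instruction to clear the tags. The struct names below are hypothetical and only demonstrate the encoding difference.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit drops Tags from the JSON body when the slice is empty.
type withOmit struct {
	Tags []string `json:"tags,omitempty"`
}

// withoutOmit always sends Tags, so an empty list is encoded explicitly.
type withoutOmit struct {
	Tags []string `json:"tags"`
}

// encode marshals v and returns the JSON text, or the error message.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(withOmit{Tags: []string{}}))    // {}
	fmt.Println(encode(withoutOmit{Tags: []string{}})) // {"tags":[]}
}
```

If the update payload looks like the first line, the server keeps the existing tags, which would match the permanent diff shown above.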
Please list the steps required to reproduce the issue, for example:
This issue was originally opened by @munyirik as hashicorp/terraform#17893. It was migrated here as a result of the provider split. The original body of the issue is below.
I see that there's a timeboard resource available -> https://www.terraform.io/docs/providers/datadog/r/timeboard.html
Is there a "datadog_screenboard" resource that can create/manage screenboards? If not, are there plans to make this available?
I feel that the Datadog provider is not doing proper sanity checks on plan.
In my opinion, I should get an error during plan if the resource is misconfigured.
Currently, I'm getting failures on apply due to simple mistakes that I think could be caught during plan.
A few examples of my failures:
plan
+ datadog_monitor.xxxxxxxxxxxxxxxxxxx
locked: "true"
message: "xxxxxxxxxxxxxxxxxxx"
name: "xxxxxxxxxxxxxxxxxxxxxx"
no_data_timeframe: "10"
notify_audit: "false"
notify_no_data: "true"
query: "sum(last_5m):avg:production.xxxxx.xxxxxx.count{target:xxxx-xxx-xxxxxx-xxxx}.as_count()"
tags.%: "1"
tags.terraform: "true"
thresholds.%: "1"
thresholds.critical: "1"
timeout_h: "0"
type: "metric alert"
Plan: 1 to add, 0 to change, 0 to destroy.
apply
Error applying plan:
1 error(s) occurred:
* datadog_monitor.xxxxxxxxxxxxxxxxxxx: error updating montor: API error 400 Bad Request: {"errors":["The value provided for parameter 'query' is invalid"]}
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
plan
+ datadog_monitor.foo
locked: "true"
message: "xxxxxxxxxxxxxxxxxxx"
name: "xxxxxxxxxxxxxxxxxxxxxx"
no_data_timeframe: "10"
notify_audit: "false"
notify_no_data: "true"
query: "avg(last_5m):avg:production.xxxxx.xxxxxx.count{*}.as_count() <= 1"
tags.%: "1"
tags.terraform: "true"
thresholds.%: "1"
thresholds.critical: "1"
timeout_h: "0"
type: "metric alert"
Plan: 1 to add, 0 to change, 0 to destroy.
apply
Error applying plan:
1 error(s) occurred:
* datadog_monitor.foo: error updating montor: API error 400 Bad Request: {"errors":[".as_count() monitors must use the 'sum'/'in total' time aggregator"]}
Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
I'd love to get this failure on plan. I don't have much experience with writing providers. Do you think it can be done?
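Some of these mistakes could in principle be caught locally with a schema validation function, which Terraform runs before apply. The sketch below uses the shape of the SDK's validation callback (a value and key in, warnings and errors out) and checks only the two failures shown above. The function name `validateMetricQuery` and both rules are assumptions for illustration; they are not a full implementation of Datadog's query grammar, and the API remains the final authority.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// comparison looks for a trailing "<operator> <number>" at the end of a query.
var comparison = regexp.MustCompile(`(<=|>=|<|>)\s*-?\d+(\.\d+)?\s*$`)

// validateMetricQuery is a hypothetical plan-time check for a monitor query.
func validateMetricQuery(v interface{}, k string) (warnings []string, errs []error) {
	q, ok := v.(string)
	if !ok {
		return nil, []error{fmt.Errorf("%s: expected a string", k)}
	}
	// Rule 1: the query must end by comparing against a threshold.
	if !comparison.MatchString(q) {
		errs = append(errs, fmt.Errorf("%s: query has no trailing comparison against a threshold", k))
	}
	// Rule 2: .as_count() queries must use the sum time aggregator.
	if strings.Contains(q, ".as_count()") && !strings.HasPrefix(q, "sum(") {
		errs = append(errs, fmt.Errorf("%s: .as_count() queries must use the sum time aggregator", k))
	}
	return warnings, errs
}

func main() {
	queries := []string{
		"sum(last_5m):avg:production.xxxxx.xxxxxx.count{target:xxxx-xxx-xxxxxx-xxxx}.as_count()",
		"avg(last_5m):avg:production.xxxxx.xxxxxx.count{*}.as_count() <= 1",
	}
	for _, q := range queries {
		_, errs := validateMetricQuery(q, "query")
		fmt.Println(errs)
	}
}
```

A check like this can only cover rules that are known and stable on the client side, so it would complement the API's own validation rather than replace it.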