robustperception / azure_metrics_exporter
Azure metrics exporter for Prometheus
License: Apache License 2.0
I have this configuration:
azure.yml:
credentials:
  subscription_id: <secret>
  tenant_id: <secret>
  client_id: <secret>
  client_secret: <secret>
targets:
  - resource: "/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<machine_name>"
    metrics:
      - name: "Percentage CPU"
      - name: "Network In"
      - name: "Network Out"
  - resource: "/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<machine_name>"
    metrics:
      - name: "Http2xx"
      - name: "Http5xx"
When I attempt to get the metrics, I get this output:
$ curl IP_ADDRESS:9276/metrics
# HELP network_in_bytes_average network_in_bytes_average
# TYPE network_in_bytes_average gauge
network_in_bytes_average{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_in_bytes_max network_in_bytes_max
# TYPE network_in_bytes_max gauge
network_in_bytes_max{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_in_bytes_min network_in_bytes_min
# TYPE network_in_bytes_min gauge
network_in_bytes_min{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_in_bytes_total network_in_bytes_total
# TYPE network_in_bytes_total gauge
network_in_bytes_total{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_out_bytes_average network_out_bytes_average
# TYPE network_out_bytes_average gauge
network_out_bytes_average{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_out_bytes_max network_out_bytes_max
# TYPE network_out_bytes_max gauge
network_out_bytes_max{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_out_bytes_min network_out_bytes_min
# TYPE network_out_bytes_min gauge
network_out_bytes_min{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP network_out_bytes_total network_out_bytes_total
# TYPE network_out_bytes_total gauge
network_out_bytes_total{resource_group="<resource_group>",resource_name="<machine_name>"} 0
# HELP percentage_cpu_percent_average percentage_cpu_percent_average
# TYPE percentage_cpu_percent_average gauge
percentage_cpu_percent_average{resource_group="<resource_group>",resource_name="<machine_name>"} 6.2375
# HELP percentage_cpu_percent_max percentage_cpu_percent_max
# TYPE percentage_cpu_percent_max gauge
percentage_cpu_percent_max{resource_group="<resource_group>",resource_name="<machine_name>"} 6.49
# HELP percentage_cpu_percent_min percentage_cpu_percent_min
# TYPE percentage_cpu_percent_min gauge
percentage_cpu_percent_min{resource_group="<resource_group>",resource_name="<machine_name>"} 6.09
# HELP percentage_cpu_percent_total percentage_cpu_percent_total
# TYPE percentage_cpu_percent_total gauge
percentage_cpu_percent_total{resource_group="<resource_group>",resource_name="<machine_name>"} 24.95
And I get this error from azure_metrics_exporter:
azure_metrics_exporter --config.file /opt/azure-exporter/azure.yml --web.listen-address=":9276"
2019/04/05 13:45:53 azure_metrics_exporter listening on port :9276
2019/04/05 13:46:02 Failed to get metrics for target /resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/<machine_name>: Unable to query metrics API with status code: 400
Is there something wrong with my config file?
I'm running the exporter to collect Azure Application Gateway metrics. I see a time offset of 4 minutes for all metrics retrieved from Azure Monitor.
I found the GetTimes function that applies the query delay in the code:
// GetTimes - Returns the endTime and startTime used for querying Azure Metrics API
func GetTimes() (string, string) {
// Make sure we are using UTC
now := time.Now().UTC()
// Use query delay of 3 minutes when querying for latest metric data
endTime := now.Add(time.Minute * time.Duration(-3)).Format(time.RFC3339)
startTime := now.Add(time.Minute * time.Duration(-4)).Format(time.RFC3339)
return endTime, startTime
}
The exposed metrics endpoint does not provide any timestamp information which might explain the offset in prometheus.
...
# HELP throughput_bytespersecond_total throughput_bytespersecond_total
# TYPE throughput_bytespersecond_total gauge
throughput_bytespersecond_total{resource_group="some-rg",resource_name="some-resource"} 74
...
What am I doing wrong?
I've noticed while looking at the config code that there's no counterpart to the ResourceGroup
validation. Is there a reason for this or would a PR to do so be helpful?
azure_metrics_exporter/config/config.go
Lines 80 to 96 in a13576c
Hi,
I would like to collect memory-related metrics (current usage or available memory bytes, whatever is available) for VMs. Is this possible?
Thanks,
Would it be possible to publish this project as a docker container on a public repo? I understand it's still under development, but it would make it easier for us to kick the tires!
Thanks!
I am wondering if it is possible to add a "resource type" label to the exposed metrics. For example, what I currently get is as follows: it shows "resource_group" and "resource_name" but no resource type.
Resource type could be useful information when a user wants to filter for interesting metrics.
# HELP bitsinpersecond_countpersecond_average bitsinpersecond_countpersecond_average
# TYPE bitsinpersecond_countpersecond_average gauge
bitsinpersecond_countpersecond_average{resource_group="bca-dev-westeurope",resource_name="azure-to-c3-mash-up-connection"} 0
bitsinpersecond_countpersecond_average{resource_group="bca-dev-westeurope",resource_name="azure-to-ims-connection"} 0
One of the resources related to the above metrics in Azure looks like this:
{
  "id": "/subscriptions/15b4d43c-7a12-42ea-8184-cedd7e6f229a/resourceGroups/bca-dev-westeurope/providers/Microsoft.Network/connections/azure-to-ims-connection",
  "identity": null,
  "kind": null,
  "location": "westeurope",
  "managedBy": null,
  "name": "azure-to-ims-connection",
  "plan": null,
  "properties": null,
  "resourceGroup": "bca-dev-westeurope",
  "sku": null,
  "tags": null,
  "type": "Microsoft.Network/connections"
},
It would be nice to get metrics like the following:
# HELP bitsinpersecond_countpersecond_average bitsinpersecond_countpersecond_average
# TYPE bitsinpersecond_countpersecond_average gauge
bitsinpersecond_countpersecond_average{resource_group="bca-dev-westeurope", resource_type="connections", resource_name="azure-to-c3-mash-up-connection"} 0
bitsinpersecond_countpersecond_average{resource_group="bca-dev-westeurope", resource_type="connections", resource_name="azure-to-ims-connection"} 0
Currently the client id/secret needs to be put in the configuration file. It would be nice if it was allowed to read this from the environment instead.
If this would be acceptable, I can send in a PR.
Hi!
The exporter's default port is currently 9276. Unfortunately, this port is already taken by another exporter on the Default port allocations wiki, and azure_metrics_exporter is not listed there.
Although it would be a breaking change, maybe it would be better to follow the wiki procedure (sooner rather than later). I propose to take the next available port on the wiki, and then change the default port in the exporter accordingly.
I could take care of that. What do you think @brian-brazil ?
Thanks!
I'm currently facing an issue where the azure-exporter is unable to find the API version for Virtual Machine Scale Sets. Perhaps there should be a way to manually specify the API version to use?
Error message:
azure-exporter | 2021/06/03 21:22:32 Failed to get resource info: No api version found for type: RESOURCE_GROUP/providers/virtualMachineScaleSets
My config (minus credentials section) is:
targets:
  - resource: "/resourceGroups/RESOURCE_GROUP/providers/Microsoft.Compute/virtualMachineScaleSets/RESOURCE_NAME"
    metric_namespace: "Microsoft.Compute/virtualMachineScaleSets"
    metrics:
      - name: "Percentage CPU"
      - name: "Network In Total"
      - name: "Network Out Total"
I've also tried this without specifying the metric_namespace, and the same error is returned.
.\azure_metrics_exporter.exe --list.definitions successfully returns:
Available Metrics:
2021/06/03 17:21:02 - Percentage CPU
2021/06/03 17:21:02 - Network In
2021/06/03 17:21:02 - Network Out
2021/06/03 17:21:02 - Disk Read Bytes
2021/06/03 17:21:02 - Disk Write Bytes
2021/06/03 17:21:02 - Disk Read Operations/Sec
2021/06/03 17:21:02 - Disk Write Operations/Sec
2021/06/03 17:21:02 - CPU Credits Remaining
2021/06/03 17:21:02 - CPU Credits Consumed
2021/06/03 17:21:02 - Data Disk Read Bytes/sec
2021/06/03 17:21:02 - Data Disk Write Bytes/sec
2021/06/03 17:21:02 - Data Disk Read Operations/Sec
2021/06/03 17:21:02 - Data Disk Write Operations/Sec
2021/06/03 17:21:02 - Data Disk Queue Depth
2021/06/03 17:21:02 - Data Disk Bandwidth Consumed Percentage
2021/06/03 17:21:02 - Data Disk IOPS Consumed Percentage
2021/06/03 17:21:02 - Data Disk Target Bandwidth
2021/06/03 17:21:02 - Data Disk Target IOPS
2021/06/03 17:21:02 - Data Disk Max Burst Bandwidth
2021/06/03 17:21:02 - Data Disk Max Burst IOPS
2021/06/03 17:21:02 - Data Disk Used Burst BPS Credits Percentage
2021/06/03 17:21:02 - Data Disk Used Burst IO Credits Percentage
2021/06/03 17:21:02 - OS Disk Read Bytes/sec
2021/06/03 17:21:02 - OS Disk Write Bytes/sec
2021/06/03 17:21:02 - OS Disk Read Operations/Sec
2021/06/03 17:21:02 - OS Disk Write Operations/Sec
2021/06/03 17:21:02 - OS Disk Queue Depth
2021/06/03 17:21:02 - OS Disk Bandwidth Consumed Percentage
2021/06/03 17:21:02 - OS Disk IOPS Consumed Percentage
2021/06/03 17:21:02 - OS Disk Target Bandwidth
2021/06/03 17:21:02 - OS Disk Target IOPS
2021/06/03 17:21:02 - OS Disk Max Burst Bandwidth
2021/06/03 17:21:02 - OS Disk Max Burst IOPS
2021/06/03 17:21:02 - OS Disk Used Burst BPS Credits Percentage
2021/06/03 17:21:02 - OS Disk Used Burst IO Credits Percentage
2021/06/03 17:21:02 - Inbound Flows
2021/06/03 17:21:02 - Outbound Flows
2021/06/03 17:21:02 - Inbound Flows Maximum Creation Rate
2021/06/03 17:21:02 - Outbound Flows Maximum Creation Rate
2021/06/03 17:21:02 - Premium Data Disk Cache Read Hit
2021/06/03 17:21:02 - Premium Data Disk Cache Read Miss
2021/06/03 17:21:02 - Premium OS Disk Cache Read Hit
2021/06/03 17:21:02 - Premium OS Disk Cache Read Miss
2021/06/03 17:21:02 - VM Cached Bandwidth Consumed Percentage
2021/06/03 17:21:02 - VM Cached IOPS Consumed Percentage
2021/06/03 17:21:02 - VM Uncached Bandwidth Consumed Percentage
2021/06/03 17:21:02 - VM Uncached IOPS Consumed Percentage
2021/06/03 17:21:02 - Network In Total
2021/06/03 17:21:02 - Network Out Total
.\azure-metrics-exporter.exe --list.namespaces successfully returns:
2021/06/03 17:29:24 Resource: /resourceGroups/RESOURCE_GROUP/providers/Microsoft.Compute/virtualMachineScaleSets/RESOURCE_NAME
Available namespaces:
2021/06/03 17:29:24 - Microsoft.Compute/virtualMachineScaleSets
In an environment with thousands of metrics names and a variety of exporters and targets it is tough to find out what each metric represents without looking at the job label. So I am using a system of prefixes to identify the metric names easily.
Is there any way to add a custom prefix to the metric names collected by azure_metrics_exporter?
Hi!
When calling ./azure_metrics_exporter --list.definitions
with a resource_group configuration, no output appears. This is "normal", as I see the feature is not implemented for resource groups.
I propose to implement it. It will need a small refactor of the Collect function, i.e. extracting the include/exclude logic for resource groups into its own function, in order to avoid code duplication.
I can start to work on that PR next week.
Thanks!
What is the license on this code?
Right now the metric names are a bit all over the place, since we just take what we get from Azure and use that as the name.
Could we set the namespace to azure? This would be a breaking change of course. As a motivation, the current approach seems to hurt discoverability of the metrics, in my environment at least, since the majority of exporters/applications apply some prefix to their names. People could first query for {job="azure-metrics"} and see what is available, but from what I've seen this doesn't happen.
I can see a previous issue was raised for this under #40 and is marked as fixed; however, I appear to be getting the same issue.
I have the following config for example:
resource_groups:
  - resource_group: "test-rg"
    resource_types:
      - "Microsoft.Sql/servers/databases"
    metrics:
      - name: "dtu_used"
The above will only return scrapes for a single database and not all databases; in the logs I am also getting a 400 error for all the other databases in the same resource group.
In azure.go, there are currently 15 uses of log.Fatalf. Which of them are reasonable to keep as fatal? For example, partial configuration errors (such as pointing to non-existing resources, which gives a 404) will currently quit the exporter.
Some errors might make sense to quit on (invalid credentials, perhaps), but most do not. Happy to do the changes when there is agreement on what is best.
Ping @brian-brazil
The status code changes from 200 to 401 with an ExpiredAuthenticationToken error.
(After changing apiVersion to 2018-01-01.)
2018/06/12 05:39:47 azure_metrics_exporter listening on port :9276
2018/06/12 05:39:57 GET https://management.azure.com/...
...
2018/06/12 06:45:00 GET https://management.azure.com/...
2018/06/12 06:45:00 Unable to query metrics API with status code: 401
The exporter goes down one hour after starting.
At present this module doesn't appear to support exporting different metrics for resource groups that have multiple resource IDs; for example, we should be able to do the following:
resource_groups:
  - resource_group: "test-rg"
    resource_types:
      - "Microsoft.Sql/servers/databases"
    metrics:
      - name: "storage_percent"
      - name: "cpu_percent"
      - name: "allocated_data_storage"
      - name: "workers_percent"
      - name: "physical_data_read_percent"
      - name: "log_write_percent"
      - name: "sessions_percent"
      - name: "xtp_storage_percent"
      - name: "storage"
      - name: "connection_successful"
      - name: "connection_failed"
      - name: "blocked_by_firewall"
      - name: "deadlock"
  - resource_group: "test-rg"
    resource_types:
      - "Microsoft.Sql/servers/elasticPools"
    metrics:
      - name: "cpu_percent"
At the moment the config expects the following:
resource_groups:
  - resource_group: "test-rg"
    resource_types:
      - "Microsoft.Sql/servers/databases"
      - "Microsoft.Sql/servers/elasticPools"
    metrics:
      - name: "storage_percent"
      - name: "cpu_percent"
      - name: "allocated_data_storage"
      - name: "workers_percent"
      - name: "physical_data_read_percent"
      - name: "log_write_percent"
      - name: "sessions_percent"
      - name: "xtp_storage_percent"
      - name: "storage"
      - name: "connection_successful"
      - name: "connection_failed"
      - name: "blocked_by_firewall"
      - name: "deadlock"
Resource types have different metrics, and this causes 400s.
Hi guys,
I'm having an issue when attempting to pull metrics from an Azure Application Gateway V2.
Specifically, when pulling MatchedCount I get the following error:
panic: runtime error: index out of range
goroutine 31 [running]:
main.(*Collector).extractMetrics(0xd11b48, 0xc0000ff800, 0xc000280000, 0x2ba, 0xc000110640, 0x12e, 0xc0000449c0, 0x4, 0x4, 0xc8, ...)
        C:/GoProjects/src/github.com/RobustPerception/azure_metrics_exporter/main.go:71 +0xd2d
main.(*Collector).batchCollectResources(0xd11b48, 0xc0000ff800, 0xc000044a40, 0x1, 0x1)
        C:/GoProjects/src/github.com/RobustPerception/azure_metrics_exporter/main.go:130 +0x3a6
main.(*Collector).Collect(0xd11b48, 0xc0000ff800)
        C:/GoProjects/src/github.com/RobustPerception/azure_metrics_exporter/main.go:206 +0x848
github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc0001849d0, 0xc0000ff800, 0x991220, 0xd11b48)
        C:/GoProjects/src/github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:383 +0x68
created by github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
        C:/GoProjects/src/github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:381 +0x2f0
My config is as follows:
credentials:
  subscription_id:
  client_id:
  client_secret:
  tenant_id:
targets:
  - resource: "/resourceGroups/group/providers/Microsoft.Network/applicationGateways/gateway"
    metrics:
      - name: "MatchedCount"
Hi,
I have an issue with the Azure exporter metric names. My problem is that I want to get, for example, the "cpu_percent" metric value for a SQL DB but also for a PostgreSQL DB. The metric name is the same for both services, and this seems to be the root cause of the issue.
My configuration:
resource_groups:
  - resource_group: "rg-test"
    resource_types:
      - Microsoft.Sql/servers/databases
    metrics:
      - name: "cpu_percent"
  - resource_group: "rg-test"
    resource_types:
      - Microsoft.DBforPostgreSQL/servers
    metrics:
      - name: "cpu_percent"
With this configuration, I receive the error:
- collected metric cpu_percent_percent_max label:<name:"resource_group" value:"rg-test" > label:<name:"resource_name" value:"test-postgres" > gauge:<value:0 > was collected before with the same name and label values
But using my PR #44 (which adds an additional label to avoid this error), I get:
- collected metric storage_percent_percent_max label:<name:"resource_group" value:"rg-test" > label:<name:"resource_name" value:"test-postgres" > gauge:<value:22.86 > has label dimensions inconsistent with previously collected metrics in the same metric family
I've also tried with this configuration (because I was not sure of the correct syntax):
resource_groups:
  - resource_group: "rg-test"
    resource_types:
      - Microsoft.Sql/servers/databases
      - Microsoft.DBforPostgreSQL/servers
    metrics:
      - name: "cpu_percent"
But same issue.
I'm able to get the metric if in my configuration I put only "Microsoft.Sql/servers/databases" or "Microsoft.DBforPostgreSQL/servers". I mean, individually it works, but not when both are set together.
The cpu metric is:
# HELP cpu_percent_percent_total cpu_percent_percent_total
# TYPE cpu_percent_percent_total gauge
cpu_percent_percent_total{resource_group="rg-test",resource_name="test-postgres"} 0
Regarding the way the metric name is built, why not do the same as the AWS CloudWatch exporter? There, each metric includes the service name.
Example: for the RDS database metrics: aws_rds_database_connections_sum
In the case of the Azure metrics exporter, the name of the DB CPU percent metric is: cpu_percent_percent_total
Resource types could have one of these formats:
Why not define the metric name like this (in my example, the cpu percent metric exists for each resource type)?
"Azure" at the beginning is not required, but at least we keep the same naming logic as the AWS exporter, and it's easier to identify the metric origin when your Prometheus is gathering metrics from multiple providers.
So a regex removing "Microsoft", replacing "/" with "_", and lowercasing the string should be enough IMO.
@brian-brazil What do you think ?
Thanks
Hi,
I am trying to set up the exporter as a Kubernetes application, using a ConfigMap to create the configuration file. However, when I deploy the application I get the following error.
Failed to get token: Did not get status code 200, got: 404 with body:
Here is the configmap I created for the configuration.
apiVersion: v1
data:
  azure.yml: |
    active_directory_authority_url: "https://login.microsoftonline.com/"
    resource_manager_url: "https://management.azure.com/"
    credentials:
      client_id: "4xxxxxxx-4xxx-4xxx-8xxx-2xxxxxxxxxxx"
    resource_groups:
      - resource_group: "my_resource_group_name"
        resource_types:
          - "Microsoft.Compute/virtualMachines"
        metrics:
          - name: "Disk Read Bytes"
          - name: "Disk Write Bytes"
kind: ConfigMap
metadata:
  name: azure-exporter
Any suggestions? Do we have any sample configuration for k8s?
Exporter exiting when no data exists for one metric.
panic: runtime error: index out of range
goroutine 16 [running]:
panic(0x3d3640, 0xc42000a0f0)
/usr/local/Cellar/go/1.7.5/libexec/src/runtime/panic.go:500 +0x1a1
main.(*Collector).Collect(0x678d90, 0xc420327500)
/usr/local/Cellar/go/1.7.5/bin/src/github.com/RobustPerception/azure_metrics_exporter/main.go:71 +0xd97
github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc420396ad0, 0xc420327500, 0x63b720, 0x678d90)
/usr/local/Cellar/go/1.7.5/bin/src/github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:383 +0x63
created by github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/usr/local/Cellar/go/1.7.5/bin/src/github.com/RobustPerception/azure_metrics_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:384 +0x326
Config:
- resource: "/resourceGroups/xXXXX-rg/providers/Microsoft.ServiceBus/namespaces/XXXX"
  metrics:
    - name: "SuccessfulRequests"
    - name: "ServerErrors"
    - name: "UserErrors"
    - name: "IncomingRequests"
    - name: "IncomingMessages"
    - name: "OutgoingMessages"
    - name: "ActiveConnections"
    - name: "Size"
    - name: "Messages"
    - name: "ActiveMessages"
    - name: "DeadletteredMessages"
    - name: "ScheduledMessages"
    - name: "ThrottledRequests"
ThrottledRequests -> This is the culprit. If it is configured on its own, or removed from the list, there is no failure.
2018/11/12 12:55:14 Resource: XXXXX
Available Metrics:
2018/11/12 12:55:14 - SuccessfulRequests
2018/11/12 12:55:14 - ServerErrors
2018/11/12 12:55:14 - UserErrors
2018/11/12 12:55:14 - ThrottledRequests
2018/11/12 12:55:14 - IncomingRequests
2018/11/12 12:55:14 - IncomingMessages
2018/11/12 12:55:14 - OutgoingMessages
2018/11/12 12:55:14 - ActiveConnections
2018/11/12 12:55:14 - Size
2018/11/12 12:55:14 - Messages
2018/11/12 12:55:14 - ActiveMessages
2018/11/12 12:55:14 - DeadletteredMessages
2018/11/12 12:55:14 - ScheduledMessages
It would be ideal if the endpoints could be overridden for alternate Azure environments like Gov. I believe just the TLD changes to .us (instead of .com), but I need to validate that. I'll try to work on a PR, but wanted to mention it here as well.
I'm trying to get database metrics using resource group filtering which was added recently.
Targeting the database directly using the following resource:
/resourceGroups/SOMERG/providers/Microsoft.Sql/servers/SERVER/databases/DATABASE
works fine but when I try retrieving the metrics with resource group filtering it fails with:
collected metric storage_percent_percent_total label: label: gauge: was collected before with the same name and label values
I'm using the following configuration:
resource_groups:
I wasn't sure what the resource_types should be and tried specifying different combinations, but none displayed anything except this one.
I have a running exporter retrieving cosmos db metrics. It is running fine every 30 secs (no rate limit is hit). There are strange "gaps" (value 0) when trying to graph some of the metrics like Documents count and Storage/index size.
You can check here: https://imgur.com/49G9x7L
This is for the document count in a Cosmos DB instance, but the same can be observed for other storage-related metrics. The same metrics in the Azure portal have no gaps. The number-of-requests metric, though, has no gaps. I'm not sure if this is related to the exporter itself or to the Azure API. The strange thing is that it usually takes 4-5 minutes to expose the correct size value again. I didn't see any errors in the exporter logs.
The other thing that looks wrong is related to how many databases you have in a Cosmos DB instance and what the exporter retrieves/exposes. In my example I have 2 databases in a single Cosmos DB instance, and the value (for storage size) that the exporter exposes is for one of the databases, not for all of them as I would expect. If there is no label showing which database the metric value is for (or any configuration targeting a specific database in the config file), I think it should expose the total size of all databases, not just one of them.
I didn't find a way to retrieve metrics for specific queues/topics in a Service Bus instance, or for specific databases/collections in Cosmos DB. It would be helpful to have this data exported as well.
I noticed that the Azure portal uses a different API than the one used here for fetching metrics, which might be interesting to use instead: https://management.azure.com/batch?api-version=2017-03-01
This apparently takes a list of URLs and methods and returns a list of results. A single-request body looks like this:
{
  "requests": [
    {
      "relativeUrl": "/subscriptions/<subscription-id>/<resource-id>/providers/microsoft.Insights/metrics?...",
      "httpMethod": "GET"
    }
  ]
}
It would be interesting to see if this has some performance/quota benefits?
Hi!
In the cloud, ops teams will often manage their resources by tag (providing information like "client", "business unit", "environment", etc.). Being able to filter metrics (in graphs, silences, etc.) on these would be really useful. Having these tags automatically converted into metric labels would avoid manually labelling metrics with this information (and thus duplicating configuration).
I propose to add a feature to automatically convert resource tags into metric labels. It would need to work with the 3 configuration mechanisms currently supported by the exporter (by target, by resource group, by tag).
I (or my colleague) could start to work on that next week!
In azure.yml I have configured the access-credentials part at the top, but I don't understand how to write the targets part below:
targets:
resource_groups:
resource_tags:
Hi
As part of an ongoing effort to fetch Azure metrics for multiple providers (Cosmos DB, VM scale sets, Web Sites, etc.), our team (Walmart Labs) has done a PoC with this exporter.
As per the initial analysis and feedback from customers, the following are the main concerns:
Multi-dimensional metrics are missing: for example, in the case of Cosmos DB, most of the available metrics have dimensions (AvailableStorage, etc.). It seems there is no provision to fetch multi-dimensional metrics at the moment.
If we need to fetch metrics from multiple subscriptions, multiple instances of the exporter need to be deployed (one for each subscription). Is there any specific reason that we can configure only one subscription in azure.yml?
As per the current workflow, all resource groups/resources (or tags) for which metrics need to be scraped must be configured in the YAML file. Since we have subscriptions with more than 50 resource groups and multiple resources underneath, the azure.yml file would be very tedious to write for all of the subscriptions.
As per the current design, the calls to fetch metrics and then publish them to Prometheus seem to be executed serially. Would it be better to parallelize the fetching and publishing? Any thoughts on this?
We are working on an enhancement for multi-dimensional metrics and on making the config file as simple as possible to configure: for example, simply specify the list of subscriptions with the name and type of providers for which metrics need to be fetched, and we'll dynamically fetch resource groups/resources, fetch the available dimensions for metrics, formulate URLs, and fetch all metrics. I will keep updating this thread as we make progress. Please have a look at the above points and share your thoughts.
How is Docker invoked, and how is a config file passed?
That's missing from the documentation.
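For what it's worth, a sketch of one possible invocation, assuming the robustperception/azure_metrics_exporter image and an illustrative mount path (not documented behaviour):

```shell
# Mount the config into the container and point --config.file at it;
# 9276 is the exporter's default port.
docker run -d -p 9276:9276 \
  -v "$PWD/azure.yml:/config/azure.yml:ro" \
  robustperception/azure_metrics_exporter \
  --config.file=/config/azure.yml
```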
It would be nice to have Azure managed identities support.
With this feature we would no longer have to create a service principal to connect to the Azure API.
I am getting a "no token found" error when I enable some metrics in azure.yml. It looks like this happens with metrics whose names contain a "/", like "Disk Read Operations/Sec", in the exporter config. Is this a bug in the Azure metrics exporter?
---
credentials:
  subscription_id: xxxx
  client_id: xxxx
  client_secret: xxxx
  tenant_id: xxxx
targets:
  - resource: "/resourceGroups/myvm/providers/Microsoft.Compute/virtualMachines/myvmm/"
    metrics:
      - name: "Percentage CPU"
      - name: "Network In"
      - name: "Network Out"
      - name: "Disk Read Bytes"
      - name: "Disk Write Bytes"
      # - name: "Disk Read Operations/Sec"          # This doesn't work
      # - name: "Disk Write Operations/Sec"         # This doesn't work
      - name: "CPU Credits Remaining"
      - name: "CPU Credits Consumed"
      # - name: "Per Disk Read Bytes/sec"           # This doesn't work
      # - name: "Per Disk Write Bytes/sec"          # This doesn't work
      # - name: "Per Disk Read Operations/Sec"      # This doesn't work
      # - name: "Per Disk Write Operations/Sec"     # This doesn't work
      - name: "Per Disk QD"
      # - name: "OS Per Disk Read Bytes/sec"        # This doesn't work
      # - name: "OS Per Disk Write Bytes/sec"       # This doesn't work
      # - name: "OS Per Disk Read Operations/Sec"   # This doesn't work
      # - name: "OS Per Disk Write Operations/Sec"  # This doesn't work
      - name: "OS Per Disk QD"
Could you please add more information to the README on how to install and configure this exporter?
I'm wondering whether a Puppet module is already available, but I could not find anything as of yet.
I will work on one otherwise.
Hello,
I am running two docker containers for dev and prod, from docker-compose:
version: "3.6"
services:
  azure-metrics-exporter_dev:
    image: robustperception/azure_metrics_exporter
    network_mode: host
    restart: always
    command:
      - '--config.file=/config/dev.yaml'
    volumes:
      - "/azure-metrics-exporter/config:/config:rw"
  azure-metrics-exporter_prod:
    image: robustperception/azure_metrics_exporter
    network_mode: host
    restart: always
    command:
      - '--config.file=/config/prod.yaml'
      - '--web.listen-address=":9275"'
    volumes:
      - "/azure-metrics-exporter/config:/config:rw"
Metrics look good from dev at localhost:9276/metrics, but the second container is restarting all the time. Docker logs output:
azure-metrics-exporter# docker logs b9d7asdecbd0
2021/04/01 10:31:25 azure_metrics_exporter listening on port ":9275"
2021/04/01 10:31:25 Error starting HTTP server: listen tcp: address tcp/9275": unknown port
It looks like it is not able to read the port. But in docker-compose.yaml I am using the command:
- '--web.listen-address=":9275"'
Tested with separate configs and SPNs per subscription.
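The log line shows the quotes arriving literally in the address (`tcp/9275": unknown port`), so one guess is that docker-compose passes the inner double quotes through to the flag; a sketch of the command entry without them, which may resolve it:

```yaml
command:
  - '--config.file=/config/prod.yaml'
  - '--web.listen-address=:9275'
```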
The header x-ms-ratelimit-remaining-subscription-reads contains the remaining allowed read-request quota, and would be very useful to expose. See https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits.
I'm happy to implement this, but would like some feedback on this approach:
- In AzureClient.getMetricValue, read and parse the header if present (storing it on the AzureMetricValueResponse).
- In Collector.Collect, check the minimum value returned from all gets, and finally create a metric from this minimum.
Sounds reasonable?
There is a write quota header too, but it is only returned on writes, so to collect it we'd need to make some form of write and the service principal would need to be allowed to do so. I'd consider this out of scope initially. Main concern for us at least is reads.
We made an open-source service for finding and using Prometheus exporters easily, which references your repository.
If you have any questions or improvement requests about this, feel free to open an issue ticket here -> https://github.com/NexClipper/exporterhub.io
Thanks for your contribution :)
When copying the Azure resource ID from the portal, it includes extra information at the beginning, triggering a 404 when calling the API.
e.g:
/subscriptions/xxxx-xxxx-xxxx/resourceGroups/service-rg/providers/Microsoft.ServiceBus/namespaces/servicebus
when it is expected by the exporter like this:
/resourceGroups/service-rg/providers/Microsoft.ServiceBus/namespaces/servicebus
The subscription/subscription-id part is not needed; otherwise we get:
{"message":"No HTTP resource was found that matches the request URI 'https://management.azure.com/subscripti........
Suggestion: the documentation should be updated.
Would it be possible to add Azure billing monitoring? It would be helpful if we had billing data.
Microsoft has billing APIs for access to Azure usage and rates:
https://docs.microsoft.com/en-us/rest/api/consumption/
https://docs.microsoft.com/en-us/rest/api/consumption/usagedetails/lis
https://docs.microsoft.com/en-us/azure/cost-management-billing/manage/usage-rate-card-overview
Hi,
I'm not able to get the Azure storage account UsedCapacity metric; the exporter always gives me back 0 (it should be 181.8 MiB).
I can see differences between the request made by the Azure portal to get the UsedCapacity metric and the one made by the exporter.
Azure portal:
"/subscriptions/XXXXXXXXX/resourceGroups/rg-dev/providers/Microsoft.Storage/storageAccounts/devwww/providers/microsoft.Insights/metrics?timespan=2019-11-11T15:35:00.000Z/2019-11-12T15:35:00.000Z&interval=FULL&metricnames=UsedCapacity&aggregation=average&metricNamespace=microsoft.storage%2Fstorageaccounts&validatedimensions=false&api-version=2019-07-01"
Prometheus Exporter:
/subscriptions/XXXXXXXXXXXXXX/resourceGroups/rg-dev/providers/Microsoft.Storage/storageAccounts/devwww/providers/microsoft.insights/metrics?aggregation=Total%2CAverage%2CMinimum%2CMaximum&api-version=2018-01-01&metricnames=UsedCapacity×pan=2019-11-12T15%3A41%3A53Z%2F2019-11-12T15%3A42%3A53Z
The exporter gives me:
# HELP usedcapacity_bytes_average usedcapacity_bytes_average
# TYPE usedcapacity_bytes_average gauge
usedcapacity_bytes_average{resource_group="rg-dev",resource_name="devwww"} 0
# HELP usedcapacity_bytes_max usedcapacity_bytes_max
# TYPE usedcapacity_bytes_max gauge
usedcapacity_bytes_max{resource_group="rg-dev",resource_name="devwww"} 0
# HELP usedcapacity_bytes_min usedcapacity_bytes_min
# TYPE usedcapacity_bytes_min gauge
usedcapacity_bytes_min{resource_group="rg-dev",resource_name="devwww"} 0
# HELP usedcapacity_bytes_total usedcapacity_bytes_total
# TYPE usedcapacity_bytes_total gauge
usedcapacity_bytes_total{resource_group="rg-dev",resource_name="devwww"} 0
I've modified the exporter's code to generate a request with parameters like the Azure portal's:
/subscriptions/XXXXXXX/resourceGroups/rg-dev/providers/Microsoft.Storage/storageAccounts/probstgwww/providers/microsoft.Insights/metrics?aggregation=average&api-version=2019-07-01&autoadjusttimegrain=true&interval=PT5M&metricNamespace=microsoft.storage%2Fstorageaccounts&metricnames=UsedCapacity×pan=2019-11-12T15%3A36%3A48Z%2F2019-11-12T15%3A37%3A48Z&validatedimensions=false
But the exporter still gives me 0.
My configuration:
credentials:
  subscription_id: XXXXXXX
  client_id: XXXXXX
  client_secret: XXXXXX
  tenant_id: XXXXXX
resource_groups:
  - resource_group: "rg-dev"
    resource_types:
      - Microsoft.Storage/storageAccounts
    metrics:
      - name: "UsedCapacity"
Any idea?
Thanks
Currently the total, average, min and max aggregations are all always used. Would it make sense to allow choosing which of these are relevant in the config file? If not configured, it would keep returning all of them.
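For illustration, a per-metric selection could look like this (the aggregations key is hypothetical, sketching the proposal; resource path borrowed from the report above):

```yaml
targets:
  - resource: "/resourceGroups/rg-dev/providers/Microsoft.Storage/storageAccounts/devwww"
    metrics:
      - name: "UsedCapacity"
        aggregations:      # hypothetical key for this proposal
          - "Average"
          - "Maximum"
```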
Use case: a user who needs a more dynamic configuration strategy would want to collect metrics based on resource tags, without having to hardcode resource names or resource groups in the configuration.
The feature would add the possibility to use a tag name and a tag value as a filter for the resources to be monitored. The implementation would be similar to the resource group feature.
I would use the filter feature of the resources list API.
I can take care of that and create a PR to add this feature.
I propose to add the feature in the listDefinitions method in a subsequent PR.
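A possible configuration shape for the proposed feature (all keys hypothetical), which would translate into the resources list API's `$filter` expression, e.g. `tagName eq 'monitoring' and tagValue eq 'enabled'`:

```yaml
resource_tags:                        # hypothetical key for the proposed feature
  - resource_tag_name: "monitoring"
    resource_tag_value: "enabled"
    resource_types:
      - "Microsoft.Compute/virtualMachines"
    metrics:
      - name: "Percentage CPU"
```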
For example, DBforPostgreSQL has a metric called io_consumption_percent. The reported unit is Percent, so the Prometheus metric names are set to io_consumption_percent_percent_average, etc.
azure_metrics_exporter/main.go, line 64 in 56371e2
Would a check with strings.HasSuffix before appending the unit be acceptable, or are there better ways to fix it?
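The suggested check could be sketched like this (an illustrative function, not the exporter's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizedName appends the unit suffix only when the metric name does not
// already end with it, avoiding names like io_consumption_percent_percent.
// A sketch of the proposed strings.HasSuffix check.
func normalizedName(metric, unit string) string {
	suffix := "_" + strings.ToLower(unit)
	if strings.HasSuffix(metric, suffix) {
		return metric
	}
	return metric + suffix
}

func main() {
	fmt.Println(normalizedName("io_consumption_percent", "Percent")) // io_consumption_percent
	fmt.Println(normalizedName("network_in", "Bytes"))               // network_in_bytes
}
```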
I am trying to fetch metrics from multiple resource types using the config below, but it is not working. Can you please let me know how to achieve this?
targets:
resource_groups:
  - resource_group: "RG-AZURE"
    resource_types:
      - "Microsoft.Network/loadBalancers"
    metrics:
      - name: "PacketCount"
resource_groups:
  - resource_group: "RG-AZURE"
    resource_types:
      - "Microsoft.DBforPostgreSQL/servers"
    metrics:
      - name: "active_connections"
      - name: "storage_used"
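Note that YAML does not allow the same resource_groups key twice at the same level; one shape that keeps both resource types under a single list (a sketch; verify the key names against the exporter's README) could be:

```yaml
resource_groups:
  - resource_group: "RG-AZURE"
    resource_types:
      - "Microsoft.Network/loadBalancers"
    metrics:
      - name: "PacketCount"
  - resource_group: "RG-AZURE"
    resource_types:
      - "Microsoft.DBforPostgreSQL/servers"
    metrics:
      - name: "active_connections"
      - name: "storage_used"
```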
Hi,
I am unable to add a dimension for an Event Hub namespace, say a particular Event Hub entity within the namespace. Is there a way to achieve this?
Hello,
I am facing issues while trying to run the exporter; I am wondering if I have missed some necessary prerequisites or configuration changes:
$ ./azure_metrics_exporter --config.file="azure.yml"
2021/01/12 04:53:14 Using managed identity
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x827dd5]
goroutine 1 [running]:
main.(*AzureClient).getAccessToken(0xc0000662c0, 0x0, 0x0)
/home/tsourab/go/src/github.com/RobustPerception/azure_metrics_exporter/azure.go:224 +0x355
main.main()
/home/tsourab/go/src/github.com/RobustPerception/azure_metrics_exporter/main.go:321 +0xdf
azure.yml
---
active_directory_authority_url: "https://login.microsoftonline.com/"
resource_manager_url: "https://management.azure.com/"
credentials:
  subscription_id: "<subscription ID here>"
resource_groups:
  - resource_group: "tsourab-metric"
    resource_types:
      - "Microsoft.Compute/virtualMachines"
    resource_name_include_re:
      - "metrics"
    metrics:
      - name: "CPU Credits Consumed"
OS: Fedora 33
Kernel: 5.8.15-301.fc33.x86_64
Go version: go version go1.15.6 linux/amd64
Hello Team.
Do you have an example of azure.yml for getting metrics from PaaS services like Azure SQL or Azure Cache for Redis?
All the best.
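Not an official example, but a sketch using the per-resource target form shown earlier in this thread; the metric names (cpu_percent, connectedclients) come from Azure Monitor's supported-metrics lists and should be verified for your resources:

```yaml
credentials:
  subscription_id: <secret>
  tenant_id: <secret>
  client_id: <secret>
  client_secret: <secret>
targets:
  - resource: "/resourceGroups/<rg>/providers/Microsoft.Sql/servers/<server>/databases/<db>"
    metrics:
      - name: "cpu_percent"
  - resource: "/resourceGroups/<rg>/providers/Microsoft.Cache/Redis/<cache>"
    metrics:
      - name: "connectedclients"
```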
The total number of metrics is about 60.
If we decrease it to 5 or 6, it works.
What has your experience been?
We have a situation where we need to expose the LB Data Path Availability of individual ports, but with the current configuration it is shown in aggregate; metrics are not exposed per port.
Here is our Configuration:
targets:
resource_groups:
  - resource_group: "<%= p('resource_group') %>"
    resource_types:
      - "Microsoft.Network/loadBalancers"
    resource_name_exclude_re:
      - ".*service-fabrik.*"
      - ".*pgoutboundconnectionhelper.*"
    metrics:
      - name: "PacketCount"
      - name: "VipAvailability"
      - name: "DipAvailability"
      - name: "ByteCount"
      - name: "SYNCount"
      - name: "SnatConnectionCount"
      - name: "AllocatedSnatPorts"
      - name: "UsedSnatPorts"
Can you please let us know whether we can adjust our configuration to get the availability of individual ports, or is this not supported yet?