
go-metrics's Introduction

go-metrics

This library provides a metrics package which can be used to instrument code, expose application metrics, and profile runtime performance in a flexible manner.

Current API: GoDoc

Sinks

The metrics package makes use of a MetricSink interface to support delivery to any type of backend. Currently the following sinks are provided:

  • StatsiteSink: Sinks to a statsite instance (TCP)
  • StatsdSink: Sinks to a StatsD / statsite instance (UDP)
  • PrometheusSink: Sinks to a Prometheus metrics endpoint (exposed via HTTP for scrapes)
  • InmemSink: Provides in-memory aggregation, can be used to export stats
  • FanoutSink: Sinks to multiple sinks. Enables writing to multiple statsite instances for example.
  • BlackholeSink: Sinks to nowhere

In addition to the sinks, the InmemSignal can be used to catch a signal, and dump a formatted output of recent metrics. For example, when a process gets a SIGUSR1, it can dump to stderr recent performance metrics for debugging.

Labels

Most metrics have an equivalent method ending in WithLabels. These methods allow pushing metrics with labels and using features of the underlying sinks (for example, labels are translated into Prometheus labels).

Since some of these labels may increase the cardinality of metrics, the library allows filtering labels using an allow/block list system that is global to all metrics.

  • If Config.AllowedLabels is not nil, only the labels listed in it are sent to the underlying sink; otherwise, all labels are sent by default.
  • If Config.BlockedLabels is not nil, any label listed in it is never sent to the underlying sinks.

By default, both Config.AllowedLabels and Config.BlockedLabels are nil, so no labels are filtered at all, but this lets a user globally block some high-cardinality labels at the application level.
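
A minimal sketch of blocking a high-cardinality label globally (the label names here are illustrative, not ones the library defines):

cfg := metrics.DefaultConfig("service-name")
cfg.BlockedLabels = []string{"request_id"} // drop this label before it reaches any sink

sink, _ := metrics.NewStatsdSink("statsd:8125")
metrics.NewGlobal(cfg, sink)

// The "method" label is forwarded; the "request_id" label is filtered out.
metrics.IncrCounterWithLabels([]string{"requests"}, 1, []metrics.Label{
    {Name: "method", Value: "GET"},
    {Name: "request_id", Value: "abc123"},
})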

Examples

Here is an example of using the package:

func SlowMethod() {
    // Profiling the runtime of a method
    defer metrics.MeasureSince([]string{"SlowMethod"}, time.Now())
}

// Configure a statsite sink as the global metrics sink
sink, _ := metrics.NewStatsiteSink("statsite:8125")
metrics.NewGlobal(metrics.DefaultConfig("service-name"), sink)

// Emit a Key/Value pair
metrics.EmitKey([]string{"questions", "meaning of life"}, 42)

Here is an example of setting up a signal handler:

// Setup the inmem sink and signal handler
inm := metrics.NewInmemSink(10*time.Second, time.Minute)
sig := metrics.DefaultInmemSignal(inm)
metrics.NewGlobal(metrics.DefaultConfig("service-name"), inm)

// Run some code
inm.SetGauge([]string{"foo"}, 42)
inm.EmitKey([]string{"bar"}, 30)

inm.IncrCounter([]string{"baz"}, 42)
inm.IncrCounter([]string{"baz"}, 1)
inm.IncrCounter([]string{"baz"}, 80)

inm.AddSample([]string{"method", "wow"}, 42)
inm.AddSample([]string{"method", "wow"}, 100)
inm.AddSample([]string{"method", "wow"}, 22)

....

When a signal comes in, output like the following will be dumped to stderr:

[2014-01-28 14:57:33.04 -0800 PST][G] 'foo': 42.000
[2014-01-28 14:57:33.04 -0800 PST][P] 'bar': 30.000
[2014-01-28 14:57:33.04 -0800 PST][C] 'baz': Count: 3 Min: 1.000 Mean: 41.000 Max: 80.000 Stddev: 39.509
[2014-01-28 14:57:33.04 -0800 PST][S] 'method.wow': Count: 3 Min: 22.000 Mean: 54.667 Max: 100.000 Stddev: 40.513


go-metrics's Issues

Question: composition of metric key in Gauge - host first / last

Hi,
Another question here. I'm used to the host part in metrics being last, e.g.:

env.product.category.metric.host

Putting the host last makes it somewhat insignificant when slicing data, and grouping becomes a simple env.product.category.metric.*, which suits today's services well.

Historically (that is, years ago), I've been inspired by Google's RockSteady as guidelines to sculpt my metrics.

So a real key would look like:

production.main_api.routes.health.GET.apisrv110

However, with go-metrics I see that the host comes near the beginning of the key. Is there something I can learn from this? Do you find it works better in that position? I would love to hear about your experience with this.

Thanks

tests are broken: statsd_test.go:78: bad line gauge.val:1.000000|g

In a fresh clone of the go-metrics repository, tests fail:

% go version
go version go1.20.2 linux/amd64

% git clone https://github.com/hashicorp/go-metrics
Cloning into 'go-metrics'...
remote: Enumerating objects: 791, done.
remote: Counting objects: 100% (222/222), done.
remote: Compressing objects: 100% (132/132), done.
remote: Total 791 (delta 113), reused 176 (delta 87), pack-reused 569
Receiving objects: 100% (791/791), 254.92 KiB | 7.97 MiB/s, done.
Resolving deltas: 100% (414/414), done.

% cd go-metrics 

% go test ./...
go: downloading github.com/DataDog/datadog-go v3.2.0+incompatible
go: downloading github.com/circonus-labs/circonus-gometrics v2.3.1+incompatible
go: downloading github.com/circonus-labs/circonusllhist v0.1.3
go: downloading github.com/hashicorp/go-retryablehttp v0.5.3
go: downloading github.com/tv42/httpunix v0.0.0-20150427012821-b75d8614f926
go: downloading github.com/hashicorp/go-cleanhttp v0.5.0
--- FAIL: TestDisplayMetrics (0.00s)
    inmem_endpoint_test.go:34: bad: [0xc0000942a0 0xc000094360]
--- FAIL: TestStatsd_Conn (3.00s)
    statsd_test.go:78: bad line gauge.val:1.000000|g
    statsd_test.go:134: timeout
2023/06/19 14:12:38 [ERR] Error connecting to statsd! Err: dial udp: address statsd.service.consul: missing port in address
2023/06/19 14:12:38 [ERR] Error connecting to statsd! Err: dial udp: lookup statsd.service.consul: no such host
2023/06/19 14:12:38 [ERR] Error connecting to statsite! Err: dial tcp: lookup someserver: no such host
2023/06/19 14:12:38 [ERR] Error connecting to statsd! Err: dial udp: lookup someserver: no such host
--- FAIL: TestStatsite_Conn (3.00s)
    statsite_test.go:74: bad line gauge.val:1.000000|g
    statsite_test.go:131: timeout
FAIL
FAIL	github.com/hashicorp/go-metrics	26.450s
ok  	github.com/hashicorp/go-metrics/circonus	0.004s
ok  	github.com/hashicorp/go-metrics/datadog	1.248s
ok  	github.com/hashicorp/go-metrics/prometheus	2.010s
FAIL
go test ./...  5,40s user 1,50s system 25% cpu 27,238 total

I also noticed that there is a CircleCI setup, but I don’t see any CI reports in pull requests.

Maybe it would be good to set up GitHub Actions instead? Or integrate CircleCI better somehow?

Thanks

Consider adding mechanism to register alternate Sinks with NewMetricSinkFromURL

In my case, I'd like to use the provided sub-package Dogstatsd sink but without losing the abstraction this library provides.

While I could write my own config/setup method that parses the URL and does something different if the scheme is dogstatsd:// (i.e. instantiates the dogstatsd sink directly instead of using NewMetricSinkFromURL), it seems undesirable to reinvent that wheel.

Proposal

Allow external packages to register sinks early in execution (i.e. in func init()) that can hook into the central mechanism.

That might look something like this:

// Existing registry
var sinkRegistry = map[string]sinkURLFactoryFunc{
	"statsd":   NewStatsdSinkFromURL,
	"statsite": NewStatsiteSinkFromURL,
	"inmem":    NewInmemSinkFromURL,
}

// Add new mutex - might be unnecessary if we document that it's only safe to register in 
// `init` methods but I haven't found spec to say those are guaranteed not to be run in 
// parallel yet either.
var registryMu sync.RWMutex

func RegisterSyncForScheme(scheme string, factory sinkURLFactoryFunc) {
    registryMu.Lock()
    defer registryMu.Unlock()

    sinkRegistry[scheme] = factory
}

// Existing func
func NewMetricSinkFromURL(urlStr string) (MetricSink, error) {
    // ...
    registryMu.RLock()
    sinkURLFactoryFunc := sinkRegistry[u.Scheme]
    registryMu.RUnlock()
    /// ...
}

Then both the included additional sinks and custom sinks defined in external packages can use the same mechanism to be instantiated at runtime.
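
For example, an external package could register its factory during init; this is a hypothetical sketch using the RegisterSyncForScheme hook proposed above, and the URL handling is only illustrative:

// In an external package such as the datadog sub-package:
func init() {
	metrics.RegisterSyncForScheme("dogstatsd", func(u *url.URL) (metrics.MetricSink, error) {
		// NewDogStatsdSink is the existing constructor; pulling the hostname
		// from a query parameter is just one possible convention.
		return NewDogStatsdSink(u.Host, u.Query().Get("hostname"))
	})
}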

I opened an issue rather than a straight PR with that code just to check that there is no good reason this wasn't done already. If you agree it's worthwhile @armon I can make a PR.

FAIL: TestMetrics_EmitRuntimeStats

On amd64 with go-1.10:

--- FAIL: TestMetrics_EmitRuntimeStats (0.00s) 
        metrics_test.go:290: bad val: [4 395184 4.85196e+06 5605 1020 4585 66625 1 66625] 

Make mapping of metric parts to Prometheus metrics/labels configurable

Prometheus's true power comes from its multi-dimensional data model: metric names with arbitrary key/value dimensions. I wonder if it would be a good idea to allow some configurability of the PrometheusSink to allow such mappings. If I understand the go-metrics data model correctly, it's StatsD/Graphite-like, in the sense that a metric simply has a number of parts (separated by dots or whatever else in the end). In Prometheus's statsd-bridge, we already have a very simple mapping language for this type of componentized metric into more useful Prometheus metrics: https://github.com/prometheus/statsd_bridge#metric-mapping-and-configuration

However, it could / should arguably be even a bit more powerful, like allowing regex-based rewriting of metrics and labels.

/cc @stapelberg, who built the PrometheusSink and got bitten by getting the hostname in metric names via this package (via Hashicorp's Raft package using go-metrics).

Feature: InfluxDB sink

Hi, after rewriting Telegraf StatsD templates for the 3rd time because of Nomad telemetry changes, I think maybe it's better just to add an InfluxDB sink?
Let me know if this is something that could be accepted, and I'll try to provide it.
Regards

Prometheus sink handling of metric help is incorrect

I came here while investigating why a dependent project (HashiCorp Consul) was emitting extra spurious metrics with differing label sets. I discovered it's actually a suboptimal workaround for go-metrics' API for specifying metric help.

Prometheus very strongly recommends that every metric name should be associated with only one set of label keys - only the label values should differ.

Consequently, Prometheus's metric representation format only associates metric help strings with the metric name; labels are ignored.

go-metrics makes a conflicting decision to associate metric help with metric type, name, label keys and values.

The overall consequence for users is that it becomes impossible, using go-metrics, to specify help for a metric whose label values are only known at runtime, which is quite a common pattern.

The only escape is to break with the Prometheus recommendation that every metric name should be associated with only one set of label keys: configure go-metrics with a dummy metric that has an empty label set to carry the help message, and write all the actual metrics under the same name with a populated label set. This is the suboptimal workaround I mentioned at the start.

To fix this:

  • The help map within a PrometheusSink should change from being keyed on type.name;labelkey=labelvalue to just name
  • There should be a new Definition object similar to Gauge/Counter/SummaryDefinition, which only populates a name => help mapping for future use with ephemeral metrics, but does not create any metric.

Fix race condition in start.go

The initialization of the globalMetrics field is racy and this is exposed in parallel tests when run with the race detector. Wrapping the globalMetrics field in an atomic.Value does not have a performance impact.

$ go test -bench GlobalMetrics
Benchmark_GlobalMetrics_Direct/direct-8         	 5000000	       282 ns/op
Benchmark_GlobalMetrics_Direct/atomic.Value-8   	10000000	       352 ns/op
$ go test -bench GlobalMetrics
Benchmark_GlobalMetrics_Direct/direct-8         	 5000000	       326 ns/op
Benchmark_GlobalMetrics_Direct/atomic.Value-8   	10000000	       251 ns/op
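
A minimal sketch of the approach (simplified, not the library's exact code; uses sync/atomic):

var globalMetrics atomic.Value // holds *Metrics

func init() {
	// Start with a no-op implementation so readers never see a nil value.
	globalMetrics.Store(&Metrics{sink: &BlackholeSink{}})
}

func NewGlobal(conf *Config, sink MetricSink) (*Metrics, error) {
	m, err := New(conf, sink)
	if err == nil {
		globalMetrics.Store(m)
	}
	return m, err
}

func SetGauge(key []string, val float32) {
	globalMetrics.Load().(*Metrics).SetGauge(key, val)
}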

Bug: IncrCounterWithLabels in PrometheusSink fails when called twice

p.mu.Lock()
defer p.mu.Unlock()
key, hash := p.flattenKey(parts, labels)
g, ok := p.counters[hash]
if !ok {
	g = prometheus.NewCounter(prometheus.CounterOpts{
		Name:        key,
		Help:        key,
		ConstLabels: prometheusLabels(labels),
	})
	prometheus.MustRegister(g)
	p.counters[key] = g
}
g.Add(float64(val))

When looking up the counter, the method uses hash as the map key. However, it stores the new counter under key, which breaks the second and any subsequent call to IncrCounterWithLabels (or any other XXXWithLabels method).
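
A sketch of the corrected branch, storing the counter under the same hash used for the lookup:

g, ok := p.counters[hash]
if !ok {
	g = prometheus.NewCounter(prometheus.CounterOpts{
		Name:        key,
		Help:        key,
		ConstLabels: prometheusLabels(labels),
	})
	prometheus.MustRegister(g)
	p.counters[hash] = g // was p.counters[key] = g, so the lookup above never found it
}
g.Add(float64(val))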

please tag and version this project

Hello,

Can you please tag and version this project?

I am the Debian Maintainer for go-metrics and versioning would help Debian keep up with development.

Support labels with the same key in the inmem_endpoint.go implementation

In inmem_endpoint.go, the definitions of GaugeValue and SampledValue include a DisplayLabel field that is used to render the metrics output. Since it uses a map[string]string instead of []Label, it will squash any labels that have the same key.

For example, if I have the following labels:
{service ServiceName}, {tag bar}, {tag foo}

the foo tag will overwrite the bar tag when creating the DisplayLabel field, returning incorrect information about this metric.
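
A short illustration of how a map keyed on label name drops the duplicate:

labels := []metrics.Label{
	{Name: "service", Value: "ServiceName"},
	{Name: "tag", Value: "bar"},
	{Name: "tag", Value: "foo"},
}

display := make(map[string]string, len(labels))
for _, l := range labels {
	display[l.Name] = l.Value // the second "tag" entry overwrites the first
}
// display is map[service:ServiceName tag:foo]; the {tag bar} label is lost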

FanoutSink key modified across sinks when DogStatsdSink is enabled

Problem

  1. FanoutSink does not copy the key before passing it to the different sinks.
  2. DogStatsdSink modifies the key slice in place at line 73 when the hostname matches one of the key elements:

func (s *DogStatsdSink) parseKey(key []string) ([]string, []metrics.Label) {
	// Since DogStatsd supports dimensionality via tags on metric keys, this sink's approach
	// is to splice the hostname out of the key in favor of a `host` tag.
	// The `host` tag is either forced here, or set downstream by the DogStatsd server
	var labels []metrics.Label
	hostName := s.hostName
	// Splice the hostname out of the key
	for i, el := range key {
		if el == hostName {
			key = append(key[:i], key[i+1:]...)
			break
		}
	}

Reproduction Steps

  1. Create a FanoutSink
  2. Add a DogStatsdSink with a hostname
  3. Add another Sink (e.g. InmemSink)
  4. Call SetGaugeWithLabels() with the hostname as a value in the keys

Example of a test I added in datadog/dogstatsd_test.go:

func TestFanoutSink(t *testing.T) {
	server, _ := setupTestServerAndBuffer(t)
	defer server.Close()

	dog, err := NewDogStatsdSink(DogStatsdAddr, "consul")
	require.NoError(t, err)

	inmem := metrics.NewInmemSink(10*time.Second, 10*time.Second)

	fSink := metrics.FanoutSink{dog, inmem}
	fSink.SetGaugeWithLabels([]string{"consul", "metric"}, 10, []metrics.Label{{Name: "a", Value: "b"}})

	intervals := inmem.Data()
	require.Len(t, intervals, 1)

	if intervals[0].Gauges["consul.metric;a=b"].Value != 10 {
		t.Fatalf("bad val: %v", intervals[0].Gauges)
	}
}

Output

--- FAIL: TestFanoutSink (0.00s)
    /<>/HashiCorp/go-metrics/datadog/dogstatsd_test.go:164: bad val: map[metric.metric;a=b:{metric.metric  10 [{a b}] map[]}]
FAIL
FAIL	github.com/hashicorp/go-metrics/datadog	0.694s
FAIL

Expected Output

I would expect the key in the InmemSink to remain consul.metric, but it is modified to metric.metric.

Fix

The FanoutSink should ideally copy the key before passing it to each sink, so that one sink's modifications cannot leak into another.
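
A minimal sketch of that fix (illustrative, not the library's actual code):

func (fh FanoutSink) SetGaugeWithLabels(key []string, val float32, labels []Label) {
	for _, s := range fh {
		// Hand each sink its own copy so a sink that mutates the key
		// (e.g. DogStatsdSink splicing out the hostname) cannot affect the others.
		keyCopy := make([]string, len(key))
		copy(keyCopy, key)
		s.SetGaugeWithLabels(keyCopy, val, labels)
	}
}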

Cannot deploy to Google App Engine

Google App Engine does not allow syscall imports and there are several syscall dependencies in the library.

Unfortunately, I am still learning golang and can't suggest a solution yet but wanted to at least let you know about the challenge with GAE deployments.

Example error:
go-app-builder: Failed parsing input: parser: bad import "syscall" in github.com/armon/go-metrics/const_unix.go from GOPATH

Question: Logging flexibility

Hi,
I have noticed that StatsdSink (and probably others) uses the standard log package. Any idea how to make them write to my own system's logging infrastructure? (I am currently using logrus.) I'd like to detect when UDP cannot send or when any error occurs, and for that I'm counting on logrus to log errors directly to an alerted log receiver.

Of course, my only thought right now is to copy StatsdSink into my project and swap out the references to log with my own logger.
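
Alternatively, since the sink logs through the standard log package, here is a sketch of redirecting that output to logrus, assuming a process-wide redirect is acceptable:

import (
	"log"

	"github.com/sirupsen/logrus"
)

func init() {
	// Anything the sinks write via the standard "log" package now goes through logrus.
	log.SetOutput(logrus.StandardLogger().Writer())
}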

Thanks

Statsd telemetry doesn't recover from statsd outage

From: hashicorp/vault#1932

It appears that go-metrics doesn't handle a disconnect of the statsd server, particularly if the address changes.

We are running a telegraf agent with a statsd listener and configuring vault to send data to a linked container with a hostname. When the linked container is restarted (generally getting a new IP address), we stop receiving statsd metrics from vault.

Prometheus usage instructions

How can go-metrics be used with Prometheus? The farthest I got is:

package main

import (
    "net/http"
    "time"

    "github.com/armon/go-metrics"
    "github.com/armon/go-metrics/prometheus"
)

func main() {
    sink, _ := prometheus.NewPrometheusSink()
    m, err := metrics.NewGlobal(metrics.DefaultConfig("goTest"), sink)
    if err != nil {
        panic(err)
    }

    go func() {
        for {
            m.IncrCounter([]string{"requestCounter"}, 1)
            time.Sleep(time.Second)
        }
    }()

    select {}
}

Some parts are missing, like setting up an HTTP handler and HTTP server, and I don't know how to let a Prometheus server scrape this. I've looked at the source code and I have a feeling that something is missing.
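
For reference, a sketch of the missing piece: exposing the metrics over HTTP with the Prometheus client library's promhttp handler. This assumes the sink registers its collectors with the default Prometheus registry, which the MustRegister calls shown elsewhere on this page suggest.

// Additional imports: "net/http" and
// "github.com/prometheus/client_golang/prometheus/promhttp".

// Inside main(), after the sink and metrics.NewGlobal setup:
http.Handle("/metrics", promhttp.Handler())
go func() {
	if err := http.ListenAndServe(":8080", nil); err != nil {
		panic(err)
	}
}()

// Then add localhost:8080/metrics as a scrape target in prometheus.yml.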

Use of TCP makes this incompatible with StatsD

The exclusive use of TCP for sending metrics to the backend metrics server means this library is incompatible with StatsD. Therefore Consul is unable to send metrics to StatsD, as it uses this library. Since statsite itself supports UDP, this library should as well.

TestDisplayMetrics flaky

Hi!

During automated builds of go-metrics 0.3.4 by the Debian CI infrastructure, we noticed a test failure on one of our armhf (ARM hard-float) builders. The issue disappeared after a retry.

All the logs are at https://ci.debian.net/packages/g/golang-github-armon-go-metrics/testing/armhf/, and the two logs of interest here are:

The lines of interest in the logs:

=== RUN   TestDisplayMetrics
    inmem_endpoint_test.go:29: bad: [0xd28240 0xd28400]
--- FAIL: TestDisplayMetrics (0.00s)

Looking deeper in the various logs, I noticed that this issue appeared also on the x86_64 architecture:

Enhancement: Windowed Sink

Hi, I've started working on a windowed version of the InmemSink. This allows you to specify a time range where samples are kept. Rather than the fixed interval of InmemSink, this is a rolling window where samples expire. Statistics are mostly calculated on demand (rather than at ingestion time). Samples are collected in a fixed size ring buffer and are either overwritten or expired as needed.

Example usage:

// setup windows for a sample rate of 2Hz
cpu5 := metrics.NewWindowSink(time.Second*5, 10)
cpu15 := metrics.NewWindowSink(time.Second*15, 30)
cpu60 := metrics.NewWindowSink(time.Second*60, 120)

// use fanout sink to write to all three windows
cpuSink := &metrics.FanoutSink{cpu5, cpu15, cpu60}

key := []string{"cpu"}

go func() {
    // collect CPU utilization every 500ms
    for _ = range time.Tick(time.Millisecond * 500) {
        v, _ := cpu.CPUPercent(0, false)
        cpuSink.AddSample(key, float32(v[0]))
    }
}()

// wait for first samples to come in
time.Sleep(time.Second * 1)

for _ = range time.Tick(time.Second * 1) {
    log.Printf("5s: %05.2f%%, 15s: %05.2f%%, 60s: %05.2f%%\n",
        cpu5.Sample(key).Mean(),
        cpu15.Sample(key).Mean(),
        cpu60.Sample(key).Mean())
}

I've only implemented sink.AddSample() so far, but am interested in your comments.

Current branch: https://github.com/kung-foo/go-metrics/tree/window

  • jonathan

TestMetrics_MeasureSince is failing on arm64

TestMetrics_MeasureSince is failing in Ubuntu Groovy 20.10 (development release) on arm64 (Go 1.14); here is the test log. I patched the test to print m (MockSink) after each call to MeasureSince or MeasureSinceWithLabels (4 calls in this test) and commented out the t.Fatalf("") when m.vals[0] > 0.1. Then I ran the test 5 times; here are the results:

Round 1:
&{{0 0} [[key]] [0.00192] [[]]}
&{{0 0} [[key]] [0.268622] [[{a b}]]}
&{{0 0} [[timer key]] [0.294562] [[]]}
&{{0 0} [[service key]] [0.307243] [[]]}

Round 2:
&{{0 0} [[key]] [0.00228] [[]]}
&{{0 0} [[key]] [0.260842] [[{a b}]]}
&{{0 0} [[timer key]] [0.299362] [[]]}
&{{0 0} [[service key]] [0.314462] [[]]}

Round 3:
&{{0 0} [[key]] [0.00192] [[]]}
&{{0 0} [[key]] [0.334702] [[{a b}]]}
&{{0 0} [[timer key]] [0.373842] [[]]}
&{{0 0} [[service key]] [0.408443] [[]]}

Round 4:
&{{0 0} [[key]] [0.00234] [[]]}
&{{0 0} [[key]] [0.220862] [[{a b}]]}
&{{0 0} [[timer key]] [0.254622] [[]]}
&{{0 0} [[service key]] [0.290202] [[]]}

Round 5:
&{{0 0} [[key]] [0.001481] [[]]}
&{{0 0} [[key]] [0.283643] [[{a b}]]}
&{{0 0} [[timer key]] [0.319363] [[]]}
&{{0 0} [[service key]] [0.332583] [[]]}

As you can see, m.vals[0] is less than or equal to 0.1 in all rounds only in the first scenario covered by this test; all the others fail because they are greater than 0.1. I am not sure why those operations take longer on arm64 than on other architectures (I did not spend much time investigating the root cause). As a workaround for now, I plan to patch this test in the Debian package to check m.vals[0] > 0.5 instead of m.vals[0] > 0.1.

Question: inmem sink and float64

Hi,
Another quick question - is there a possibility of an overflow error in the AggregationSample (the sum is based on float64) when a service runs for a really long time?

TestMetrics_EmitRuntimeStats fails

Hello!

I am preparing a new version of go-metrics for Debian and I am getting this error:

=== RUN   TestMetrics_EmitRuntimeStats
--- FAIL: TestMetrics_EmitRuntimeStats (0.00s)
    metrics_test.go:244: bad val: [9 125784 3.639544e+06 776 439 337 401814 1 401814]

feat: add Inc & Dec method for Gauge

Currently go-metrics only supports SetGauge and SetGaugeWithLabels for gauges; it should also offer Inc and Dec.
For example: we export the metric request_in_flight as a gauge, and it's hard to implement without Inc and Dec.
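
Until then, a workaround sketch that keeps the running value in the application and re-emits it with SetGauge (uses sync/atomic; the names are illustrative):

var inFlight int64

func handleRequest() {
	metrics.SetGauge([]string{"request_in_flight"}, float32(atomic.AddInt64(&inFlight, 1)))
	defer func() {
		metrics.SetGauge([]string{"request_in_flight"}, float32(atomic.AddInt64(&inFlight, -1)))
	}()

	// ... handle the request ...
}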

Librato Sink

I'm interested in adding a sink for Librato Metrics but didn't want to jump the gun and start committing too soon.

Is it safe to start working on this, and do you agree this fits as a sink?

Thanks!

Metric emission methods in metrics.go modify input key slice

All the metric emission methods on Metrics in metrics.go check the state of certain configuration flags, like EnableHostname, and if a flag is set to true they modify the source key slice passed as an argument.

See here for one such example.

Modifying the input key slice makes it unusable for emitting to multiple Metrics objects whenever any flag that triggers such modification is enabled. If different metrics sinks need different configuration, multiple Metrics objects are required, since the config is per-Metrics object.

This makes the client responsible for copying the same key slice every time it emits the metric to a different Metrics object, which requires allocating a new []string and drastically hurts the performance of emitting metrics.
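
A sketch of the copying a caller currently has to do (metricsObjects is a hypothetical slice of differently configured *metrics.Metrics):

key := []string{"myapp", "requests"}

for _, m := range metricsObjects {
	// Each Metrics object may mutate the slice (e.g. inserting the hostname),
	// so every emission needs its own copy and therefore a fresh allocation.
	keyCopy := make([]string, len(key))
	copy(keyCopy, key)
	m.IncrCounter(keyCopy, 1)
}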

FAIL: TestIncrCounter

On amd64 with go-1.10:

--- FAIL: TestIncrCounter (0.00s)
        circonus_test.go:123: Expected '{"foo`bar":{"_type":"n","_value":1}}', got '{"foo`bar":{"_type":"L","_value":1}}'

feat: support value based label filtering

Overview

Hopefully this is not a "holding it wrong" kinda thing.

Today, go-metrics supports name-based label filtering. For instance:

https://github.com/armon/go-metrics/blob/master/metrics_test.go#L465

It'd be useful to also support value-based label filtering. For instance, given two metrics with these labels:

notOkLabel := Label{Name: "ok_label", Value: "bad_value"}
okLabel := Label{Name: "ok_label", Value: "good_value"}

blocking the metric whose label value equals bad_value can be useful in cases where I care about some of the label values but not all.
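
In the meantime, a caller-side sketch of the behavior being asked for (hasBlockedValue and labels are hypothetical, not part of the library):

// hasBlockedValue reports whether any label carries a value the caller wants to suppress.
func hasBlockedValue(labels []metrics.Label, blocked map[string]bool) bool {
	for _, l := range labels {
		if blocked[l.Value] {
			return true
		}
	}
	return false
}

// Usage: skip emitting entirely when a blocked value is present.
if !hasBlockedValue(labels, map[string]bool{"bad_value": true}) {
	metrics.SetGaugeWithLabels([]string{"key"}, 1, labels)
}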

Multiple reports produced on SIGUSR1

I'm trying out this library using the examples from the README. At first everything seems to work fine,
but after a few minutes it starts to produce these kinds of reports when I send the signal.

[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.malloc_count': 3858833.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.free_count': 3843701.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.heap_objects': 15132.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_pause_ns': 10070482.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_runs': 67.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.num_goroutines': 9.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.alloc_bytes': 2120256.000
[2022-08-30 19:19:20 +0200 CEST][G] 'service-name.redacted.net.runtime.sys_bytes': 19090192.000
[2022-08-30 19:19:20 +0200 CEST][S] 'method.wow': Count: 53 Min: 3.000 Mean: 14.321 Max: 40.000 Stddev: 7.582 Sum: 759.000 LastUpdated: 2022-08-30 19:19:29.025847345 +0200 CEST m=+2781.237436879
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.num_goroutines': 9.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.alloc_bytes': 2239280.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.sys_bytes': 19090192.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.malloc_count': 3861438.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.free_count': 3844065.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.heap_objects': 17373.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_pause_ns': 10070482.000
[2022-08-30 19:19:30 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_runs': 67.000
[2022-08-30 19:19:30 +0200 CEST][S] 'method.wow': Count: 52 Min: 3.000 Mean: 15.019 Max: 48.000 Stddev: 7.952 Sum: 781.000 LastUpdated: 2022-08-30 19:19:39.763540055 +0200 CEST m=+2791.975129591
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.heap_objects': 19330.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_pause_ns': 10070482.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_runs': 67.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.num_goroutines': 9.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.alloc_bytes': 2344168.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.sys_bytes': 19090192.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.malloc_count': 3863680.000
[2022-08-30 19:19:40 +0200 CEST][G] 'service-name.redacted.net.runtime.free_count': 3844350.000
[2022-08-30 19:19:40 +0200 CEST][S] 'method.wow': Count: 41 Min: 7.000 Mean: 15.537 Max: 33.000 Stddev: 6.124 Sum: 637.000 LastUpdated: 2022-08-30 19:19:48.753152072 +0200 CEST m=+2800.964741606
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_runs': 67.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.num_goroutines': 9.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.alloc_bytes': 2478680.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.sys_bytes': 19090192.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.malloc_count': 3866681.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.free_count': 3844796.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.heap_objects': 21885.000
[2022-08-30 19:19:50 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_pause_ns': 10070482.000
[2022-08-30 19:19:50 +0200 CEST][S] 'method.wow': Count: 64 Min: 2.000 Mean: 15.281 Max: 31.000 Stddev: 6.732 Sum: 978.000 LastUpdated: 2022-08-30 19:19:59.729819335 +0200 CEST m=+2811.941408869
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.num_goroutines': 9.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.alloc_bytes': 2577112.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.sys_bytes': 19090192.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.malloc_count': 3868758.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.free_count': 3845047.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.heap_objects': 23711.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_pause_ns': 10070482.000
[2022-08-30 19:20:00 +0200 CEST][G] 'service-name.redacted.net.runtime.total_gc_runs': 67.000
[2022-08-30 19:20:00 +0200 CEST][S] 'method.wow': Count: 36 Min: 4.000 Mean: 14.083 Max: 26.000 Stddev: 5.935 Sum: 507.000 LastUpdated: 2022-08-30 19:20:08.690175115 +0200 CEST m=+2820.901764649


I'm adding new samples using a couple of test goroutines:

	elapsed := time.Since(sTime).Microseconds()
	inm.AddSample([]string{"method", "wow"}, float32(elapsed))

Any clues?

feat: Support float64 gauges

It can be useful to represent a Unix timestamp in a gauge. Prometheus metrics are float64s, so this works well when using the Prometheus libraries directly. However, because go-metrics gauges are stored as float32s, there isn't enough precision to store a timestamp (https://go.dev/play/p/RvJ11Y8R7Cz) and times get truncated to roughly 2-minute intervals.

There's one instance of this in Vault: https://github.com/hashicorp/vault/blob/2dd4528ed8a8be83e0c1e0b48b1f23dab701181f/builtin/logical/pki/path_tidy.go#L1209
Which may be a bug.
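
A quick illustration of the precision loss: near current Unix timestamps (around 1.7e9 seconds), adjacent float32 values are 2^7 = 128 seconds apart, so a stored timestamp snaps to a multiple of roughly two minutes:

package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Now().Unix()
	stored := int64(float32(now)) // what a float32 gauge can actually hold
	fmt.Printf("now=%d stored=%d delta=%ds\n", now, stored, stored-now)
	// delta is usually non-zero and can be up to ~64 seconds in either direction
}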

Using labels causes Prometheus sink to panic

Note this in prometheus.go

func (p *PrometheusSink) SetGaugeWithLabels(parts []string, val float32, labels []metrics.Label) {
        p.mu.Lock()
        defer p.mu.Unlock()
        key, hash := p.flattenKey(parts, labels)
        g, ok := p.gauges[hash]
        if !ok {
                g = prometheus.NewGauge(prometheus.GaugeOpts{
                        Name:        key,
                        Help:        key,
                        ConstLabels: prometheusLabels(labels),
                })
                prometheus.MustRegister(g)
                p.gauges[key] = g
        }
        g.Set(float64(val))
}

it checks p.gauges[hash] and stores p.gauges[key] = g

storehttpserver_1  | 2018-05-25T12:13:51Z |DEBU| labels = [{name value}] service=store-http-server
storehttpserver_1  | panic: duplicate metrics collector registration attempted
storehttpserver_1  |
storehttpserver_1  | goroutine 13 [running]:
storehttpserver_1  | github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc42005e6c0, 0xc42029c7a0, 0x1, 0x1)
storehttpserver_1  | 	/Users/dougfort/go/src/github.com/prometheus/client_golang/prometheus/registry.go:362 +0x9e
storehttpserver_1  | github.com/prometheus/client_golang/prometheus.MustRegister(0xc42029c7a0, 0x1, 0x1)
storehttpserver_1  | 	/Users/dougfort/go/src/github.com/prometheus/client_golang/prometheus/registry.go:154 +0x53
storehttpserver_1  | github.com/armon/go-metrics/prometheus.(*PrometheusSink).IncrCounterWithLabels(0xc420184f60, 0xc42029c740, 0x1, 0x1, 0x0, 0xc420277a00, 0x1, 0x1)
storehttpserver_1  | 	/Users/dougfort/go/src/github.com/armon/go-metrics/prometheus/prometheus.go:117 +0x2b6

Richer key semantics to allow backend-specific transformations

Hi, I'm coming to this via DataDog/dd-agent#2959 and after seeing some of go-metrics's code, in particular a related request in #9.

The problem is that different backends have different features that imply formatting rules. statsd, the simplest, only has a metric name and value. Datadog has name, value and key-value pairs ("tags"). Prometheus appears to have name, value and key-value pairs ("labels"). However, software using go-metrics would want to not have any backend-specific code (perhaps not even configuration as suggested by #9, because then different configurations for prometheus, datadog, and so on would need to be done, with potentially different APIs/settings based on features and community standards of the particular backend).

My suggestion is to pass richer metadata into the backends using the existing API, allowing backends to transform the metric according to the backend's preferences. The new rule is that a particular starting character in a name component identifies a key/tag/variable, and the next name component is interpreted as the corresponding value.

Before the change:

met.SetGauge([]string{"nomad", "myapp", "memory", "used"}, float32(1))

is the amount of memory used by job "myapp" in Nomad. Job names are set by users, so that component in the key is changeable. With the suggested change, this call would become:

met.SetGauge([]string{"nomad", ":appname", "myapp", "memory", "used"}, float32(1))

thus telling all backends that there is a structured piece of information passed within the key, with name "appname" and value "myapp".

Immediately we can construct very simple but idiomatic default transforms:

  • statsd: suppresses all names from the key string and retains the values, since it doesn't support any richer formatting, resulting in nomad.myapp.memory.used|1.
  • datadog: removes all structured pieces and emits them as tags: equivalent to output of SetGaugeWithTags({"nomad", "memory", "used"}, 1, {"appname:myapp"})
  • prometheus: equivalent to output of NewGaugeVec({"nomad", "memory", "used"}, 1, {"appname:myapp"})

Note how this can be immediately applied to structured pieces added by go-metrics itself:
met.HostName = "test"
met.EnableHostname = true
met.SetGauge([]string{"key"}, float32(1))

would call
m.sink.SetGauge({":hostname", m.HostName, key}, val)

and Datadog/Prometheus would automatically emit the hostname as a tag, which most users of these services would be very happy about, while statsd output would be unchanged, with no changes required of go-metrics library users.

Observe that saying "we can just pass key/value pairs separately via a different API" is not enough to match this functionality, because the statsd backend would not have any rules on how to transform this information into a metric key -- it would require a "format string" or to adopt one standard behavior which might or might not be a breaking change to many users.

This can also contribute to the issue faced by @juliusv, @stapelberg, @jeinwag and possibly others who would like to do richer transforms and extraction -- except, instead of having to extract meaningful key/value pairs with regex (and potentially facing issues with unintentional matches, etc) the key/value pairs are already provided in the call to the backend. Further transforms can then be applied on this richer source information if desired.

I'm curious if this sounds reasonable and whether a PR implementing this is likely to be considered. I would be willing to give this a stab. Thanks for your feedback!

Prometheus sink

I'm working on a sink for Prometheus (http://prometheus.github.io). Since it depends on the Prometheus client library, I'm wondering whether you'd merge it into the go-metrics package directly (which currently has no dependencies except for the std library) or whether I should put it into a separate package...?

Only SetGauge adds the hostname to the key

I just started using go-metrics and noticed that the output (when using InmemSink) has the following:

[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.num_goroutines': 9.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.alloc_bytes': 1947560.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.sys_bytes': 8231160.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.malloc_count': 463503.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.free_count': 435483.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.heap_objects': 28020.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.total_gc_pause_ns': 21151072.000
[2016-04-12 22:03:20 +0000 UTC][G] 'go-proxy.vagrant-ubuntu-trusty-64.runtime.total_gc_runs': 8.000
[2016-04-12 22:03:20 +0000 UTC][S] 'go-proxy.runtime.gc_pause_ns': Count: 4 Min: 103050.000 Mean: 1246662.500 Max: 4138151.000 Stddev: 1931961.901 Sum: 4986650.000 LastUpdated: 2016-04-12 22:03:28.830586229 +0000 UTC

My hostname is vagrant-ubuntu-trusty-64. I noticed that all the keys contain it except for the last one. I checked the source code and it looks like only SetGauge() adds the hostname to the key in these lines:

    if m.HostName != "" && m.EnableHostname {
        key = insert(0, m.HostName, key)
    }

EmitKey(), IncrCounter() & AddSample() do not contain those lines. Is there a particular reason for that?

Race condition in `DisplayMetrics`

We ran into a race condition in Nomad. We were calling InmemSink.DisplayMetrics in a tight loop in a test, and another routine called SetGaugeWithLabels. The result was a fatal error: concurrent map iteration and map write. The stack is attached. This was trivially reproduced.

The issue seems to be that InmemSink.DisplayMetrics retrieves the intervals from Data(), which is a deep copy to avoid race conditions. However, it doesn't use those:
https://github.com/armon/go-metrics/blob/f0300d1749da6fa982027e449ec0c7a145510c3c/inmem_endpoint.go#L46-L59
Instead, it uses i.intervals, without a mutex.

I'm submitting a PR that uses the intervals returned from Data(), which was presumably the intent.
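
A sketch of the intended change (simplified), iterating over the copy returned by Data() instead of the live i.intervals:

data := i.Data() // copy taken under the lock

var interval *IntervalMetrics
n := len(data)
switch {
case n == 0:
	return nil, fmt.Errorf("no metric intervals have been initialized yet")
case n == 1:
	// Show the current interval if it's all we have
	interval = data[0]
default:
	// Show the most recent finished interval if we have one
	interval = data[n-2]
}
// ... build the response from interval, never touching i.intervals directly ...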
