
RingCentral Metrics Facade is a Java library for working with metrics, allowing extremely flexible configuration of metrics and their export, designed to be generic and not tied to a specific implementation.

License: MIT License

Java 100.00%

metrics prometheus java monitring ringcentral jmx zabbix open-source histogram hdrhistogram


RingCentral Metrics Facade

RingCentral Metrics Facade is a Java library for working with metrics,
allowing extremely flexible configuration of metrics and their export,
designed to be generic and not tied to a specific implementation.

Main Features Overview
Getting Started
Features
Metrics
Metrics Reporters
Metrics Producers
Integration
- Dropwizard MetricRegistry
License

Main Features Overview

Extremely flexible configuration

You can set up defaults and overrides for any subset of metrics
(by using a predicate) and almost any metric parameters.
In particular, you can specify a set of values to be calculated and exported.
For example, the implementation of a metric is not required to perform an expensive operation of
calculating percentiles, if percentiles are not included in the set of values defined for this metric.
See Getting Started for details and usage examples.

Advanced support for labeled metrics

In particular, Metrics Facade supports eviction and expiration for labeled metrics,
that is, for each labeled metric, it allows you to specify:

- the maximum number of combinations of label values:
  when this threshold is exceeded, the combination that has not been updated for the longest time
  is automatically removed, so it no longer wastes system resources
- the expiration time for a combination of label values:
  if a combination has not been updated during this time, it is automatically removed

See Getting Started for details and usage examples.

Getting Started

Let us first introduce some important concepts that
we will use in this section and throughout the document.

Measurable value is a typed value that can be measured.

Examples: the number of requests processed, the number of requests
processed with errors, the average request processing time,
the amount of free memory, the number of items in the queue.

Each measurable value has a type.
Measurable value types are represented by subclasses of Measurable.
Examples:

Counter.COUNT (the number of requests processed)
Histogram.MEAN (the average request processing time)
Var.Value (the amount of free memory)

Metric is a typed named entity measuring a certain set of Measurables,
and providing the results of the measurements.

Each metric has a type that supports a certain set of Measurables and
a certain way of measuring them.

Metric types are represented by subclasses of Metric.
Examples:

Counter is a java.lang.Long based counter,
measuring exactly one value - the current value of the counter (Counter.COUNT).
The measurement is performed by explicitly increasing or decreasing the counter:

Counter requestErrorCounter = ... // COUNT = 0
requestErrorCounter.inc();        // COUNT = 1
requestErrorCounter.inc(2);       // COUNT = 3
requestErrorCounter.dec(2);       // COUNT = 1

Timer measures the number of times something happens,
the speed at which it happens (rate), and the statistical distribution of its duration:
- Counter.COUNT (the number of times something happens)
- Rate.MEAN_RATE (the mean rate)
- Rate.ONE_MINUTE_RATE
- Rate.FIVE_MINUTES_RATE
- Rate.FIFTEEN_MINUTES_RATE
- Rate.RATE_UNIT
- Histogram.MIN (the min duration)
- Histogram.MAX (the max duration)
- Histogram.MEAN (the mean duration)
- Histogram.STANDARD_DEVIATION
- Histogram.Percentile (including the predefined PERCENTILE_5, PERCENTILE_10, ...)
- Timer.DURATION_UNIT
The measurement is performed by explicitly taking into account the durations:
```
// Histogram.MAX = 0.0
// Histogram.MEAN = 0.0
Timer requestTimer = ...

// Histogram.MAX = 10.0
// Histogram.MEAN = 10.0
requestTimer.update(10, NANOSECONDS);

// Histogram.MAX = 20.0
// Histogram.MEAN = 15.0
requestTimer.update(20, NANOSECONDS); 

// Histogram.MAX = 30.0
// Histogram.MEAN = 20.0
requestTimer.update(30, NANOSECONDS); 
```
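The arithmetic behind the comments above can be checked with a few lines of plain Java. This is only a sketch of the expected numbers: `DurationStatsSketch` is a hypothetical name, it does not use Metrics Facade, and a real timer implementation may compute these values approximately.

```java
import java.util.LongSummaryStatistics;

// Hypothetical illustration only; this class is not part of Metrics Facade.
class DurationStatsSketch {

    // Returns {max, mean} over all recorded durations (at least one is expected).
    static double[] maxAndMean(long... durationsNanos) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (long duration : durationsNanos) {
            stats.accept(duration);
        }
        return new double[] { stats.getMax(), stats.getAverage() };
    }

    public static void main(String[] args) {
        double[] result = maxAndMean(10, 20, 30);
        System.out.println("MAX = " + result[0] + ", MEAN = " + result[1]);
    }
}
```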

Each metric has a name represented by the MetricName class.

Metric does not provide measurement results directly.
Instead, Metric provides the set of its instances (MetricInstance),
which, in turn, provide measurement results:

interface Metric extends Iterable<MetricInstance> {
    ...
}

interface MetricInstance {
    ...
    Set<Measurable> measurables();
    <V> V valueOf(Measurable measurable) throws NotMeasuredException;
}

This design has been driven by the need to support labeled metrics.

A labeled metric is a metric associated with a set of attributes (labels).
It generates a separate "child" metric (represented by a MetricInstance)
for each combination of label values that has received at least one update.

Let us explain this definition with the following example 1,
that we will use and develop throughout this section.

Assume you are developing an HTTP service that, in order to do its job, calls several other HTTP services.
Each of the external services is represented by several instances running on specific servers and ports.
The service being developed can call any of these instances
(for example, by using round-robin balancing).

Let us also assume that we would like to have a separate request execution Timer
for each instance of an external service (that is, for each combination of service, server, and port
for which at least one request has been made).

In other words, we would like to define a labeled Timer with the
labels service, server, port (taking into account the order of the labels).
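Conceptually, a labeled metric behaves like a map from an ordered list of label values to a child metric that is created on first update. The following plain-Java sketch illustrates that idea with a counter instead of a timer to keep it short. `LabeledCounterSketch` and its methods are hypothetical names chosen for illustration, not the library's API or implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; this class is not part of Metrics Facade.
class LabeledCounterSketch {

    // one child per (service, server, port) combination that was updated at least once
    private final Map<List<String>, LongAdder> children = new ConcurrentHashMap<>();

    void inc(String service, String server, String port) {
        children
            .computeIfAbsent(List.of(service, server, port), key -> new LongAdder())
            .increment();
    }

    long count(String service, String server, String port) {
        LongAdder child = children.get(List.of(service, server, port));
        return child == null ? 0L : child.sum();
    }

    int instanceCount() {
        return children.size();
    }
}
```

The order of the values matters because the list is the key: `("a", "b", "c")` and `("b", "a", "c")` identify different children.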

Let us now return to the concept of a metric instance (MetricInstance), and consider it in more detail.

Instances are identified by a name (MetricName), which always starts with the metric's name,
and a list of label values (LabelValue).
There is also the set of Measurables associated with an instance.
The instance allows you to get the value of any Measurable from this set (see instance.valueOf(m) below):

Counter counter = registry.counter(withName("counter"));

counter.inc();
counter.inc(2);
counter.dec();

counter.addListener(new MetricListener() {

    @Override
    public void metricInstanceAdded(MetricInstance instance) {
        List<String> labelValuesString = instance.labelValues().stream()
            .map(lv -> lv.label().name() + "=" + lv.value())
            .collect(toList());

        String valuesString = "{" + instance.measurables().stream()
            .map(m -> m.getClass().getSimpleName() + "=" + instance.valueOf(m))
            .collect(joining(", ")) + "}";
            
        // You can also use the following "snapshot-based" approach for getting values:
        // 
        // MeasurableValues values = instance.measurableValues();
        // values.valueOf(m);    
      
        System.out.println(
            "Metric instance added:\n"
            + "  name = '" + instance.name() + "',\n"
            + "  label values = " + labelValuesString + ",\n"
            + "  total instance = " + instance.isTotalInstance() + ",\n"
            + "  labeled metric total instance = " + instance.isLabeledMetricTotalInstance() + ",\n"
            + "  level instance = " + instance.isLevelInstance() + ",\n"
            + "  measurable values = " + valuesString);
    }

    @Override
    public void metricInstanceRemoved(MetricInstance instance) {
        System.out.println("Metric instance removed: name = '" + instance.name() + "'");
    }
});

Output:

Metric instance added:
  name = 'counter',
  label values = [],
  total instance = true,
  labeled metric total instance = false,
  level instance = false,
  measurable values = {Count=2}

The lifetime of some instances coincides with the lifetime of the metric,
while others can be added or removed when certain conditions are met.
Being Iterable<MetricInstance>, the metric provides the current set of instances (thread-safely).
This feature is used by most exporters (see below).
It's also possible to subscribe to the events metricInstanceAdded and metricInstanceRemoved.
This feature is used, for example, by JmxMetricsReporter.

An update of a metric is taken into account by the corresponding subset of instances.
By default, only the total instance is created along with the metric
(this can be disabled through the metric configuration).
The total instance takes all updates into account.
The lifetime of the total instance is the same as the lifetime of the metric.
This is the only instance created for non-labeled metrics.

If the metric is labeled, an instance is created for each combination of LabelValues.
Such an instance takes into account only those updates that are made for the corresponding combination:

Histogram histogram = registry.histogram(
    withName("histogram"),
    () -> withHistogram()
        .labels(SERVICE, SERVER, PORT)
        .allSlice().noLevels());

// updates the total instance and the instance service_1/server_1_1/111
histogram.update(10, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111"))); 

// updates the total instance and the instance service_1/server_1_2/121
histogram.update(20, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));

// updates the total instance and the instance service_2/server_2_1/211
histogram.update(30, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

// updates the total instance and the instance service_2/server_2_1/212
histogram.update(40, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("212")));

// instances are added asynchronously
Thread.sleep(25);

histogram.forEach(instance -> {
    List<String> labelValuesString = instance.labelValues().stream()
        .map(lv -> lv.label().name() + "=" + lv.value())
        .collect(toList());

    // "snapshot-based" approach for getting values
    MeasurableValues values = instance.measurableValues();

    String valuesString = "{" + instance.measurables().stream()
        // or instance.valueOf(m) but this approach is not "snapshot-based"
        // and should not be used unnecessarily
        .map(m -> {
            String name =
                m instanceof Histogram.Percentile ?
                "Percentile_" + ((Histogram.Percentile)m).quantileAsString() :
                m.getClass().getSimpleName();

            return name + "=" + values.valueOf(m);
        })
        .collect(joining(", ")) + "}";

    System.out.println(
        "Metric instance:\n"
        + "  name = '" + instance.name() + "',\n"
        + "  label values = " + labelValuesString + ",\n"
        + "  total instance = " + instance.isTotalInstance() + ",\n"
        + "  labeled metric total instance = " + instance.isLabeledMetricTotalInstance() + ",\n"
        + "  level instance = " + instance.isLevelInstance() + ",\n"
        + "  measurable values = " + valuesString);
});

Output:

Metric instance:
  name = 'histogram',
  label values = [],
  total instance = true,
  labeled metric total instance = true,
  level instance = false,
  measurable values = {Count=4, Max=40, Percentile_0.9=40.0, Percentile_0.99=40.0, Percentile_0.5=30.0, Mean=25.0, Min=10}
  
Metric instance:
  name = 'histogram',
  label values = [service=service_1, server=server_1_1, port=111],
  total instance = false,
  labeled metric total instance = false,
  level instance = false,
  measurable values = {Count=1, Max=10, Percentile_0.9=10.0, Percentile_0.99=10.0, Percentile_0.5=10.0, Mean=10.0, Min=10}
  
Metric instance:
  name = 'histogram',
  label values = [service=service_1, server=server_1_2, port=121],
  total instance = false,
  labeled metric total instance = false,
  level instance = false,
  measurable values = {Count=1, Max=20, Percentile_0.9=20.0, Percentile_0.99=20.0, Percentile_0.5=20.0, Mean=20.0, Min=20}  
  
Metric instance:
  name = 'histogram',
  label values = [service=service_2, server=server_2_1, port=211],
  total instance = false,
  labeled metric total instance = false,
  level instance = false,
  measurable values = {Count=1, Max=30, Percentile_0.9=30.0, Percentile_0.99=30.0, Percentile_0.5=30.0, Mean=30.0, Min=30}  
  
Metric instance:
  name = 'histogram',
  label values = [service=service_2, server=server_2_1, port=212],
  total instance = false,
  labeled metric total instance = false,
  level instance = false,
  measurable values = {Count=1, Max=40, Percentile_0.9=40.0, Percentile_0.99=40.0, Percentile_0.5=40.0, Mean=40.0, Min=40}

The life cycle of such an instance may end earlier than the life cycle of the metric
in the case of an eviction or expiration (see Labeled metrics),
as well as in the case of deregistering a list of label values for a variable (see Var and Caching Var).

To manage metrics, a special entity is used - metric registry.

Metric registry allows you to add and remove metrics, get the current set of metrics, and
subscribe to an event of adding/removing a metric.

A metric registry is represented by the MetricRegistry class.

MetricRegistry implementations provide metric implementations.
Currently, the following implementations are supported:

DefaultMetricRegistry
DropwizardMetricRegistry based on https://metrics.dropwizard.io
In the next versions, we plan to abandon this implementation in favor of DefaultMetricRegistry

Let us move on to practice and show how to implement the requirements from example 1
using Metrics Facade. First, you need to add a number of dependencies:

Base (Core):

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-base</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

MetricRegistry implementation (for example, DefaultMetricRegistry):

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-default-impl</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

Metrics reporter(s) (for example, PrometheusMetricsExporter):

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-prometheus</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

Let's add metrics.
At the moment, we have only one metric - a labeled Timer for requests to
external services with the labels service, server, and port.

A label is represented by the Label class.
Let's define the labels:

Label SERVICE = new Label("service");
Label SERVER = new Label("server");
Label PORT = new Label("port");

and add the metric using MetricRegistry:

MetricRegistry registry = new DefaultMetricRegistry();  
...

Timer httpClientRequestTimer = registry.timer(
    withName("http", "client", "request", "duration"),
    () -> withTimer().labels(SERVICE, SERVER, PORT));

The next step is to update the metric.
To do this, after executing a request to an external service,
it is necessary to record the time of its execution:

// in nanos
long requestDuration = ...

httpClientRequestTimer.update(
    requestDuration,
    forLabelValues(SERVICE.value("authorizationService"), SERVER.value("127.0.0.1"), PORT.value("7001")));

You can also use the following recording scheme:

// start a stopwatch before executing the request
Stopwatch stopwatch = httpClientRequestTimer.stopwatch(forLabelValues(
    SERVICE.value("authorizationService"),
    SERVER.value("127.0.0.1"),
    PORT.value("7001")));
...

// stop the stopwatch after executing the request
stopwatch.stop();

A natural question at this point is: what exactly is measured?
By default, Timer measures the following values:

Counter.COUNT
Rate.MEAN_RATE
Rate.ONE_MINUTE_RATE
Rate.FIVE_MINUTES_RATE
Rate.FIFTEEN_MINUTES_RATE
Rate.RATE_UNIT
Histogram.MIN
Histogram.MAX
Histogram.MEAN
Histogram.PERCENTILE_50
Histogram.PERCENTILE_90
Histogram.PERCENTILE_99
Timer.DURATION_UNIT

However, Metrics Facade allows for extremely flexible configuration of metrics.
In particular, for each metric you can specify a set of values to be calculated and exported.

Suppose, for example, that you have multiple timers in your project, and for most of them
(and the number of timers in an average project can reach several dozen)
you want to measure Counter.COUNT, Histogram.MAX, Histogram.MEAN, Histogram.PERCENTILE_90,
but for the timer for processing incoming HTTP requests you want to also measure
Histogram.MIN, Histogram.PERCENTILE_75, Histogram.PERCENTILE_99.

This can be achieved as follows:

Set the required default values for timers:

registry.preConfigure(allMetrics(), modifying().timer(withTimer().measurables(
    COUNT,
    MAX, 
    MEAN, 
    PERCENTILE_90)));

Set the required set of values when defining the timer for processing
incoming HTTP requests (the defaults will be overridden):

Timer httpRequestTimer = registry.timer(
   withName("http", "request", "duration"),
   () -> withTimer().measurables(
       COUNT,
       MIN,
       MAX,
       MEAN,
       PERCENTILE_75,
       PERCENTILE_90,
       PERCENTILE_99));

or set the required default values for Timer with the appropriate name:

registry.preConfigure(
    // there may also be other predicates (for example, metricsMatchingNameMask("a.**.b")),
    // with which you can effectively configure any subset of metrics.
    metricWithName("http", "request", "duration"), 
    modifying().timer(withTimer().measurables(
        COUNT,
        MIN,
        MAX,
        MEAN,
        PERCENTILE_75,
        PERCENTILE_90,
        PERCENTILE_99)));

This configuration scheme allows you to:

- Set the configuration in one place without having to change the code defining each metric involved.
- Optimize calculations and export.
  For example, the implementation of a metric is not required to perform an expensive operation of
  calculating percentiles, if percentiles are not included in the set of values defined for this metric.
  We plan to provide such implementations in one of the next major versions.
- Add (in future versions) support for new measurable values without breaking backward compatibility.
  Unlike most other libraries for working with metrics, the Metrics Facade interfaces do not provide
  special methods for getting particular measured values (e.g. double Histogram.mean()).
  Instead, they use methods that take a Measurable as a parameter:
  MetricInstance.valueOf(Measurable) or
  MetricInstance.measurableValues().valueOf(Measurable) (a snapshot-based approach).
  This design allows the library to remain extremely flexible in terms of supporting new features.
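The optimization mentioned above can be pictured with a small plain-Java sketch: values are looked up by a key rather than through dedicated getters, and the costly percentile step (sorting) runs only when a percentile was requested. `SelectiveMeasurementSketch` and its `Kind` enum are hypothetical stand-ins for illustration. They are not the library's Measurable types, and the nearest-rank percentile used here is an assumption, not necessarily the method the library uses.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Metrics Facade.
class SelectiveMeasurementSketch {

    enum Kind { COUNT, MAX, MEAN, PERCENTILE_90 }

    // Computes only the requested kinds for a non-empty array of recorded values.
    static Map<Kind, Double> measure(long[] values, Set<Kind> requested) {
        Map<Kind, Double> result = new EnumMap<>(Kind.class);

        if (requested.contains(Kind.COUNT)) {
            result.put(Kind.COUNT, (double) values.length);
        }
        if (requested.contains(Kind.MAX)) {
            result.put(Kind.MAX, (double) Arrays.stream(values).max().getAsLong());
        }
        if (requested.contains(Kind.MEAN)) {
            result.put(Kind.MEAN, Arrays.stream(values).average().getAsDouble());
        }
        if (requested.contains(Kind.PERCENTILE_90)) {
            // sorting is the expensive step, so it is skipped unless a percentile is requested
            long[] sorted = values.clone();
            Arrays.sort(sorted);
            int index = (int) Math.ceil(0.9 * sorted.length) - 1; // nearest-rank method
            result.put(Kind.PERCENTILE_90, (double) sorted[index]);
        }
        return result;
    }
}
```

Because callers ask for values by key, a new kind can be added to the enum without changing the signature of `measure`, which mirrors the backward-compatibility argument above.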

Let's add a counter of active client connections (sessions) to our project:

Counter activeClientConnectionCounter = registry.counter(withName("active", "client", "connections"));

Now we have two metrics.
These metrics live in memory, measuring certain values.
However, metrics are usually used to provide runtime information about an application instance.
In order to do this, it is necessary not only to collect metrics but also to export them to
external monitoring systems that provide various means of visualizing and analyzing the collected metrics.
For example, Prometheus (https://prometheus.io).

To export metrics, metrics reporters are used (Metrics Reporters).

Metrics reporter allows you to present metrics in the format of a specific monitoring system;
some reporters additionally send metrics to an external monitoring system in the appropriate format.

Let's add reporters for Prometheus and JMX:

PrometheusMetricsExporter prometheusExporter = new PrometheusMetricsExporter(registry);
new PrometheusHttpServer(9095, prometheusExporter); // This server is for tests only

registry.addListener(new JmxMetricsReporter());

Let's assume that the following metric updates have been made:

httpClientRequestTimer.update(
    100L, MILLISECONDS,
    forLabelValues(SERVICE.value("authorizationService"), SERVER.value("127.0.0.1"), PORT.value("7001")));

httpClientRequestTimer.update(
    200L, MILLISECONDS,
    forLabelValues(SERVICE.value("authorizationService"), SERVER.value("127.0.0.2"), PORT.value("7002")));

// start a stopwatch before executing the request
Stopwatch stopwatch = httpClientRequestTimer.stopwatch(forLabelValues(
    SERVICE.value("throttlingService"),
    SERVER.value("127.0.0.3"),
    PORT.value("7003")));

sleep(300L);

// stop the stopwatch after executing the request
stopwatch.stop();

activeClientConnectionCounter.inc();
activeClientConnectionCounter.inc();
activeClientConnectionCounter.inc();
activeClientConnectionCounter.dec();

Let's take a look at (slightly modified and formatted for better readability)
the response of the test Prometheus server (http://localhost:9095/metrics):

# TYPE http_client_request_duration summary
http_client_request_duration_count{service="authorizationService",server="127.0.0.1",port="7001",} 1.0
http_client_request_duration{service="authorizationService",server="127.0.0.1",port="7001",quantile="0.9",} 100.0
http_client_request_duration{service="authorizationService",server="127.0.0.1",port="7001",quantile="0.99",} 100.0
http_client_request_duration{service="authorizationService",server="127.0.0.1",port="7001",quantile="0.5",} 100.0
        
http_client_request_duration_count{service="authorizationService",server="127.0.0.2",port="7002",} 1.0
http_client_request_duration{service="authorizationService",server="127.0.0.2",port="7002",quantile="0.9",} 200.0
http_client_request_duration{service="authorizationService",server="127.0.0.2",port="7002",quantile="0.99",} 200.0
http_client_request_duration{service="authorizationService",server="127.0.0.2",port="7002",quantile="0.5",} 200.0
        
http_client_request_duration_count{service="throttlingService",server="127.0.0.3",port="7003",} 1.0
http_client_request_duration{service="throttlingService",server="127.0.0.3",port="7003",quantile="0.9",} 300.692665
http_client_request_duration{service="throttlingService",server="127.0.0.3",port="7003",quantile="0.99",} 300.692665
http_client_request_duration{service="throttlingService",server="127.0.0.3",port="7003",quantile="0.5",} 300.692665
        
# TYPE http_client_request_duration_max gauge
http_client_request_duration_max{service="authorizationService",server="127.0.0.1",port="7001",} 100.0
http_client_request_duration_max{service="authorizationService",server="127.0.0.2",port="7002",} 200.0
http_client_request_duration_max{service="throttlingService",server="127.0.0.3",port="7003",} 300.692665        
        
# TYPE http_client_request_duration_mean gauge
http_client_request_duration_mean{service="authorizationService",server="127.0.0.1",port="7001",} 100.0
http_client_request_duration_mean{service="authorizationService",server="127.0.0.2",port="7002",} 200.0
http_client_request_duration_mean{service="throttlingService",server="127.0.0.3",port="7003",} 300.692665
        
# TYPE active_client_connections gauge
active_client_connections 2.0

Now let's assume that the servers and ports for external services are not statically configured,
and are periodically requested from a special external service named "discoveryService".
Let's also assume that we would not want the duration of requests to this "auxiliary" service
to affect the statistics for other services ("business" services).

This can be achieved through exclusion of label values matching the corresponding predicate:

Timer httpClientRequestTimer = registry.timer(
    withName("http", "client", "request", "duration"),
    () -> withTimer()
        .labels(SERVICE, SERVER, PORT)
        .exclude(labelValuesMatchingAll(SERVICE.mask("discoveryService"))));

The following metric update will be ignored:

httpClientRequestTimer.update(
    100L, MILLISECONDS,
    forLabelValues(SERVICE.value("discoveryService"), SERVER.value("127.0.0.4"), PORT.value("7004")));

Further, since services are discovered dynamically,
the set of their server-port combinations can change over time.
The set of MetricInstances of the timer will, accordingly, grow indefinitely.
Some of the MetricInstances will represent non-existent servers, and because
they have already been exported to external monitoring systems, they are no longer needed,
but at the same time they continue to consume resources (for example, memory).

To solve this problem, Metrics Facade allows you to set for each metric:

- the maximum number of combinations of label values:
  when this threshold is exceeded, the combination that has not been updated for the longest time
  is automatically removed
- the expiration time for a combination of label values:
  if a combination has not been updated during this time, it is automatically removed

For example:

Timer httpClientRequestTimer = registry.timer(
    withName("http", "client", "request", "duration"),
    () -> withTimer()
        .labels(SERVICE, SERVER, PORT)
        .exclude(labelValuesMatchingAll(SERVICE.mask("discoveryService")))
        .maxLabeledInstancesPerSlice(100) // eviction
        .expireLabeledInstanceAfter(30, SECONDS)); // expiration
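
To picture what the two limits do, here is a minimal plain-Java sketch of eviction and expiration over combinations of label values. It is an illustration under simplifying assumptions (single-threaded, explicit timestamps, expiration triggered by an explicit call), and `LabeledInstancesSketch` is a hypothetical name. It is not how Metrics Facade implements these features.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical illustration only; this class is not part of Metrics Facade.
class LabeledInstancesSketch {

    private final int maxInstances;
    private final long expirationMillis;

    // access-ordered map: the least recently updated combination is iterated first
    private final LinkedHashMap<List<String>, Long> lastUpdateMillis =
        new LinkedHashMap<>(16, 0.75f, true);

    LabeledInstancesSketch(int maxInstances, long expirationMillis) {
        this.maxInstances = maxInstances;
        this.expirationMillis = expirationMillis;
    }

    // Records an update for a combination of label values at the given time.
    void update(List<String> labelValues, long nowMillis) {
        lastUpdateMillis.put(labelValues, nowMillis);

        // eviction: drop the combination that has not been updated for the longest time
        while (lastUpdateMillis.size() > maxInstances) {
            Iterator<List<String>> it = lastUpdateMillis.keySet().iterator();
            it.next();
            it.remove();
        }
    }

    // expiration: drop every combination not updated within expirationMillis
    void expire(long nowMillis) {
        lastUpdateMillis.values().removeIf(t -> nowMillis - t >= expirationMillis);
    }

    boolean contains(List<String> labelValues) {
        return lastUpdateMillis.containsKey(labelValues);
    }

    int size() {
        return lastUpdateMillis.size();
    }
}
```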

Further, it would often be helpful to have:

1) a total MetricInstance for each service
   (taking all updates for this service into account)
2) a total MetricInstance for each service-server combination
3) a total MetricInstance for each server.
   Let's also assume that for such an instance, it is sufficient to measure only
   MAX, MEAN, and PERCENTILE_99,
   and it is necessary to consider only updates for services whose name either starts with "auth" or
   contains the "throttling" substring, but at the same time exclude updates for port 7004.

Metrics Facade offers the ability to automatically add this kind of
MetricInstances through the functionality of slices and levels.

Slice is a child metric that takes into account only those updates of the parent metric
which satisfy the given LabelValuesPredicate.
A slice can have its own configuration: name suffix, labels
(MUST be a sublist of the parent metric's labels), measurable values, etc.

By default, a metric has only one slice - AllSlice taking all updates into account.

Slice level is a set of MetricInstances of the slice
for the first k labels, k = 1..<label_count> - 1.

For example, if for a slice:

three labels are defined: SERVICE, SERVER, PORT
levels are enabled

then when updating this slice for the values
SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111"),
the following MetricInstances will be created/updated:

SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")
SERVICE.value("service_1"), SERVER.value("server_1_1") level instance
SERVICE.value("service_1") level instance
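
The list above follows a simple rule: one update touches the full combination plus every shorter prefix of it. A plain-Java sketch of that rule is shown below. `LevelsSketch` is a hypothetical helper for illustration only and works on plain strings rather than the library's LabelValue type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this class is not part of Metrics Facade.
class LevelsSketch {

    // Returns the label-value lists touched by one update when levels are enabled:
    // the full combination first, then the prefixes of length n-1 down to 1.
    static List<List<String>> touchedInstances(List<String> labelValues) {
        List<List<String>> result = new ArrayList<>();
        for (int k = labelValues.size(); k >= 1; k--) {
            result.add(List.copyOf(labelValues.subList(0, k)));
        }
        return result;
    }

    public static void main(String[] args) {
        touchedInstances(List.of("service_1", "server_1_1", "111")).forEach(System.out::println);
    }
}
```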

By default, levels are enabled for AllSlice and disabled for other slices.
Here is how you can implement the above requirements using the functionality of slices and levels:

Timer httpClientRequestTimer = registry.timer(
    withName("http", "client", "request", "duration"),
    () -> withTimer()
        .labels(SERVICE, SERVER, PORT)
        ...
        .allSlice()
            .enableLevels() // enabled by default for AllSlice; implements 1) and 2)
        .slice("by", "server") // implements 3) 
            .predicate(labelValuesMatchingAll(
                SERVICE.mask("auth*|*throttling*"),
                PORT.predicate(p -> !p.equals("7004"))))
            .labels(SERVER)
            .measurables(MAX, MEAN, PERCENTILE_99));
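
The mask `"auth*|*throttling*"` used above reads as "starts with auth, or contains throttling". The plain-Java sketch below shows one way such a mask could be evaluated, assuming that `*` matches any run of characters and `|` separates alternatives. That reading is inferred from the examples in this document; the library's actual mask syntax and implementation may differ, and `MaskSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only; this class is not part of Metrics Facade.
class MaskSketch {

    // Builds a predicate from a mask, assuming '*' is a wildcard and '|' separates alternatives.
    static Predicate<String> mask(String mask) {
        String regex = Arrays.stream(mask.split("\\|"))
            .map(MaskSketch::alternativeToRegex)
            .collect(Collectors.joining("|"));

        Pattern pattern = Pattern.compile(regex);
        return value -> pattern.matcher(value).matches();
    }

    private static String alternativeToRegex(String alternative) {
        // limit -1 keeps trailing empty strings, so "auth*" is split into ["auth", ""]
        return Arrays.stream(alternative.split("\\*", -1))
            .map(Pattern::quote) // literal parts are matched verbatim
            .collect(Collectors.joining(".*", "(?:", ")"));
    }
}
```

A predicate built this way can be combined with others through `Predicate.and` and `Predicate.negate`, which corresponds to the "matching all" combination of a service mask and a port check in the example above.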

You can find the complete sample GettingStartedSample.java in the following Maven module:

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-samples</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

Features

Flexible Configuration

You can set up defaults/overrides for any subset of metrics
(by using a predicate) and almost any metric parameters.
In particular, you can specify a set of values to be calculated and exported.
For example, the implementation of a metric is not required to perform an expensive operation of
calculating percentiles, if percentiles are not included in the set of values defined for this metric.

Example of setting up defaults (DefaultsSample.java):

registry.preConfigure(allMetrics(), modifying()
    .metric(withMetric().prefix(labelValues(SAMPLE.value("defaults"))))
    .meter(withMeter()
        .expireLabeledInstanceAfter(30, MINUTES)
        .allSlice().noLevels())
    .rate(withRate().measurables(COUNT, ONE_MINUTE_RATE))
    .histogram(withHistogram().measurables(COUNT, MAX, MEAN, PERCENTILE_95))
    .timer(withTimer().measurables(COUNT, ONE_MINUTE_RATE, MAX, MEAN, PERCENTILE_95)));

Example of setting up overrides (OverridesSample.java):

registry.postConfigure(allMetrics(), modifying()
    .metric(withMetric().prefix(labelValues(SAMPLE.value("overrides"))))
    .meter(withMeter()
        .expireLabeledInstanceAfter(30, MINUTES)
        .allSlice().noLevels())
    .rate(withRate().measurables(COUNT, ONE_MINUTE_RATE))
    .histogram(withHistogram().measurables(COUNT, MAX, MEAN, PERCENTILE_95))
    .timer(withTimer().measurables(COUNT, ONE_MINUTE_RATE, MAX, MEAN, PERCENTILE_95)));

...for any subset of metrics and almost any parameters:

registry.postConfigure(metricWithName("a.b.c"), modifying().metric(withMetric().disable()));
registry.postConfigure(metricsWithNamePrefix("a.b"), modifying().histogram(withHistogram().enable()));
registry.postConfigure(metricsMatchingNameMask("a.**.b"), modifying().histogram(withHistogram().enable()));

registry.postConfigure(
    metrics()
        .including(metricsMatchingNameMask("a.b.**.d.**")).excluding(metricWithName("a.b.c.d"))
        .including(metricsWithNamePrefix("d.e.f")).excluding(metricsWithNamePrefix("d.e.f.g")),
    modifying().meter(withMeter().disable()));

assert registry.timer(withName("a", "b", "c", "d")).isEnabled();
assert !registry.timer(withName("a", "b", "c", "d", "e")).isEnabled(); // disabled
assert !registry.timer(withName("a", "b", "x", "d")).isEnabled(); // disabled
assert !registry.timer(withName("d", "e", "f")).isEnabled(); // disabled
assert registry.timer(withName("d", "e", "f", "g")).isEnabled();
assert registry.timer(withName("d", "e", "f", "g", "h")).isEnabled();

See Getting Started, DefaultsSample.java,
and OverridesSample.java for more details and usage examples.

Labeled metrics

In particular, Metrics Facade supports eviction and expiration for labeled metrics,
that is, for each labeled metric, it allows you to specify:

- the maximum number of combinations of label values:
  when this threshold is exceeded, the combination that has not been updated for the longest time
  is automatically removed, so it no longer wastes system resources
- the expiration time for a combination of label values:
  if a combination has not been updated during this time, it is automatically removed

LabeledMetricsEvictionAndExpirationSample.java

registry.histogram(
    withName("ActiveHealthChecker", "healthCheck", "attemptCount", "histogram"),
    () -> withHistogram()
        .labels(SERVICE, SERVER, PORT)
        .maxLabeledInstancesPerSlice(5) // eviction
        .expireLabeledInstanceAfter(1, MINUTES)); // expiration

See Getting Started and LabeledMetricsEvictionAndExpirationSample.java
for more details and usage examples.

Prefix Label Values

registry.postConfigure(
    metricsMatchingNameMask("ActiveHealthChecker.**"),
    modifying().metric(withMetric().prefix(labelValues(SAMPLE.value("prefixLabelValues")))));
    
Histogram h = registry.histogram(
    withName("ActiveHealthChecker", "healthCheck", "attemptCount", "histogram"),
    () -> withHistogram().labels(SERVICE, SERVER, PORT));

h.update(25, forLabelValues(
    SERVICE.value("service_1"), 
    SERVER.value("server_1_1"), 
    PORT.value("7001")));

is the same as

Histogram h = registry.histogram(
    withName("ActiveHealthChecker", "healthCheck", "attemptCount", "histogram"),
    () -> withHistogram().labels(SAMPLE, SERVICE, SERVER, PORT));

h.update(25, forLabelValues(
    SAMPLE.value("prefixLabelValues"),
    SERVICE.value("service_1"), 
    SERVER.value("server_1_1"), 
    PORT.value("7001")));

See PrefixLabelValuesSample.java for more details and usage examples.

Exclusions

You can drop the metric updates for label values matching a predicate:

ExclusionsSample.java

registry.timer(
    withName("ActiveHealthChecker", "healthCheck"),
    () -> withTimer()
        .labels(SERVICE, SERVER, PORT)
        .exclude(labelValuesMatchingAny(
            SERVER.mask("server_1_*|*2_1*"),
            PORT.predicate(p -> p.equals("9001")))));

See Getting Started and ExclusionsSample.java for more details and usage examples.
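
The sketch below illustrates how such an exclusion predicate could be evaluated. It assumes that in a mask `*` matches any run of characters and `|` separates alternatives; that is how the sample reads, but the exact mask syntax is not spelled out in this section. LabelValueMask and matchesAny are hypothetical names, not the library's API.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Illustrative only: assumes '*' = any characters and '|' = alternatives. */
final class LabelValueMask {
    private final Pattern pattern;

    LabelValueMask(String mask) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();

        for (char c : mask.toCharArray()) {
            if (c == '*' || c == '|') {
                appendQuoted(regex, literal);
                regex.append(c == '*' ? ".*" : "|");
            } else {
                literal.append(c);
            }
        }

        appendQuoted(regex, literal);
        this.pattern = Pattern.compile(regex.toString());
    }

    // Appends the pending literal text as a quoted regex fragment and clears it.
    private static void appendQuoted(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    /** True if the whole value matches at least one alternative of the mask. */
    boolean matches(String value) {
        return pattern.matcher(value).matches();
    }

    /** True if any label's value satisfies the predicate registered for that label. */
    static boolean matchesAny(
        Map<String, String> labelValues,
        Map<String, Predicate<String>> predicatesByLabel) {

        return predicatesByLabel.entrySet().stream().anyMatch(e -> {
            String value = labelValues.get(e.getKey());
            return value != null && e.getValue().test(value);
        });
    }
}
```

Under these assumptions, an update for server_1_1 or for port 9001 would be dropped, while an update for server_3_1 on port 7001 would be recorded.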

Slices and Levels

By default, a metric has only one slice, AllSlice, which takes all updates into account.

A slice level is the set of the slice's MetricInstances
for the first k labels, where k = 1..<label_count> - 1.

For example, if for a slice:

three labels are defined: SERVICE, SERVER, PORT
levels are enabled

then when updating this slice for the values
SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111"),
the following MetricInstances will be created/updated:

SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")
SERVICE.value("service_1"), SERVER.value("server_1_1") level instance
SERVICE.value("service_1") level instance

By default, levels are enabled for AllSlice and disabled for other slices.
See Getting Started and SlicesAndLevelsSample.java for more details and usage examples.
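
The level rule can be sketched as a small function that lists which instances a single update touches. This is an illustration of the rule only; SliceLevels and instanceKeys are hypothetical names and say nothing about how the library stores instances.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: computes the label value combinations touched by one update. */
final class SliceLevels {
    /**
     * Returns the full combination first, followed (if levels are enabled)
     * by the level combinations for the first k values, k = n - 1 down to 1.
     */
    static List<List<String>> instanceKeys(List<String> labelValues, boolean levelsEnabled) {
        List<List<String>> keys = new ArrayList<>();
        keys.add(List.copyOf(labelValues));

        if (levelsEnabled) {
            for (int k = labelValues.size() - 1; k >= 1; k--) {
                keys.add(List.copyOf(labelValues.subList(0, k)));
            }
        }

        return keys;
    }
}
```

For the values service_1, server_1_1, 111 with levels enabled, this yields the three combinations listed in the example; with levels disabled, only the full combination remains.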

Metrics

Counter

c.r.p.metrics.counter.Counter is a java.lang.Long based counter that can be increased or decreased:

CounterSample.java

// Default config:
//   no labels
//   measurables: { COUNT }
Counter defaultConfigCounter = registry.counter(withName("counter", "defaultConfig"));

defaultConfigCounter.inc();
defaultConfigCounter.inc(2);
defaultConfigCounter.dec();

Supported measurables:

Counter.COUNT

Counter Config

CounterSample.java

Counter fullConfigCounter = registry.counter(
    withName("counter", "fullConfig"),
    () -> withCounter()
        // options: disable(), enabled(boolean)
        // default: enabled
        .enable()

        // default: no prefix label values
        .prefix(labelValues(SAMPLE.value("counter")))

        // default: no labels
        .labels(SERVICE, SERVER, PORT)

        // options: noExclusions()
        // default: no exclusions
        .exclude(labelValuesMatchingAny(
            SERVICE.mask("serv*2|serv*4*"),
            SERVER.mask("server_5")))

        // default: unlimited
        .maxLabeledInstancesPerSlice(5)

        // options: notExpireLabeledInstances()
        // default: no expiration
        .expireLabeledInstanceAfter(25, SECONDS)

        // options: noMeasurables()
        // default: { COUNT }
        .measurables(COUNT)

        // the properties specific to the metrics implementation
        // default: no properties
        .put("key_1", "value_1_1")

        .allSlice()
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()

            // default: the metric's labels [ SERVICE, SERVER, PORT ]
            .labels(SERVICE, SERVER)

            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(10)

            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)

            // options: noMeasurables() 
            // default: the metric's measurables { COUNT }
            .measurables(COUNT)

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()

            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: enabled
            .enableLevels()

            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")

            .total(counterInstance()
                // default: empty name suffix
                .name("total")

                // options: noMeasurables()
                // default: the slice's measurables { COUNT }
                .measurables(COUNT)

                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
        .slice("byService")
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()

            // default: no predicate
            .predicate(labelValuesMatchingAll(
                SERVICE.mask("serv*_1*"),
                SERVER.predicate(s -> s.equals("server_1_1"))))

            // default: no labels
            .labels(SERVICE)

            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(2)

            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)

            // options: noMeasurables()
            // default: the metric's measurables { COUNT }
            .measurables(COUNT)

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()

            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: disabled
            .enableLevels()

            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")

            .total(counterInstance()
                // default: empty name suffix
                .name("total")

                // options: noMeasurables()
                // default: the slice's measurables { COUNT }
                .measurables(COUNT)

                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
);

Rate

c.r.p.metrics.rate.Rate measures the number of times something happens (Counter.COUNT)
and the speed at which it happens (for example, the number of requests per Rate.RATE_UNIT).

Supported measurables:

Counter.COUNT
Rate.MEAN_RATE
Rate.ONE_MINUTE_RATE
Rate.FIVE_MINUTES_RATE
Rate.FIFTEEN_MINUTES_RATE
Rate.RATE_UNIT

RateSample.java

// Default config:
//   no labels
//   measurables: {
//     COUNT,
//     MEAN_RATE,
//     ONE_MINUTE_RATE,
//     FIVE_MINUTES_RATE,
//     FIFTEEN_MINUTES_RATE,
//     RATE_UNIT
//   }
Rate defaultConfigRate = registry.rate(withName("rate", "defaultConfig"));

defaultConfigRate.mark();
defaultConfigRate.mark(2);
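
For intuition about measurables such as ONE_MINUTE_RATE, the following sketch shows a common way to compute an exponentially weighted moving average of an event rate. It assumes a fixed 5-second tick and the usual smoothing factor 1 - exp(-tick / window). Whether the library's expMovingAverage() implementation uses exactly these parameters is an assumption, and EwmaRateSketch is a hypothetical class.

```java
/** Illustrative only: a single-threaded exponentially weighted moving average rate. */
final class EwmaRateSketch {
    private static final double TICK_SECONDS = 5.0;

    private final double alpha;   // smoothing factor for one tick
    private long count;           // total number of marked events
    private long uncounted;       // events marked since the last tick
    private double ratePerSecond;
    private boolean initialized;

    EwmaRateSketch(double windowMinutes) {
        this.alpha = 1.0 - Math.exp(-TICK_SECONDS / 60.0 / windowMinutes);
    }

    void mark(long n) {
        count += n;
        uncounted += n;
    }

    /** Expected to be called once every TICK_SECONDS, for example by a scheduler. */
    void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;

        if (initialized) {
            // Move the average a fraction alpha of the way towards the latest rate.
            ratePerSecond += alpha * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate;
            initialized = true;
        }
    }

    long count() {
        return count;
    }

    double ratePerSecond() {
        return ratePerSecond;
    }
}
```

The first tick sets the rate to the observed rate; every later tick without events lets it decay, which is why a moving-average rate falls gradually after traffic stops while COUNT stays unchanged.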

Rate Config

RateSample.java

Rate fullConfigRate = registry.rate(
    withName("rate", "fullConfig"),
    () -> withRate()
        // options: disable(), enabled(boolean)
        // default: enabled
        .enable()

        // default: no prefix label values
        .prefix(labelValues(SAMPLE.value("rate")))

        // default: no labels
        .labels(SERVICE, SERVER, PORT)

        // options: noExclusions()
        // default: no exclusions
        .exclude(labelValuesMatchingAny(
            SERVICE.mask("serv*2|serv*4*"),
            SERVER.mask("server_5")))

        // default: unlimited
        .maxLabeledInstancesPerSlice(5)

        // options: notExpireLabeledInstances()
        // default: no expiration
        .expireLabeledInstanceAfter(25, SECONDS)

        // options: noMeasurables()
        // default: {
        //   COUNT,
        //   MEAN_RATE,
        //   ONE_MINUTE_RATE,
        //   FIVE_MINUTES_RATE,
        //   FIFTEEN_MINUTES_RATE,
        //   RATE_UNIT
        // }
        .measurables(COUNT)
        
        /**
         * options:
         *   - expMovingAverage() == {@link ExpMovingAverageRateImplConfigBuilder#expMovingAverage()},
         *   - custom impl, e.g. countScaling() == {@link CountScalingRateConfigBuilder#countScaling()}.
         *     Custom impls must be registered: registry.extendWith(new CountScalingRateConfigBuilder()).
         * default: expMovingAverage()
         */
        .impl(expMovingAverage())
        // .impl(countScaling().factor(2)) // custom impl

        // the properties specific to the metrics implementation
        // default: no properties
        .put("key_1", "value_1_1")

        .allSlice()
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()

            // default: the metric's labels [ SERVICE, SERVER, PORT ]
            .labels(SERVICE, SERVER)

            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(10)

            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)

            // options: noMeasurables()
            // default: the metric's measurables { COUNT }
            .measurables(COUNT, MEAN_RATE)

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()

            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: enabled
            .enableLevels()

            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")

            .total(rateInstance()
                // default: empty name suffix
                .name("total")

                // options: noMeasurables()
                // default: the slice's measurables { COUNT, MEAN_RATE }
                .measurables(
                    COUNT,
                    MEAN_RATE, ONE_MINUTE_RATE, FIVE_MINUTES_RATE,
                    RATE_UNIT)

                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
        .slice("byService")
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()

            // default: no predicate
            .predicate(labelValuesMatchingAll(
                SERVICE.mask("serv*_1*"),
                SERVER.predicate(s -> s.equals("server_1_1"))))

            // default: no labels
            .labels(SERVICE)

            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(2)

            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)

            // options: noMeasurables()
            // default: the metric's measurables { COUNT }
            .measurables(COUNT, ONE_MINUTE_RATE)

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()

            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: disabled
            .enableLevels()

            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")

            .total(rateInstance()
                // default: empty name suffix
                .name("total")

                // options: noMeasurables()
                // default: the slice's measurables { COUNT, ONE_MINUTE_RATE }
                .measurables(
                    COUNT,
                    MEAN_RATE, ONE_MINUTE_RATE, FIVE_MINUTES_RATE,
                    RATE_UNIT)

                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
);

Histogram

c.r.p.metrics.histogram.Histogram measures the statistical distribution of java.lang.Long values.

HistogramSample.java

// Default config:
//   no labels
//   measurables: {
//     COUNT,
//     MIN,
//     MAX,
//     MEAN,
//     PERCENTILE_50,
//     PERCENTILE_90,
//     PERCENTILE_99
//   }
Histogram defaultConfigHistogram = registry.histogram(withName("histogram", "defaultConfig"));

defaultConfigHistogram.update(1L);
defaultConfigHistogram.update(2L);

Supported measurables:

Counter.COUNT
Histogram.MIN
Histogram.MAX
Histogram.MEAN
Histogram.STANDARD_DEVIATION
Histogram.Percentile (including the predefined Histogram.PERCENTILE_5, Histogram.PERCENTILE_10, ...)
Histogram.Bucket (including the predefined Histogram.MS_10_BUCKET, Histogram.SEC_1_BUCKET, ...)
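
As a rough illustration of what these measurables mean, the sketch below computes them exactly from a stored array of values. It uses the nearest-rank definition of a percentile and treats a bucket as the number of values less than or equal to its upper bound. Both choices are assumptions made for the example, and the library's hdr() and scale() implementations work differently. HistogramSnapshotSketch is a hypothetical class.

```java
import java.util.Arrays;

/** Illustrative only: exact measurables over a small, fully stored data set. */
final class HistogramSnapshotSketch {
    private final long[] sorted;

    HistogramSnapshotSketch(long... values) {
        this.sorted = values.clone();
        Arrays.sort(this.sorted);
    }

    long count() {
        return sorted.length;
    }

    long min() {
        return sorted.length == 0 ? 0 : sorted[0];
    }

    long max() {
        return sorted.length == 0 ? 0 : sorted[sorted.length - 1];
    }

    double mean() {
        return Arrays.stream(sorted).average().orElse(0.0);
    }

    /** Nearest-rank percentile: the value at rank ceil(percent / 100 * count). */
    long percentile(int percent) {
        if (sorted.length == 0) {
            return 0;
        }

        int rank = (percent * sorted.length + 99) / 100; // integer ceiling
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Cumulative bucket: the number of values <= upperBound (an assumed convention). */
    long bucket(long upperBound) {
        long n = 0;

        for (long value : sorted) {
            if (value > upperBound) {
                break;
            }
            n++;
        }

        return n;
    }
}
```

The sketch also shows why selecting fewer measurables can save work: COUNT, MIN, and MAX need no sorted data at all, whereas percentiles require the distribution to be tracked.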

Histogram Config

HistogramSample.java

Histogram fullConfigHistogram = registry.histogram(
    withName("histogram", "fullConfig"),
    () -> withHistogram()
        // options: disable(), enabled(boolean)
        // default: enabled
        .enable()

        // default: no prefix label values
        .prefix(labelValues(SAMPLE.value("histogram")))

        // default: no labels
        .labels(SERVICE, SERVER, PORT)

        // options: noExclusions()
        // default: no exclusions
        .exclude(labelValuesMatchingAny(
            SERVICE.mask("serv*2|serv*4*"),
            SERVER.mask("server_5")))

        // default: unlimited
        .maxLabeledInstancesPerSlice(5)

        // options: notExpireLabeledInstances()
        // default: no expiration
        .expireLabeledInstanceAfter(25, SECONDS)

        // options: noMeasurables()
        // default: {
        //   COUNT,
        //   MIN,
        //   MAX,
        //   MEAN,
        //   PERCENTILE_50,
        //   PERCENTILE_90,
        //   PERCENTILE_99
        // }
        .measurables(COUNT, MEAN)

        // the properties specific to the metrics implementation
        // default: no properties
        .put("key_1", "value_1_1")

        /**
         * options:
         *   - hdr() == {@link HdrHistogramImplConfigBuilder#hdr()},
         *   - scale() == {@link com.ringcentral.platform.metrics.defaultImpl.histogram.scale.configs.ScaleHistogramImplConfigBuilder#scale()},
         *   - custom impl, e.g. countAndTotalSumScaling() == {@link CountAndTotalSumScalingHistogramConfigBuilder#countAndTotalSumScaling()}.
         *     Custom impls must be registered: registry.extendWith(new CountAndTotalSumScalingHistogramConfigBuilder()).
         * default: hdr()
         */
        .impl(hdr()
            .resetByChunks(6, Duration.ofMinutes(2))
            .highestTrackableValue(1000, REDUCE_TO_HIGHEST_TRACKABLE)
            .significantDigits(3)
            .snapshotTtl(30, SECONDS))
        // .impl(countAndTotalSumScaling().factor(2)) // custom impl

        .allSlice()
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()

            // default: the metric's labels [ SERVICE, SERVER, PORT ]
            .labels(SERVICE, SERVER)

            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(10)

            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)

            // options: noMeasurables()
            // default: the metric's measurables { COUNT, MEAN }
            .measurables(COUNT, MEAN, MAX)

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()

            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: enabled
            .enableLevels()

            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")

            .total(histogramInstance()
                // default: empty name suffix
                .name("total")

                // options: noMeasurables()
                // default: the slice's measurables { COUNT, MEAN, MAX }
                .measurables(COUNT, MEAN, PERCENTILE_95, MAX, Bucket.of(1), Bucket.of(2))

                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
        .slice("byService")
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()

            // default: no predicate
            .predicate(labelValuesMatchingAll(
                SERVICE.mask("serv*_1*"),
                SERVER.predicate(s -> s.equals("server_1_1"))))

            // default: no labels
            .labels(SERVICE)

            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(2)

            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)

            // options: noMeasurables()
            // default: the metric's measurables { COUNT, MEAN }
            .measurables(
                COUNT,
                TOTAL_SUM,
                MEAN,
                PERCENTILE_50,
                PERCENTILE_95,
                MAX,
                Buckets.of(points(0, 1, 24, 25, 30, 49, 50, 55)))

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()

            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: disabled
            .enableLevels()

            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")

            .total(histogramInstance()
                // default: empty name suffix
                .name("total")

                // options: noMeasurables()
                // default: the slice's measurables
                //   { COUNT, TOTAL_SUM, MEAN, PERCENTILE_50, PERCENTILE_95, MAX, and the slice's buckets }
                .measurables(COUNT, MIN, MEAN, MAX)

                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
);

Timer

c.r.p.metrics.timer.Timer measures the number of times something happens,
the speed at which it happens (rate), and the statistical distribution of its duration.
In fact, it combines Rate and Histogram.

Supported measurables:

Counter.COUNT
Rate.MEAN_RATE
Rate.ONE_MINUTE_RATE
Rate.FIVE_MINUTES_RATE
Rate.FIFTEEN_MINUTES_RATE
Rate.RATE_UNIT
Histogram.MIN
Histogram.MAX
Histogram.MEAN
Histogram.STANDARD_DEVIATION
Histogram.Percentile (including the predefined Histogram.PERCENTILE_5, Histogram.PERCENTILE_10, ...)
Histogram.Bucket (including the predefined Histogram.MS_10_BUCKET, Histogram.SEC_1_BUCKET, ...)
Timer.DURATION_UNIT
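
The idea that a timer combines a rate with a histogram of durations can be sketched as follows. TimerSketch and all of its methods are hypothetical and only show which numbers a timer has to track; they are not the library's Timer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Illustrative only: records durations and derives count, max, mean, and mean rate. */
final class TimerSketch {
    private final List<Long> durationsNanos = new ArrayList<>();

    /** Records one occurrence together with its duration. */
    void update(long duration, TimeUnit unit) {
        durationsNanos.add(unit.toNanos(duration));
    }

    /** Runs the action and records how long it took, even if it throws. */
    void time(Runnable action) {
        long start = System.nanoTime();

        try {
            action.run();
        } finally {
            update(System.nanoTime() - start, TimeUnit.NANOSECONDS);
        }
    }

    long count() {
        return durationsNanos.size();
    }

    long maxNanos() {
        return durationsNanos.stream().mapToLong(Long::longValue).max().orElse(0);
    }

    double meanNanos() {
        return durationsNanos.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    /** Rate part: occurrences per second over the given observation period. */
    double meanRatePerSecond(long elapsedNanos) {
        return count() * 1e9 / elapsedNanos;
    }
}
```

Each update feeds both parts at once: the occurrence itself contributes to the count and rate measurables, and its duration contributes to the histogram measurables.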

Timer Config

TimerSample.java

Timer fullConfigTimer = registry.timer(
    withName("timer", "fullConfig"),
    () -> withTimer()
        // options: disable(), enabled(boolean)
        // default: enabled
        .enable()

        // default: no prefix label values
        .prefix(labelValues(SAMPLE.value("timer")))
    
        // default: no labels
        .labels(SERVICE, SERVER, PORT)
    
        // options: noExclusions()
        // default: no exclusions
        .exclude(labelValuesMatchingAny(
            SERVICE.mask("serv*2|serv*4*"),
            SERVER.mask("server_5")))
    
        // default: unlimited
        .maxLabeledInstancesPerSlice(5)
    
        // options: notExpireLabeledInstances()
        // default: no expiration
        .expireLabeledInstanceAfter(25, SECONDS)
    
        // options: noMeasurables()
        // default: {
        //   Counter.COUNT,
        //
        //   Rate.MEAN_RATE,
        //   Rate.ONE_MINUTE_RATE,
        //   Rate.FIVE_MINUTES_RATE,
        //   Rate.FIFTEEN_MINUTES_RATE,
        //   Rate.RATE_UNIT,
        //
        //   Histogram.MIN,
        //   Histogram.MAX,
        //   Histogram.MEAN,
        //   Histogram.PERCENTILE_50,
        //   Histogram.PERCENTILE_90,
        //   Histogram.PERCENTILE_99,
        //
        //   Timer.DURATION_UNIT
        // }
        .measurables(COUNT, MEAN_RATE, MAX, MEAN)
    
        // the properties specific to the metrics implementation
        // default: no properties
        .put("key_1", "value_1_1")

        /**
         * options:
         *   - expMovingAverage() == {@link ExpMovingAverageRateImplConfigBuilder#expMovingAverage()},
         *   - custom impl, e.g. countScaling() == {@link CountScalingRateConfigBuilder#countScaling()}.
         *     Custom impls must be registered: registry.extendWith(new CountScalingRateConfigBuilder()).
         * default: expMovingAverage()
         */
        .impl(expMovingAverage())
        // .impl(countScaling().factor(2)) // custom impl

        /**
         * options:
         *   - hdr() == {@link HdrHistogramImplConfigBuilder#hdr()},
         *   - scale() == {@link com.ringcentral.platform.metrics.defaultImpl.histogram.scale.configs.ScaleHistogramImplConfigBuilder#scale()},
         *   - custom impl, e.g. countAndTotalSumScaling() == {@link CountAndTotalSumScalingHistogramConfigBuilder#countAndTotalSumScaling()}.
         *     Custom impls must be registered: registry.extendWith(new CountAndTotalSumScalingHistogramConfigBuilder()).
         * default: hdr()
         */
        .impl(hdr()
            .resetByChunks(6, Duration.ofMinutes(2))
            .lowestDiscernibleValue(MILLISECONDS.toNanos(1))
            .highestTrackableValue(DAYS.toNanos(7), REDUCE_TO_HIGHEST_TRACKABLE)
            .significantDigits(2)
            .snapshotTtl(30, SECONDS))
        // .impl(countAndTotalSumScaling().factor(2)) // custom impl

        .allSlice()
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()
    
            // default: the metric's labels [ SERVICE, SERVER, PORT ]
            .labels(SERVICE, SERVER)
    
            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(10)
    
            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)
    
            // options: noMeasurables()
            // default: the metric's measurables { COUNT, MEAN_RATE, MAX, MEAN }
            .measurables(COUNT, MEAN_RATE, MAX, MEAN, PERCENTILE_50)
    
            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()
    
            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: enabled
            .enableLevels()
    
            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")
    
            .total(timerInstance()
                // default: empty name suffix
                .name("total")
    
                // options: noMeasurables()
                // default: the slice's measurables { COUNT, MEAN_RATE, MAX, MEAN, PERCENTILE_50 }
                .measurables(COUNT, MEAN_RATE, MAX, MEAN, PERCENTILE_50, PERCENTILE_90)
    
                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
        .slice("byService")
            // options: disable(), enabled(boolean)
            // default: enabled
            .enable()
    
            // default: no predicate
            .predicate(labelValuesMatchingAll(
                SERVICE.mask("serv*_1*"),
                SERVER.predicate(s -> s.equals("server_1_1"))))
    
            // default: no labels
            .labels(SERVICE)
    
            // options: noMaxLabeledInstances()
            // default: the metric's maxLabeledInstancesPerSlice = 5
            .maxLabeledInstances(2)
    
            // options: notExpireLabeledInstances()
            // default: the metric's expireLabeledInstanceAfter = 25 SECONDS
            .expireLabeledInstanceAfter(42, SECONDS)
    
            // options: noMeasurables()
            // default: the metric's measurables { COUNT, MEAN_RATE, MAX, MEAN }
            .measurables(
                COUNT,
                MEAN_RATE,
                MAX,
                MEAN,
                PERCENTILE_75,
                MS_10_BUCKET,
                MS_30_BUCKET,
                MS_50_BUCKET,
                MS_75_BUCKET,
                MS_100_BUCKET,
                MS_250_BUCKET)

            // options: disableTotal(), noTotal(), totalEnabled(boolean)
            // default: enabled
            .enableTotal()
    
            // options: disableLevels(), noLevels(), levelsEnabled(boolean)
            // default: disabled
            .enableLevels()
    
            // the properties specific to the metrics implementation
            // default: no properties (no overrides)
            .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_1"
            .put("key_2", "value_2_1")
    
            .total(timerInstance()
                // default: empty name suffix
                .name("total")
    
                // options: noMeasurables()
                // default: the slice's measurables
                //   { COUNT, MEAN_RATE, MAX, MEAN, PERCENTILE_75, MS_10_BUCKET, MS_30_BUCKET,
                //     MS_50_BUCKET, MS_75_BUCKET, MS_100_BUCKET, MS_250_BUCKET }
                .measurables(COUNT, MEAN_RATE, MIN, MAX, MEAN, PERCENTILE_75, PERCENTILE_90)
    
                // the properties specific to the metrics implementation
                // default: no properties (no overrides)
                .put("key_1", "value_1_2") // overrides "key_1" -> "value_1_2"
                .put("key_2", "value_2_2")) // overrides "key_2" -> "value_2_1"
);

Var and CachingVar

c.r.p.metrics.var.Var is an arbitrary (possibly cached) value of a specific type
([Caching]ObjectVar, [Caching]LongVar, [Caching]DoubleVar, [Caching]StringVar are supported)
that can change over time.

VarSample.java

AtomicLong valueSupplier_1 = new AtomicLong();

// Supported var types: ObjectVar, LongVar, DoubleVar, StringVar
LongVar defaultConfigVar = registry.longVar(
    withName("var", "defaultConfig"),
    () -> valueSupplier_1.incrementAndGet());

AtomicLong valueSupplier_2 = new AtomicLong();

LongVar fullConfigVar = registry.longVar(
    withName("var", "fullConfig"),

    // options: Var.noTotal()
    () -> valueSupplier_2.incrementAndGet(),

    () -> withLongVar()
        // options: disable(), enabled(boolean)
        // default: enabled
        .enable()

        // default: no prefix label values
        .prefix(labelValues(SAMPLE.value("var")))

        .labels(SERVICE, SERVER, PORT)

        // the properties specific to the metrics implementation
        // default: no properties
        .put("key", "value"));

AtomicLong valueSupplier_3 = new AtomicLong();

fullConfigVar.register(
    () -> valueSupplier_3.incrementAndGet(),
    forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));

AtomicLong valueSupplier_4 = new AtomicLong();

fullConfigVar.register(
    () -> valueSupplier_4.incrementAndGet(),
    forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

fullConfigVar.deregister(labelValues(
    SERVICE.value("service_1"),
    SERVER.value("server_1_1"),
    PORT.value("111")));

AtomicLong valueSupplier_5 = new AtomicLong();

// Supported caching var types:
//   CachingObjectVar,
//   CachingLongVar,
//   CachingDoubleVar,
//   CachingStringVar
CachingDoubleVar defaultConfigCachingVar = registry.cachingDoubleVar(
    withName("cachingVar", "defaultConfig"),
    () -> valueSupplier_5.incrementAndGet() + 0.5);

AtomicLong valueSupplier_6 = new AtomicLong();

CachingDoubleVar fullConfigCachingVar = registry.cachingDoubleVar(
    withName("cachingVar", "fullConfig"),

    // options: Var.noTotal()
    () -> valueSupplier_6.incrementAndGet() + 0.5,

    () -> withCachingDoubleVar()
        // options: disable(), enabled(boolean)
        // default: enabled
        .enable()

        // default: no prefix label values
        .prefix(labelValues(SAMPLE.value("var")))

        .labels(SERVICE, SERVER, PORT)

        // default: 30 SECONDS
        .ttl(10, SECONDS)

        // the properties specific to the metrics implementation
        // default: no properties
        .put("key", "value"));
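
The difference between a plain var and a caching var can be illustrated with a small TTL cache around a value supplier. This is a sketch of the general technique under assumed semantics (the value is recomputed once the TTL has elapsed); CachingValue and its injected nanosecond clock are hypothetical and not part of the library.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative only: caches a supplied value for a fixed time to live. */
final class CachingValue<T> {
    private final Supplier<T> supplier;
    private final long ttlNanos;
    private final LongSupplier nanoClock;

    private T cached;
    private long loadedAtNanos;
    private boolean loaded;

    CachingValue(Supplier<T> supplier, long ttlNanos, LongSupplier nanoClock) {
        this.supplier = supplier;
        this.ttlNanos = ttlNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns the cached value, calling the supplier again only after the TTL has elapsed. */
    synchronized T get() {
        long now = nanoClock.getAsLong();

        if (!loaded || now - loadedAtNanos >= ttlNanos) {
            cached = supplier.get();
            loadedAtNanos = now;
            loaded = true;
        }

        return cached;
    }
}
```

In production code the clock would typically be System::nanoTime. Caching is useful when the supplier is expensive, because several reads within the TTL then cost one supplier call.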

Metrics Reporters

Typically, metrics are used to provide runtime information about an application instance.
To do this, it is necessary not only to collect metrics but also to export them to
external monitoring systems, such as Prometheus (https://prometheus.io), that provide
various means of visualizing and analyzing the collected metrics.

A metrics reporter presents metrics in the format of a specific monitoring system;
some reporters additionally send metrics to an external monitoring system in the appropriate format.

The following describes the reporters provided by the library out of the box.

PrometheusMetricsExporter

PrometheusMetricsExporter exports metrics in the Prometheus format (https://prometheus.io).

Dependencies:

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-prometheus</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

Note that metrics-facade-prometheus depends on:

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_common</artifactId>
    <version>...</version>
</dependency>

If the simpleclient_common version used by Metrics Facade conflicts with the one used in your project, you can resolve the conflict with dependency exclusions or any other available means. For example, see metrics-facade-samples/pom.xml:

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-prometheus</artifactId>
    <version>${project.version}</version>

    <exclusions>
        <exclusion>
            <groupId>io.prometheus</groupId>
            <artifactId>simpleclient_common</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Example:

MetricRegistry registry = new DefaultMetricRegistry();
PrometheusMetricsExporter exporter = new PrometheusMetricsExporter(registry);

Histogram h = registry.histogram(
    withName("histogram"),
    () -> withHistogram()
        .labels(SERVICE, SERVER, PORT)
        .measurables(MAX, MEAN));

h.update(1, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
h.update(2, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));
h.update(3, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

// Metric instances are added asynchronously
sleep(25); 

System.out.println(exporter.exportMetrics());

Output (simplified):

# TYPE histogram_max gauge
histogram_max{service="service_1",server="server_1_1",port="111",} 1.0
histogram_max{service="service_1",server="server_1_2",port="121",} 2.0
histogram_max{service="service_2",server="server_2_1",port="211",} 3.0

# TYPE histogram_mean gauge
histogram_mean{service="service_1",server="server_1_1",port="111",} 1.0
histogram_mean{service="service_1",server="server_1_2",port="121",} 2.0
histogram_mean{service="service_2",server="server_2_1",port="211",} 3.0
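
Each exported line above follows the Prometheus text layout: a metric name, the label pairs in braces, and the value. The following stand-alone sketch illustrates that layout only; it is not the exporter's implementation, the class and method names are hypothetical, and escaping of label values is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Metrics Facade: formats one sample line
// in the same layout as the simplified output above.
class PromLineSketch {

    static String sampleLine(String name, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder(name);

        if (!labels.isEmpty()) {
            sb.append('{');
            // Every label is followed by a comma, including the last one,
            // which matches the trailing comma in the output above.
            labels.forEach((k, v) -> sb.append(k).append("=\"").append(v).append("\","));
            sb.append('}');
        }

        return sb.append(' ').append(value).toString();
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("service", "service_1");
        labels.put("server", "server_1_1");
        labels.put("port", "111");

        System.out.println("# TYPE histogram_max gauge");
        System.out.println(sampleLine("histogram_max", labels, 1.0));
    }
}
```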

If you want to modify the export (change the names, disable some metrics or measured values, etc.),
you need to configure the PrometheusMetricsExporter accordingly:

PrometheusMetricsExporterSample.java

MetricRegistry registry = new DefaultMetricRegistry();

// Default config
// PrometheusMetricsExporter prometheusMetricsExporter = new PrometheusMetricsExporter(registry);

PrometheusInstanceSampleSpecProvider miSampleSpecProvider = new PrometheusInstanceSampleSpecProvider(
    true, // exportTotalInstances. defaults to true
    false, // exportLabeledTotalInstances. defaults to false
    false); // exportLevelInstances. defaults to true

PrometheusInstanceSampleSpecModsProvider miSampleSpecModsProvider = new PrometheusInstanceSampleSpecModsProvider();

miSampleSpecModsProvider.addMod(
    forMetricInstancesMatching(
        nameMask("Histogram.**"),
        instance -> "service_2".equals(instance.valueOf(SERVICE))),
    (metric, instance, currSpec) -> instanceSampleSpec().disable());

miSampleSpecModsProvider.addMod(
    forMetricWithName("Histogram"),
    (metric, instance, currSpec) -> instanceSampleSpec()
        .name(instance.name().withNewPart(instance.valueOf(SERVICE)))
        .labelValues(currSpec.labelValuesWithout(SERVICE)));

miSampleSpecModsProvider.addMod(
    forMetricsWithNamePrefix("Histogram"),
    (metric, instance, currSpec) ->
        instanceSampleSpec().name(currSpec.name().replaceLast(currSpec.name().lastPart() + "_svc")));

PrometheusInstanceSamplesProducer miSamplesProducer = new PrometheusInstanceSamplesProducer(
    null, // totalInstanceNameSuffix. defaults to null that means no suffix
    "all"); // labeledMetricTotalInstanceNameSuffix. defaults to "all"

PrometheusSampleSpecProvider sampleSpecProvider = new PrometheusSampleSpecProvider();
PrometheusSampleSpecModsProvider sampleSpecModsProvider = new PrometheusSampleSpecModsProvider();

sampleSpecModsProvider.addMod(
    forMetricInstancesMatching(
        nameMask("Histogram.**"),
        instance -> instance instanceof HistogramInstance),
    (instanceSampleSpec, instance, measurableValues, measurable, currSpec) ->
        measurable instanceof Max ? sampleSpec().disable() : sampleSpec());

PrometheusSamplesProducer samplesProducer = new PrometheusSamplesProducer();

PrometheusInstanceSamplesProvider miSamplesProvider = new PrometheusInstanceSamplesProvider(
    miSampleSpecProvider,
    miSampleSpecModsProvider,
    miSamplesProducer,
    sampleSpecProvider,
    sampleSpecModsProvider,
    samplesProducer,
    registry);

PrometheusMetricsExporter exporter = new PrometheusMetricsExporter(
    true,
    Locale.ENGLISH,
    miSamplesProvider);

Histogram h = registry.histogram(
    withName("Histogram"),
    () -> withHistogram()
        .description("Histogram for " + PrometheusMetricsExporterSample.class.getSimpleName())
        .labels(SERVICE, SERVER, PORT)
        .measurables(MIN, MAX, MEAN));

h.update(1, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
h.update(2, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
h.update(3, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));
h.update(4, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

Timer t = registry.timer(
    withName("Timer"),
    () -> withTimer()
        .description("Timer for " + PrometheusMetricsExporterSample.class.getSimpleName())
        .labels(SERVICE, SERVER, PORT)
        .measurables(MIN, MAX, MEAN));

t.update(SECONDS.toNanos(1), forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
t.update(SECONDS.toNanos(2), forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));
t.update(SECONDS.toNanos(3), forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

Output:

# HELP histogram_service_1_svc_mean Histogram for PrometheusMetricsExporterSample
# TYPE histogram_service_1_svc_mean gauge
histogram_service_1_svc_mean{server="server_1_1",port="111",} 1.5
histogram_service_1_svc_mean{server="server_1_2",port="121",} 3.0
# HELP histogram_service_1_svc_min Histogram for PrometheusMetricsExporterSample
# TYPE histogram_service_1_svc_min gauge
histogram_service_1_svc_min{server="server_1_1",port="111",} 1.0
histogram_service_1_svc_min{server="server_1_2",port="121",} 3.0
# HELP timer_max Timer for PrometheusMetricsExporterSample
# TYPE timer_max gauge
timer_max{service="service_1",server="server_1_1",port="111",} 1.002438655
timer_max{service="service_1",server="server_1_2",port="121",} 2.004877311
timer_max{service="service_2",server="server_2_1",port="211",} 3.003121663
# HELP timer_mean Timer for PrometheusMetricsExporterSample
# TYPE timer_mean gauge
timer_mean{service="service_1",server="server_1_1",port="111",} 1.000341504
timer_mean{service="service_1",server="server_1_2",port="121",} 2.000683008
timer_mean{service="service_2",server="server_2_1",port="211",} 2.9947330560000003
# HELP timer_min Timer for PrometheusMetricsExporterSample
# TYPE timer_min gauge
timer_min{service="service_1",server="server_1_1",port="111",} 0.998244352
timer_min{service="service_1",server="server_1_2",port="121",} 1.996488704
timer_min{service="service_2",server="server_2_1",port="211",} 2.986344448

ZabbixMetricsJsonExporter and ZabbixLldMetricsReporter

ZabbixMetricsJsonExporter exports metrics in a format that can be adapted for Zabbix.
This is a non-standard format used in a number of RingCentral projects.

ZabbixLldMetricsReporter exports MBeans that can be used for Zabbix low-level discovery
(https://www.zabbix.com/documentation/current/manual/discovery/low_level_discovery).

Dependencies:

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-base</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

ZabbixMetricsJsonExporter and ZabbixLldMetricsReporter are best explained with:
ZabbixReportersSample.java

MetricRegistry registry = new DefaultMetricRegistry();
DefaultInstanceSampleSpecModsProvider miSampleSpecModsProvider = new DefaultInstanceSampleSpecModsProvider();

miSampleSpecModsProvider.addMod(
    forMetricInstancesMatching(
        nameMask("histogram.**"),
        instance -> "service_2".equals(instance.valueOf(SERVICE))),
    (metric, instance, currSpec) -> instanceSampleSpec().disable());

miSampleSpecModsProvider.addMod(
    forMetricWithName("histogram"),
    (metric, instance, currSpec) -> instanceSampleSpec().name(instance.name().withNewPart("test")));

DefaultSampleSpecModsProvider sampleSpecModsProvider = new DefaultSampleSpecModsProvider();

sampleSpecModsProvider.addMod(
    forMetricInstancesMatching(
        nameMask("histogram.**"),
        instance -> instance instanceof HistogramInstance),
    (instanceSampleSpec, instance, measurableValues, measurable, currSpec) ->
        measurable instanceof Max ? sampleSpec().disable() : sampleSpec());

DefaultInstanceSamplesProvider miSamplesProvider = new DefaultInstanceSamplesProvider(
    miSampleSpecModsProvider,
    sampleSpecModsProvider,
    new DefaultSampleSpecProvider(CustomMeasurableNameProvider.INSTANCE),
    registry);

ZabbixMetricsJsonExporter exporter = new ZabbixMetricsJsonExporter(miSamplesProvider);

// LLD
ZGroupMBeansExporter zGroupMBeansExporter = new ZGroupMBeansExporter(
    "zabbixReportersSample.zabbix.lld:type=",
    DefaultZGroupJsonMapper.INSTANCE,
    "JsonData");

zGroupMBeansExporter.ensureGroup("server");
ZabbixLldMetricsReporter lldReporter = new ZabbixLldMetricsReporter(zGroupMBeansExporter);

lldReporter.addRules(
    forMetricInstancesMatching(nameMask("histogram.**")),
    new Rule(
        "service",
        List.of(new RuleItem(SERVICE, "service"), new RuleItem(SERVER, "server"))));

lldReporter.addRules(
    forMetricInstancesMatching(nameMask("histogram.**")),
    new Rule(
        "server",
        List.of(new RuleItem(i -> i.valueOf(SERVICE) + "/" + i.valueOf(SERVER), "server"))));

registry.addListener(lldReporter);

// Metrics
Histogram h = registry.histogram(
    withName("histogram"),
    () -> withHistogram()
        .labels(SERVICE, SERVER, PORT)
        .measurables(COUNT, MAX, MEAN, Buckets.of(scale())));

h.update(1, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
h.update(2, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));
h.update(3, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

...

static ScaleBuilder<?> scale() {
    return
        // 100 ms
        first(linear().steps(25, MILLISECONDS, 4))
        // 500 ms
        .then(linear().steps(100, MILLISECONDS, 4).withInf());
}

exporter.exportMetrics() as JSON:

{
  "delta": [
    {
      "timer.test.count": 3
    },
    {
      "timer.test.service_1.count": 2
    },
    {
      "timer.test.service_1.server_1_1.111.count": 1
    },
    {
      "timer.test.service_1.server_1_1.count": 1
    },
    {
      "timer.test.service_1.server_1_2.121.count": 1
    },
    {
      "timer.test.service_1.server_1_2.count": 1
    }
  ],
  "instant": [
    {
      "timer.test.duration.0ms_bucket": 0
    },
    {
      "timer.test.duration.100ms_bucket": 3
    },
    {
      "timer.test.duration.200ms_bucket": 3
    },
    {
      "timer.test.duration.25ms_bucket": 3
    },
    {
      "timer.test.duration.300ms_bucket": 3
    },
    {
      "timer.test.duration.400ms_bucket": 3
    },
    {
      "timer.test.duration.500ms_bucket": 3
    },
    {
      "timer.test.duration.50ms_bucket": 3
    },
    {
      "timer.test.duration.75ms_bucket": 3
    },
    {
      "timer.test.duration.inf_bucket": 3
    },
    {
      "timer.test.duration.mean": 2.0E-6
    },
    {
      "timer.test.service_1.duration.0ms_bucket": 0
    },
    {
      "timer.test.service_1.duration.100ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.200ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.25ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.300ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.400ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.500ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.50ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.75ms_bucket": 2
    },
    {
      "timer.test.service_1.duration.inf_bucket": 2
    },
    {
      "timer.test.service_1.duration.mean": 1.5E-6
    },
    {
      "timer.test.service_1.server_1_1.111.duration.0ms_bucket": 0
    },
    {
      "timer.test.service_1.server_1_1.111.duration.100ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.200ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.25ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.300ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.400ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.500ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.50ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.75ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.inf_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.111.duration.mean": 1.0E-6
    },
    {
      "timer.test.service_1.server_1_1.duration.0ms_bucket": 0
    },
    {
      "timer.test.service_1.server_1_1.duration.100ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.200ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.25ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.300ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.400ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.500ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.50ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.75ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.inf_bucket": 1
    },
    {
      "timer.test.service_1.server_1_1.duration.mean": 1.0E-6
    },
    {
      "timer.test.service_1.server_1_2.121.duration.0ms_bucket": 0
    },
    {
      "timer.test.service_1.server_1_2.121.duration.100ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.200ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.25ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.300ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.400ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.500ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.50ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.75ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.inf_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.121.duration.mean": 2.0E-6
    },
    {
      "timer.test.service_1.server_1_2.duration.0ms_bucket": 0
    },
    {
      "timer.test.service_1.server_1_2.duration.100ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.200ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.25ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.300ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.400ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.500ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.50ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.75ms_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.inf_bucket": 1
    },
    {
      "timer.test.service_1.server_1_2.duration.mean": 2.0E-6
    }
  ]
}
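
Judging from the bucket names in the JSON above, the scale yields upper bounds of 0, 25, 50, 75, 100, 200, 300, 400, and 500 ms plus an infinity bucket, and each bucket counts the values that do not exceed its bound. The stand-alone sketch below reproduces that arithmetic in plain Java. It is an interpretation of the output, not the library's ScaleBuilder, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use the library's ScaleBuilder.
class ScaleSketch {

    // Appends `count` bounds to `bounds`, each `step` larger than the previous one.
    static void linearSteps(List<Long> bounds, long step, int count) {
        long last = bounds.get(bounds.size() - 1);
        for (int i = 1; i <= count; i++) {
            bounds.add(last + i * step);
        }
    }

    // Upper bounds: 0, 25, 50, 75, 100, 200, 300, 400, 500.
    static List<Long> bounds() {
        List<Long> bounds = new ArrayList<>();
        bounds.add(0L);
        linearSteps(bounds, 25, 4);  // up to 100
        linearSteps(bounds, 100, 4); // up to 500
        return bounds;
    }

    // Cumulative counts: element i is the number of values <= bounds[i];
    // the extra last element is the infinity bucket, which holds every value.
    static long[] cumulativeCounts(List<Long> bounds, long[] values) {
        long[] counts = new long[bounds.size() + 1];
        for (long v : values) {
            for (int i = 0; i < bounds.size(); i++) {
                if (v <= bounds.get(i)) counts[i]++;
            }
            counts[bounds.size()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Long> bounds = bounds();
        System.out.println(bounds);
        // Values are assumed to be in the same unit as the bounds.
        System.out.println(Arrays.toString(cumulativeCounts(bounds, new long[] {1, 2, 3})));
    }
}
```

With three small positive values, the zero bucket stays empty and every other bucket holds all three, which is the same pattern as in the exported totals.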

JMX MBean attribute zabbixReportersSample.zabbix.lld:type=service.JsonData:

{
  "data": [
    {
      "{#SERVICE}": "service_1",
      "{#SERVER}": "server_1_1"
    },
    {
      "{#SERVICE}": "service_1",
      "{#SERVER}": "server_1_2"
    },
    {
      "{#SERVICE}": "service_2",
      "{#SERVER}": "server_2_1"
    }
  ]
}

JMX MBean attribute zabbixReportersSample.zabbix.lld:type=server.JsonData:

{
  "data": [
    {
      "{#SERVER}": "service_1/server_1_1"
    },
    {
      "{#SERVER}": "service_1/server_1_2"
    },
    {
      "{#SERVER}": "service_2/server_2_1"
    }
  ]
}
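
Both attributes use the same layout: a "data" array whose entries map {#MACRO} names to the discovered values. As a rough, library-independent sketch of how such a document can be assembled (the class and method names are hypothetical, this is not DefaultZGroupJsonMapper, and JSON escaping is omitted):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the low-level discovery JSON layout shown above.
class LldJsonSketch {

    // Renders entities as {"data":[{"{#NAME}":"value",...},...]}.
    // Names and values are assumed not to need JSON escaping.
    static String toLldJson(List<Map<String, String>> entities) {
        return entities.stream()
            .map(entity -> entity.entrySet().stream()
                .map(e -> "\"{#" + e.getKey().toUpperCase(Locale.ROOT) + "}\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}")))
            .collect(Collectors.joining(",", "{\"data\":[", "]}"));
    }

    public static void main(String[] args) {
        Map<String, String> entity = new LinkedHashMap<>();
        entity.put("service", "service_1");
        entity.put("server", "server_1_1");

        List<Map<String, String>> entities = new ArrayList<>();
        entities.add(entity);

        System.out.println(toLldJson(entities));
    }
}
```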

TelegrafMetricsJsonExporter

TelegrafMetricsJsonExporter exports metrics in the Telegraf format (https://github.com/influxdata/telegraf).

Dependencies:

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-base</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

TelegrafMetricsJsonExporter is best explained with:
TelegrafMetricsJsonExporterSample.java

MetricRegistry registry = new DefaultMetricRegistry();
DefaultInstanceSampleSpecModsProvider miSampleSpecModsProvider = new DefaultInstanceSampleSpecModsProvider();

miSampleSpecModsProvider.addMod(
    forMetricInstancesMatching(
        nameMask("histogram.**"),
        instance -> "service_2".equals(instance.valueOf(SERVICE))),
    (metric, instance, currSpec) -> instanceSampleSpec().disable());

miSampleSpecModsProvider.addMod(
    forMetricWithName("histogram"),
    (metric, instance, currSpec) -> instanceSampleSpec().name(instance.name().withNewPart("test")));

DefaultSampleSpecModsProvider sampleSpecModsProvider = new DefaultSampleSpecModsProvider();

sampleSpecModsProvider.addMod(
    forMetricInstancesMatching(
        nameMask("histogram.**"),
        instance -> instance instanceof HistogramInstance),
    (instanceSampleSpec, instance, measurableValues, measurable, currSpec) ->
        measurable instanceof Max ? sampleSpec().disable() : sampleSpec());

DefaultInstanceSamplesProvider miSamplesProvider = new DefaultInstanceSamplesProvider(
    miSampleSpecModsProvider,
    sampleSpecModsProvider,
    new DefaultSampleSpecProvider(CustomMeasurableNameProvider.INSTANCE),
    registry);

// Metrics
Histogram h = registry.histogram(
    withName("histogram"),
    () -> withHistogram()
        .labels(SERVICE, SERVER, PORT)
        .measurables(COUNT, MAX, MEAN));

h.update(1, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
h.update(2, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));
h.update(3, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

exporter.exportMetrics() as JSON (without grouping by types):

{
  "histogram.test.mean": 2.0,
  "histogram.test.count": 3,
  "histogram.test.service_1.server_1_1.111.mean": 1.0,
  "histogram.test.service_1.server_1_1.111.count": 1,
  "histogram.test.service_1.server_1_2.121.mean": 2.0,
  "histogram.test.service_1.server_1_2.121.count": 1,
  "histogram.test.service_1.mean": 1.5,
  "histogram.test.service_1.count": 2,
  "histogram.test.service_1.server_1_2.mean": 2.0,
  "histogram.test.service_1.server_1_2.count": 1,
  "histogram.test.service_1.server_1_1.mean": 1.0,
  "histogram.test.service_1.server_1_1.count": 1
}

exporter.exportMetrics() as JSON (with grouping by types - a non-standard format):

{
  "instant": {
    "histogram.test.mean": 2.0,
    "histogram.test.service_1.server_1_1.111.mean": 1.0,
    "histogram.test.service_1.server_1_2.121.mean": 2.0,
    "histogram.test.service_1.mean": 1.5,
    "histogram.test.service_1.server_1_2.mean": 2.0,
    "histogram.test.service_1.server_1_1.mean": 1.0
  },
  "delta": {
    "histogram.test.count": 3,
    "histogram.test.service_1.server_1_1.111.count": 1,
    "histogram.test.service_1.server_1_2.121.count": 1,
    "histogram.test.service_1.count": 2,
    "histogram.test.service_1.server_1_2.count": 1,
    "histogram.test.service_1.server_1_1.count": 1
  }
}
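
In the grouped variant, count values appear under "delta" and the remaining values under "instant". The sketch below illustrates such a split on a flat map. It relies on a simplifying assumption (names ending in ".count" are treated as delta) that is made up for the example and is not the exporter's actual classification logic; the class name is hypothetical as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits flat samples into "instant" and "delta" groups.
class GroupingSketch {

    static Map<String, Map<String, Number>> groupByType(Map<String, Number> flat) {
        Map<String, Map<String, Number>> grouped = new LinkedHashMap<>();
        grouped.put("instant", new LinkedHashMap<>());
        grouped.put("delta", new LinkedHashMap<>());

        flat.forEach((name, value) -> {
            // Assumption for illustration only: counts are delta, the rest is instant.
            String type = name.endsWith(".count") ? "delta" : "instant";
            grouped.get(type).put(name, value);
        });

        return grouped;
    }

    public static void main(String[] args) {
        Map<String, Number> flat = new LinkedHashMap<>();
        flat.put("histogram.test.mean", 2.0);
        flat.put("histogram.test.count", 3);
        System.out.println(groupByType(flat));
    }
}
```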

JmxMetricsReporter

JmxMetricsReporter exports MBeans for the corresponding metric instances.

Dependencies:

<dependency>
    <groupId>com.ringcentral.platform.metrics</groupId>
    <artifactId>metrics-facade-base</artifactId>
    <version>3.2.0-RELEASE</version>
</dependency>

JmxMetricsReporter is best explained with:
JmxMetricsReporterSample.java

MetricRegistry registry = new DefaultMetricRegistry();

// Default config
// registry.addListener(new JmxMetricsReporter());

MaskTreeMetricNamedInfoProvider<MBeanSpecProvider> mBeanSpecs = new MaskTreeMetricNamedInfoProvider<>();

mBeanSpecs.addInfo(
    forMetricInstancesMatching(
        nameMask("rate.**"),
        instance -> "service_2".equals(instance.valueOf(SERVICE))),
    instance -> mBeanSpec().disable());

mBeanSpecs.addInfo(
    forMetricsWithNamePrefix("rate"),
    instance -> mBeanSpec()
        .name(
            instance.isTotalInstance() ?
            instance.name() :
            instance.name().withNewPart(instance.valueOf(SERVICE)))
        .labelValues(instance.labelValuesWithout(SERVICE)));

JmxMetricsReporter jmxReporter = new JmxMetricsReporter(
    mBeanSpecs,
    getPlatformMBeanServer(),
    new DefaultObjectNameProvider(),
    new CustomMeasurableNameProvider(),
    "JmxMetricsReporterSample");

registry.addListener(jmxReporter);

Rate r = registry.rate(
    withName("rate"),
    () -> withRate()
        .labels(SERVICE, SERVER, PORT)
        .measurables(COUNT, MEAN_RATE, ONE_MINUTE_RATE));

r.mark(1, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_1"), PORT.value("111")));
r.mark(2, forLabelValues(SERVICE.value("service_1"), SERVER.value("server_1_2"), PORT.value("121")));
r.mark(3, forLabelValues(SERVICE.value("service_2"), SERVER.value("server_2_1"), PORT.value("211")));

JMX MBean attribute JmxMetricsReporterSample:name=rate.service_1,server=server_1_2,port=121.count = 2
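
Attributes exported this way can be read like any other MBean attribute. The sketch below uses only the standard javax.management API. To stay runnable without the library, it reads VmName from the built-in java.lang:type=Runtime MBean; the helper name readAttribute is hypothetical. With the sample above running in the same JVM, the same call with the object name and attribute shown above would presumably return the exported count.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper built on the standard JMX API only.
class JmxReadSketch {

    static Object readAttribute(String objectName, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            // Malformed name, unknown MBean, or unknown attribute.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A platform MBean that is registered in every JVM.
        System.out.println(readAttribute("java.lang:type=Runtime", "VmName"));
    }
}
```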

Metrics Producers

A metrics producer adds a set of predefined metrics to a registry.
Metrics producers are represented by subclasses of MetricsProducer.

MetricsProducersSample.java

MetricRegistry registry = new DefaultMetricRegistry();

// adds some system metrics
new SystemMetricsProducer().produceMetrics(registry);

SystemMetricsProducer

SystemMetricsProducer just combines:

RuntimeMetricsProducer
OperatingSystemMetricsProducer
GarbageCollectorsMetricsProducer
MemoryMetricsProducer
ThreadsMetricsProducer
BufferPoolsMetricsProducer
ClassesMetricsProducer

RuntimeMetricsProducer

Adds a number of metrics related to the JVM.
See RuntimeMetricsProducer for details.
It is based on java.lang.management.RuntimeMXBean.

OperatingSystemMetricsProducer

Adds a number of metrics related to the operating system: OS name, OS architecture, CPU time used by the process, etc.
See DefaultOperatingSystemMetricsProducer and LabeledOperatingSystemMetricsProducer for details.
It is based on com.sun.management.OperatingSystemMXBean.

GarbageCollectorsMetricsProducer

Adds a number of metrics related to garbage collection:
the total number of collections that have occurred, the approximate accumulated collection elapsed time, etc.
See DefaultGarbageCollectorsMetricsProducer and LabeledGarbageCollectorsMetricsProducer for details.
It is based on java.lang.management.GarbageCollectorMXBean.

MemoryMetricsProducer

Adds a number of metrics related to memory:
heap memory usage, non-heap memory usage, etc.
See DefaultMemoryMetricsProducer and LabeledMemoryMetricsProducer for details.
It is based on java.lang.management.MemoryMXBean.

ThreadsMetricsProducer

Adds a number of metrics related to threads:
the current number of live threads, the total number of threads created and also started since the JVM started, etc.
See DefaultThreadsMetricsProducer and LabeledThreadsMetricsProducer for details.
It is based on java.lang.management.ThreadMXBean.

BufferPoolsMetricsProducer

Adds a number of metrics related to buffer pools.
See DefaultBufferPoolsMetricsProducer and LabeledBufferPoolsMetricsProducer for details.
It is based on the MBeans java.nio:type=BufferPool,name=<pool_name>.

ClassesMetricsProducer

Adds a number of metrics related to the class loading system of the JVM. See ClassesMetricsProducer for details.
It is based on java.lang.management.ClassLoadingMXBean.
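
For orientation, the sketch below reads a few values directly from the standard management beans mentioned in this section. It does not use Metrics Facade, the map keys are arbitrary names chosen for the example, and the metrics each producer actually registers are defined in the corresponding producer classes.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a snapshot of values taken from the JVM management beans.
class MxBeansSketch {

    static Map<String, Number> snapshot() {
        Map<String, Number> values = new LinkedHashMap<>();

        values.put("runtime.uptime.ms", ManagementFactory.getRuntimeMXBean().getUptime());
        values.put("memory.heap.used", ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        values.put("threads.count", ManagementFactory.getThreadMXBean().getThreadCount());
        values.put("classes.loaded", ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());

        // One entry per garbage collector known to the JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            values.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
        }

        // One entry per buffer pool (for example, direct and mapped buffers).
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            values.put("buffers." + pool.getName() + ".used", pool.getMemoryUsed());
        }

        return values;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```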

Integration

Dropwizard MetricRegistry integration

Export using DropwizardMetricRegistryDefaultInstanceSamplesProvider

DropwizardMetricRegistryDefaultInstanceSamplesProvider helps to export metrics from an existing com.codahale.metrics.MetricRegistry using Metrics Reporters (except PrometheusMetricsExporter).

For example,

MetricRegistry registry = new MetricRegistry();

AtomicInteger g1 = new AtomicInteger();
registry.gauge("g1", () -> new Gauge<Integer>() {
  @Override
  public Integer getValue() {
    return g1.incrementAndGet();
  }
});

Counter c1 = registry.counter("c1");
c1.inc();

Histogram h1 = registry.histogram("h1");
h1.update(10);
h1.update(100);
h1.update(1000);

Timer t1 = registry.timer("t1");
t1.update(Duration.ofMinutes(10));

Meter m1 = registry.meter("m1");
m1.mark();
m1.mark(10);

DropwizardMetricRegistryDefaultInstanceSamplesProvider provider = new DropwizardMetricRegistryDefaultInstanceSamplesProvider(registry);

TelegrafMetricsJsonExporter telegrafMetricsJsonExporter = new TelegrafMetricsJsonExporter(true, provider);
MetricsJson metrics = telegrafMetricsJsonExporter.exportMetrics();

ObjectMapper mapper = new ObjectMapper();
System.out.println(mapper.writeValueAsString(metrics));

Output:

{"instant":{"g1":1.0,"h1.mean":370.0,"h1.max":1000,"h1.min":10,"h1.median":100.0,"h1.std_dev":446.9899327725402,"h1.75_percentile":1000.0,"h1.95_percentile":1000.0,"h1.98_percentile":1000.0,"h1.99_percentile":1000.0,"h1.999_percentile":1000.0,"t1.mean":600.0,"t1.max":600.0,"t1.min":600.0,"t1.median":600.0,"t1.std_dev":0.0,"t1.75_percentile":600.0,"t1.95_percentile":600.0,"t1.98_percentile":600.0,"t1.99_percentile":600.0,"t1.999_percentile":600.0,"m1.1_minute_rate":0.0,"m1.5_minute_rate":0.0,"m1.15_minute_rate":0.0,"m1.mean_rate":288.9463950588065},"delta":{"c1":1,"h1.count":3,"t1.count":1,"m1.total":11}}

Required dependency:

<dependency>
  <groupId>com.ringcentral.platform.metrics</groupId>
  <artifactId>metrics-facade-dropwizard</artifactId>
  <version>3.2.0-RELEASE</version>
</dependency>

Export using DropwizardMetricRegistryPrometheusInstanceSamplesProvider

DropwizardMetricRegistryPrometheusInstanceSamplesProvider helps to export metrics from an existing com.codahale.metrics.MetricRegistry using PrometheusMetricsExporter.

For example,

MetricRegistry registry = new MetricRegistry();

AtomicInteger g1 = new AtomicInteger();
registry.gauge("g1", () -> new Gauge<Integer>() {
  @Override
  public Integer getValue() {
    return g1.incrementAndGet();
  }
});

Counter c1 = registry.counter("c1");
c1.inc();

Histogram h1 = registry.histogram("h1");
h1.update(10);
h1.update(100);
h1.update(1000);

Timer t1 = registry.timer("t1");
t1.update(Duration.ofMinutes(10));

Meter m1 = registry.meter("m1");
m1.mark();
m1.mark(10);

DropwizardMetricRegistryPrometheusInstanceSamplesProvider provider = new DropwizardMetricRegistryPrometheusInstanceSamplesProvider(registry);

// Same constructor arguments as in PrometheusMetricsExporterSample above
PrometheusMetricsExporter exporter = new PrometheusMetricsExporter(true, Locale.ENGLISH, provider);

System.out.println(exporter.exportMetrics());

Output:

# HELP h1 Generated from Dropwizard metric import (metric=h1, type=com.codahale.metrics.Histogram)
# TYPE h1 summary
h1{quantile="0.5",} 100.0
h1{quantile="0.75",} 1000.0
h1{quantile="0.95",} 1000.0
h1{quantile="0.98",} 1000.0
h1{quantile="0.99",} 1000.0
h1{quantile="0.999",} 1000.0
h1_count 3.0
# HELP g1 Generated from Dropwizard metric import (metric=g1, type=com.ringcentral.platform.metrics.samples.temp.DropwizardMetricRegistryPrometheusInstanceSamplesProviderSample$1)
# TYPE g1 gauge
g1 0.0
# HELP t1 Generated from Dropwizard metric import (metric=t1, type=com.codahale.metrics.Timer)
# TYPE t1 summary
t1{quantile="0.5",} 600.0
t1{quantile="0.75",} 600.0
t1{quantile="0.95",} 600.0
t1{quantile="0.98",} 600.0
t1{quantile="0.99",} 600.0
t1{quantile="0.999",} 600.0
t1_count 1.0
# HELP c1 Generated from Dropwizard metric import (metric=c1, type=com.codahale.metrics.Counter)
# TYPE c1 gauge
c1 1.0

Required dependency:

<dependency>
  <groupId>com.ringcentral.platform.metrics</groupId>
  <artifactId>metrics-facade-dropwizard-to-prometheus</artifactId>
  <version>3.2.0-RELEASE</version>
</dependency>

License

MIT

metrics-facade's Issues

Add a GitHub action for deploying the library to Maven Central

Implement a new default MetricRegistry implementation

Support removing export modifications

Some users of the library are currently migrating from Dropwizard. In Dropwizard, they test monitoring triggers by first deregistering a real metric and then registering a mock one with the same name and a value that activates the corresponding trigger. Since we believe that modifying real metrics is not a good solution (for example, concurrent metric updates could be lost), we've decided to solve the same problem by supporting temporary export modifications that do not affect the real metrics. Example:

@RequestMapping(value = "/counter/prometheus/value/{value}", method = POST)
public synchronized void modifyPrometheusValue(@PathVariable String value) {
    sampleSpecModsProvider.removeMod("testTriggerFor.counter");

    if (!"real".equalsIgnoreCase(value)) {
        sampleSpecModsProvider.addMod(
            "testTriggerFor.counter",
             forMetricWithName("counter"),
             (instanceSampleSpec, instance, measurableValues, measurable, currSpec) ->
                 measurable instanceof Count ? sampleSpec().value(parseDouble(value)) : null);
    }
}
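
The idea behind keyed export modifications can be sketched without the library: overrides live in a separate map under a key, are applied only at export time, and can be removed to restore the real value. All names below are hypothetical, and the sketch works on plain numbers, whereas the addMod and removeMod calls above work on sample specs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of removable export-time overrides.
class ExportModsSketch {

    private final AtomicLong counter = new AtomicLong();
    private final Map<String, DoubleUnaryOperator> mods = new ConcurrentHashMap<>();

    void inc() { counter.incrementAndGet(); }

    void addMod(String key, DoubleUnaryOperator mod) { mods.put(key, mod); }

    void removeMod(String key) { mods.remove(key); }

    // The real counter is never modified; mods only change the exported value.
    double exportedValue() {
        double value = counter.get();
        for (DoubleUnaryOperator mod : mods.values()) {
            value = mod.applyAsDouble(value);
        }
        return value;
    }

    public static void main(String[] args) {
        ExportModsSketch sketch = new ExportModsSketch();
        sketch.inc();

        sketch.addMod("testTriggerFor.counter", real -> 1000.0);
        System.out.println(sketch.exportedValue()); // the mocked value

        sketch.inc(); // a real update made while the mod is active is not lost
        sketch.removeMod("testTriggerFor.counter");
        System.out.println(sketch.exportedValue()); // the real value again
    }
}
```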

Support exporting metrics from Dropwizard MetricRegistry

It can be implemented via subtypes of DefaultInstanceSamplesProvider/PrometheusInstanceSamplesProvider

Optimize sanitizing metric names and label names in DefaultPrometheusMetricSanitizer

Replace regexes with loops.

Add docs for Histogram implementations in DefaultMetricRegistry

HDR impl
Scale impl
The general optimization scheme implemented in AbstractHistogramImpl

Add META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports for Spring integration

https://stackoverflow.com/a/75124719

Add possibility to define MetricInstances for specific label values prior to any updates

Currently, the feature can be emulated for Counter as follows:

counter.inc(0, forDimensionValues(MY_DIMENSION.value("A")));
counter.inc(0, forDimensionValues(MY_DIMENSION.value("B")));

As a result, these zeros will be exported prior to any real updates.

Generate metrics used by SystemMetricsProducer using dimensions

Currently, if you register SystemMetricsProducer and export its metrics (for example, with the Prometheus exporter), the metrics are produced in the following format:

...
# HELP Memory_pools_G1_Survivor_Space_usage Generated from metric instances with name Memory.pools.G1-Survivor-Space.usage
# TYPE Memory_pools_G1_Survivor_Space_usage gauge
Memory_pools_G1_Survivor_Space_usage 0.04085540771484375
# HELP Memory_pools_G1_Old_Gen_usage Generated from metric instances with name Memory.pools.G1-Old-Gen.usage
# TYPE Memory_pools_G1_Old_Gen_usage gauge
Memory_pools_G1_Old_Gen_usage 0.017514586448669434
# HELP Memory_pools_Compressed_Class_Space_usage Generated from metric instances with name Memory.pools.Compressed-Class-Space.usage
# TYPE Memory_pools_Compressed_Class_Space_usage gauge
Memory_pools_Compressed_Class_Space_usage 0.006308794021606445
# HELP Memory_pools_CodeCache_usage Generated from metric instances with name Memory.pools.CodeCache.usage
# TYPE Memory_pools_CodeCache_usage gauge
Memory_pools_CodeCache_usage 0.2337773640950521
# HELP Memory_non_heap_usage Generated from metric instances with name Memory.non-heap.usage
# TYPE Memory_non_heap_usage gauge
Memory_non_heap_usage -6.582004E7
# HELP Memory_pools_G1_Eden_Space_usage Generated from metric instances with name Memory.pools.G1-Eden-Space.usage
# TYPE Memory_pools_G1_Eden_Space_usage gauge
Memory_pools_G1_Eden_Space_usage 0.35365853658536583
# HELP Memory_pools_Metaspace_usage Generated from metric instances with name Memory.pools.Metaspace.usage
# TYPE Memory_pools_Metaspace_usage gauge
Memory_pools_Metaspace_usage 0.9908933534726991
...

The example above shows part of MemoryMetricsProducer's output.

It could be reworked in the following way:

# HELP jvm_Memory_pools_usage Generated from metric instances with name jvm.Memory.pools.usage
# TYPE jvm_Memory_pools_usage gauge
jvm_Memory_pools_usage{type="G1-Old-Gen",} 0.0026739835739135742
jvm_Memory_pools_usage{type="G1-Survivor-Space",} 1.0
jvm_Memory_pools_usage{type="Metaspace",} 0.960856277461597
jvm_Memory_pools_usage{type="CodeHeap-'non-profiled-nmethods'",} 0.04282428770387274
jvm_Memory_pools_usage{type="Compressed-Class-Space",} 0.007082119584083557
jvm_Memory_pools_usage{type="CodeHeap-'non-nmethods'",} 0.1676891268980477
jvm_Memory_pools_usage{type="G1-Eden-Space",} 0.7045454545454546

The second format seems more convenient. It also avoids redundant HELP and TYPE lines and makes it possible to provide a proper description for the whole group of metrics.

So I suggest adding a parameter to SystemMetricsProducer (and the producers it uses) that specifies the format of the metrics.

By "specifies the format" I mean that, when the corresponding producer is constructed, it creates metrics with dimensions instead of a dedicated metric for each value (in the example above, each value of the "type" dimension).
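The proposed format can be illustrated with a minimal, JDK-only sketch. It does not use the Metrics Facade API: the class name, the method names, and the help text are assumptions made for illustration, and label values are not escaped. It collects the usage of each memory pool into a map keyed by pool name and renders the map as one metric family with a single HELP/TYPE pair.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not part of metrics-facade.
final class LabeledPoolUsageSketch {

    // Collects used/max per memory pool, keyed by the pool name.
    // Pools whose max is undefined (reported as -1) are skipped here.
    static Map<String, Double> poolUsageByType() {
        Map<String, Double> usage = new TreeMap<>();

        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();

            if (u != null && u.getMax() > 0) {
                usage.put(pool.getName().replace(' ', '-'), (double) u.getUsed() / u.getMax());
            }
        }

        return usage;
    }

    // Renders one metric family: one HELP/TYPE pair, one sample per label value.
    static String render(String name, String help, String label, Map<String, Double> values) {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(name).append(" gauge\n");

        values.forEach((labelValue, value) -> sb
            .append(name).append('{').append(label).append("=\"").append(labelValue).append("\",} ")
            .append(value).append('\n'));

        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(
            "jvm_Memory_pools_usage", "Usage of JVM memory pools", "type", poolUsageByType()));
    }
}
```

Skipping pools with an undefined max is a choice of this sketch only; the issue does not say how such pools should be reported.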

Improve eviction and expiration of labeled metrics

AbstractMeter: order MetricInstances within the same millisecond based on their insertion order.
Provide CompletableFuture<Iterator> iterator(Executor completionExecutor) to return iterator after removeExpiredInstances(). Update exporters accordingly.
Make periodic actualisation (removeExpiredInstancesAndSchedule) of MetricInstances optional.
Consider eliminating EXPIRED_INSTANCES_REMOVAL_ADDITIONAL_DELAY_MS in
executor.schedule(
this::removeExpiredInstancesAndSchedule,
baseDelayMs + EXPIRED_INSTANCES_REMOVAL_ADDITIONAL_DELAY_MS, MILLISECONDS);

Consider min(baseDelayMs, 10000)
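The first item above (ordering instances that were updated within the same millisecond) can be sketched with a comparator that breaks timestamp ties by a monotonically increasing sequence number. This is a standalone illustration, not the AbstractMeter code; the Entry class and its fields are hypothetical.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the AbstractMeter implementation.
final class EvictionOrderSketch {

    static final class Entry {
        final String key;
        final long updateTimeMs;
        final long seq;

        Entry(String key, long updateTimeMs, long seq) {
            this.key = key;
            this.updateTimeMs = updateTimeMs;
            this.seq = seq;
        }
    }

    private final AtomicLong nextSeq = new AtomicLong();

    // Oldest first; entries with equal timestamps keep their insertion order.
    private final TreeSet<Entry> byAge = new TreeSet<>(
        Comparator.comparingLong((Entry e) -> e.updateTimeMs)
            .thenComparingLong(e -> e.seq));

    void add(String key, long updateTimeMs) {
        byAge.add(new Entry(key, updateTimeMs, nextSeq.getAndIncrement()));
    }

    // Removes and returns the key of the oldest entry, or null if empty.
    String evictOldest() {
        Entry oldest = byAge.pollFirst();
        return oldest == null ? null : oldest.key;
    }
}
```

Without the sequence number, two entries with the same timestamp would compare as equal, and a sorted set would keep only one of them.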

Use custom Spring bean name prefix

An application may be built on a Spring-based framework that already defines a metrics registry bean without qualifiers, which results in a bean definition conflict when used together with the Metrics Facade starter.

We need to set a custom bean name prefix to avoid such conflicts.

Add support for Min measurable to PrometheusMetricsExporter

Get rid of forLabelValues wrapper

The forLabelValues wrapper currently provides no beneficial functionality and produces unnecessary garbage every time a labeled metric is updated. This task is to eliminate the wrapper.

Split the test methods into smaller ones

This will make it possible to give the tests more descriptive names/descriptions, which could state the preconditions and the expected outcome.
The following tests could be refactored:

PrometheusMetricsExporterTest
PrometheusInstanceSampleTest
PrometheusSampleMakerTest
SimpleCollectorRegistryPrometheusInstanceSamplesProviderTest

Support OpenMetrics

Extend support for OpenMetrics including appropriate export.
Currently, there is no way to define some of its domain entities, such as the info and stateset types, and there is no support for units.

Support making both Prometheus.Summary and Prometheus.Histogram for com.ringcentral.platform.metrics.histogram.Histogram

That capability should be optional.
Also, we should include Count and TotalSum in both the MetricFamilySamples.

Fix "le" label for Histogram

.measurables(
    ..                        
    Bucket.of(27.5),
    ..)

Expected

histogram_fullConfig_byService_bucket{sample="histogram",service="service_1",le="27.5",} 2.0

Actual

histogram_fullConfig_byService_bucket{sample="histogram",service="service_1",le="27p5",} 2.0
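As a minimal sketch of the expected behavior, the "le" label value can be derived from the numeric upper bound of the bucket rather than from a name-safe encoding of it. The helper below is hypothetical and is not the exporter's code.

```java
// Hypothetical sketch; not the PrometheusMetricsExporter implementation.
final class LeLabelSketch {

    // Formats a bucket upper bound as an "le" label value.
    static String leValue(double upperBound) {
        if (upperBound == Double.POSITIVE_INFINITY) {
            return "+Inf";
        }

        return Double.toString(upperBound);
    }

    // Builds one bucket sample line in the style used by the examples in this issue.
    static String bucketLine(String metricName, double upperBound, double count) {
        return metricName + "_bucket{le=\"" + leValue(upperBound) + "\",} " + count;
    }
}
```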

[Prometheus] Sort buckets and quantiles in increasing numerical order of their label values

According to https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries

The buckets of a histogram and the quantiles of a summary must appear 
in increasing numerical order of their label values (for the le or the quantile label, respectively).

Currently, the exporter does not guarantee this order.
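The required order is numerical, not lexicographic: sorted as strings, "10.0" and "100.0" would come before "2.5". A minimal sketch, assuming the label values are available as strings and that "+Inf" marks the last bucket (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the exporter's code.
final class BucketOrderSketch {

    // Parses an "le" or "quantile" label value; "+Inf" is handled explicitly
    // because Double.parseDouble does not accept it.
    static double numericValue(String labelValue) {
        return "+Inf".equals(labelValue)
            ? Double.POSITIVE_INFINITY
            : Double.parseDouble(labelValue);
    }

    // Returns a copy of the label values in increasing numerical order.
    static List<String> sortedByNumericValue(List<String> labelValues) {
        List<String> sorted = new ArrayList<>(labelValues);
        sorted.sort(Comparator.comparingDouble(BucketOrderSketch::numericValue));
        return sorted;
    }
}
```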

Support exporting metrics from Prometheus CollectorRegistry

Add documentation (specification) for metrics produced by SystemMetricsProducer

Redesign export

Make modification API simpler and fluent. TBD
DSL?
Avoid recreating specs (InstanceSampleSpec, SampleSpec) for MetricInstances that have not changed since the last export
Eliminate boxing/unboxing of measurable values to improve performance.

Support sanitizing label names in PrometheusMetricsExporter
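Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*. One possible sanitizing rule, shown here as a hypothetical helper rather than the exporter's actual behavior, replaces every disallowed character with an underscore and prefixes a leading digit with an underscore.

```java
// Hypothetical sketch; the replacement rule is an assumption.
final class LabelNameSanitizerSketch {

    static String sanitize(String name) {
        if (name.isEmpty()) {
            return "_";
        }

        // Replace everything outside [a-zA-Z0-9_] with an underscore.
        String sanitized = name.replaceAll("[^a-zA-Z0-9_]", "_");

        // A label name must not start with a digit.
        if (Character.isDigit(sanitized.charAt(0))) {
            sanitized = "_" + sanitized;
        }

        return sanitized;
    }
}
```

Different source names can collapse to the same sanitized name (for example, "a.b" and "a-b"), so a real implementation would also need a policy for such collisions.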

Always deregister/register *Var metrics

com/ringcentral/platform/metrics/var/AbstractVar.java:178 contains a check that prevents registration of a metric if it already exists.
Registration (as well as removal) is scheduled and performed concurrently, whereas the check is performed immediately.
This is why the following:

            LongVar metric = registry.longVar(getName(),Var.noTotal(),
                    () -> withLongVar().dimensions(DIMENSION_1, DIMENSION_2)
            );
            // 1
            metric.register(successProvider, successDimensionValues);
            metric.register(failedProvider, failedDimensionValues);
            // 2
            metric.deregister(successDimensionValues);
            metric.deregister(failedDimensionValues);
            // 3
            metric.register(successProvider, successDimensionValues);
            metric.register(failedProvider, failedDimensionValues);

can result in no metrics being registered at all:

(1) registration is scheduled and the metrics are registered right away
(2) deregistration is scheduled
(3) registration finds that the metrics exist, so it is skipped
(4) the actual deregistration happens

Suggestion: remove the check.
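The interleaving described above can be reproduced deterministically with a small model in which scheduled changes are kept in a queue and applied later. This is a hypothetical model of the scenario, not the AbstractVar implementation; all names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of the scenario; not the AbstractVar implementation.
final class VarRegistrationRaceSketch {

    private final Set<String> registered = new HashSet<>();
    private final Queue<Runnable> scheduled = new ArrayDeque<>();
    private final boolean checkBeforeScheduling;

    VarRegistrationRaceSketch(boolean checkBeforeScheduling) {
        this.checkBeforeScheduling = checkBeforeScheduling;
    }

    void register(String key) {
        // The check reads state that already scheduled tasks may still change.
        if (checkBeforeScheduling && registered.contains(key)) {
            return;
        }

        scheduled.add(() -> registered.add(key));
    }

    void deregister(String key) {
        scheduled.add(() -> registered.remove(key));
    }

    // Stands in for the executor that applies the scheduled changes.
    void runScheduled() {
        Runnable task;

        while ((task = scheduled.poll()) != null) {
            task.run();
        }
    }

    // Replays steps (1)-(3) and reports whether the metric is registered at the end.
    static boolean survives(boolean checkBeforeScheduling) {
        VarRegistrationRaceSketch sketch = new VarRegistrationRaceSketch(checkBeforeScheduling);
        sketch.register("success");   // (1) scheduled ...
        sketch.runScheduled();        //     ... and applied right away
        sketch.deregister("success"); // (2) scheduled, not applied yet
        sketch.register("success");   // (3) skipped if the check is enabled
        sketch.runScheduled();        // the pending deregistration is applied
        return sketch.registered.contains("success");
    }
}
```

With the check enabled, survives(true) returns false, which matches the reported behavior; without it, the second registration is queued after the deregistration and survives(false) returns true.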

Add stop with MetricDimensionValues parameter to Stopwatch interface

Let's consider the following case:

    public static void main(String[] args) throws InterruptedException {
        DropwizardMetricRegistry registry = new DropwizardMetricRegistry();

        MetricDimension statusDimension = new MetricDimension("status");

        Timer timer = registry.timer(
                withName("timer"),
                () -> withTimer().dimensions(statusDimension)
        );
        
        Stopwatch stopwatch = timer.stopwatch();
        try {
            operation();
            stopwatch.stop(dimensionValues(statusDimension.value("success")));
        } catch (ExceptionInInitializerError ex) {
            stopwatch.stop(dimensionValues(statusDimension.value("failure")));
        }
    }

At the moment the stopwatch is created, the value of statusDimension isn't known yet.

So Stopwatch can't be used in this case, and a custom solution is needed.

However, a stop method with a MetricDimensionValues dimensionValues parameter seems to fit naturally.

I suggest adding it.

P.S.
In my opinion, dimensionValues should be passed to a Stopwatch instance only once.

A second attempt should lead to an exception.
For example,

        Stopwatch stopwatch = timer.stopwatch(dimensionValues(statusDimension.value("success")));
        operation();
        // an exception should be thrown
        stopwatch.stop(dimensionValues(statusDimension.value("fatal")));
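The "only once" rule can be sketched with a standalone class that accepts the label value either at creation or at stop, but never both. It is a simplified illustration: a plain String stands in for MetricDimensionValues, the recorder callback stands in for the timer, and none of the names come from the library.

```java
import java.util.function.BiConsumer;

// Hypothetical sketch; not the library's Stopwatch.
final class OneShotLabelStopwatchSketch {

    private final long startNanos = System.nanoTime();
    private final BiConsumer<String, Long> recorder;
    private String labelValue; // null until supplied

    OneShotLabelStopwatchSketch(BiConsumer<String, Long> recorder) {
        this(recorder, null);
    }

    OneShotLabelStopwatchSketch(BiConsumer<String, Long> recorder, String labelValue) {
        this.recorder = recorder;
        this.labelValue = labelValue;
    }

    // Stops using the label value supplied at creation.
    long stop() {
        if (labelValue == null) {
            throw new IllegalStateException("label value has not been supplied");
        }

        return record();
    }

    // Stops with a label value that becomes known only now.
    long stop(String labelValue) {
        if (this.labelValue != null) {
            throw new IllegalStateException("label value has already been supplied");
        }

        this.labelValue = labelValue;
        return record();
    }

    private long record() {
        long elapsedNanos = System.nanoTime() - startNanos;
        recorder.accept(labelValue, elapsedNanos);
        return elapsedNanos;
    }
}
```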

Set default HdrHistogramImplConfig.highestTrackableValue to 1 hour for Timers

Add a styleguide

We should create an automatic formatter for IDEA.
The list of rules:

There should be an empty line after a class header:

public abstract class A {
    
    public final MetricName name;
    ...
}

There should be no empty lines if the definition of the class consists only of constants.

public abstract class A {
    public static final MetricName DEFAULT_NAME_PREFIX = MetricName.of("Buffers");
}

We use only 4-space indentation for all the cases.
We use the following formatting schema for long enough parameter lists (2+ usually but it depends):

protected Supplier<LongVarConfigBuilder> longVarConfigBuilderSupplier(
        String description, 
        MetricDimension... dimensions) {

        ...
}

protected Supplier<LongVarConfigBuilder> longVarConfigBuilderSupplier(
        String description, MetricDimension... dimensions) {

        ...
}

We add an empty line after a multi-line definition. For example,

registry.longVar(
    nameWithSuffix("startTime"),
    runtimeMxBean::getStartTime,
    longVarConfigBuilderSupplier("The start time of the Java virtual machine in milliseconds"));

registry.longVar(
     nameWithSuffix("uptime", "ms"),
     runtimeMxBean::getUptime,
     longVarConfigBuilderSupplier("The uptime of the Java virtual machine in milliseconds"));

We don't use the prefix 'is' for fields or parameters, only for methods: isDimensional.
We don't add an empty line after a method signature:

public void produceMetrics(MetricRegistry registry) {
   // no empty line here
    for (int i = 0; i < ATTRS.length; ++i) {

We don't use a space for type casting: (double)i
There should be an empty line before and after if/try inside a method body (but not for the last statement).
There should be no empty line before the closing brace in a class definition.

Add methods for setting metric implementation builders that take an appropriate interface as a parameter instead of a java.lang.Object

For example, the method .with(...) used in com.ringcentral.platform.metrics.samples.histogram.HistogramSample takes a java.lang.Object, which is error-prone:

.with(hdrImpl()
  .resetByChunks(6, Duration.ofMinutes(2))
  .highestTrackableValue(1000, REDUCE_TO_HIGHEST_TRACKABLE)
  .significantDigits(3)
  .snapshotTtl(30, SECONDS))

We should add a method that takes some appropriate interface instead.
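A minimal sketch of the idea, using hypothetical types that only mimic the shape of the builders above: once with(...) accepts an interface instead of java.lang.Object, passing an unrelated value such as a String is rejected at compile time.

```java
// Hypothetical sketch; none of these types are the library's.
final class TypedImplConfigSketch {

    // Marker interface for implementation config builders (assumed name).
    interface ImplConfigBuilder {
        String describe();
    }

    static final class HdrImplConfigBuilder implements ImplConfigBuilder {
        private int significantDigits = 2;

        HdrImplConfigBuilder significantDigits(int digits) {
            this.significantDigits = digits;
            return this;
        }

        @Override
        public String describe() {
            return "hdr(significantDigits=" + significantDigits + ")";
        }
    }

    static final class HistogramConfigBuilder {
        private String impl = "default";

        // Accepts only ImplConfigBuilder: with("hdr") or with(42) does not compile.
        HistogramConfigBuilder with(ImplConfigBuilder builder) {
            this.impl = builder.describe();
            return this;
        }

        String impl() {
            return impl;
        }
    }
}
```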

Consider renaming the library to better reflect its capabilities

Add docs for Spring integration

There is an example app that shows how to integrate metrics-facade with Spring, but the docs say nothing about Spring support.

I think it's worth adding a section that describes the corresponding functionality (existing configuration options, how to customize them, etc.).

Structure the docs

split README.md up into multiple smaller docs
add info on where to find the samples output (localhost:9095) and samples in general
add Contribution section
export model
benchmarks
migrate to wiki
TBD

Predicate passed in forMetricInstancesMatching isn't applied

    public static void main(String[] args) {
        DropwizardMetricRegistry registry = new DropwizardMetricRegistry();
        PrometheusInstanceSampleSpecModsProvider instanceSampleSpecModsProvider = new PrometheusInstanceSampleSpecModsProvider();
        PrometheusInstanceSamplesProvider instanceSamplesProvider = new PrometheusInstanceSamplesProvider(
                instanceSampleSpecModsProvider,
                registry
        );


        MetricName name = MetricName.of("counter_1", "suffix");
        System.out.println(name + " part 0 is 'counter_1': " + name.part(0).equals("counter_1"));
        Counter counter = registry.counter(name);
        counter.inc();

        instanceSampleSpecModsProvider.addMod(
                forMetricInstancesMatching(
                        nameMask("counter_1.suffix"),
                        mi -> mi.name().part(0).equals("counter_1")
                ),
                (metric, instance) -> prometheusInstanceSampleSpec()
                        .name(MetricName.name("new_name_for_counter_1"))
        );


        MetricName name2 = MetricName.of("counter_2", "type_1", "suffix");
        System.out.println(name2 + " part 0 is 'counter_2': " + name2.part(0).equals("counter_2"));
        Counter counter2 = registry.counter(name2);
        counter2.inc();

        instanceSampleSpecModsProvider.addMod(
                forMetricInstancesMatching(
                        nameMask("counter_2.**.suffix"),
                        mi -> mi.name().part(0).equals("counter_2")
                ),
                (metric, instance) -> prometheusInstanceSampleSpec()
                        .name(MetricName.name("new_name_for_counter_2"))
        );

        PrometheusMetricsExporter prometheusMetricsExporter = new PrometheusMetricsExporter(instanceSamplesProvider);
        System.out.println(prometheusMetricsExporter.exportMetrics());
    }

Actual output:

counter_1.suffix part 0 is 'counter_1': true
counter_2.type_1.suffix part 0 is 'counter_2': true
# HELP counter_2_type_1_suffix Generated from metric instances with name counter_2.type_1.suffix
# TYPE counter_2_type_1_suffix gauge
counter_2_type_1_suffix 1.0
# HELP new_name_for_counter_1 Generated from metric instances with name counter_1.suffix
# TYPE new_name_for_counter_1 gauge
new_name_for_counter_1 1.0

Expected output:

counter_1.suffix part 0 is 'counter_1': true
counter_2.type_1.suffix part 0 is 'counter_2': true
# HELP new_name_for_counter_1 Generated from metric instances with name counter_1.suffix
# TYPE new_name_for_counter_1 gauge
new_name_for_counter_1 1.0
# HELP new_name_for_counter_2 Generated from metric instances with name counter_2.type_1.suffix
# TYPE new_name_for_counter_2 gauge
new_name_for_counter_2 1.0

It's expected that names for both counters will be replaced.
But currently only counter_1's name is replaced.

It seems that the following line contains a problem:
https://github.com/ringcentral/metrics-facade/blob/master/metrics-facade-base/src/main/java/com/ringcentral/platform/metrics/infoProviders/MaskTreeMetricNamedInfoProvider.java#L149

named should be used instead of name.
At least it solves the issue in my case.

Please, take a look.

Support sanitizing label values for Prometheus

prometheus/client_golang#274

JMX exporter to properly handle dimension duplicates

If a metric has a "name" dimension, it won't be exported, because the dimension conflicts with the name part of the JMX object name.

Support annotation-based metric implementations discovery

Currently, we use the following approach to register custom metric implementations:

DefaultMetricRegistry registry = new DefaultMetricRegistry();
registry.extendWith(LastValueHistogramImplConfig.class, new LastValueHistogramImplMaker());

See com.ringcentral.platform.metrics.samples.histogram.HistogramSample for more details.
We need to add support for annotation-based automatic metric implementations discovery.
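One possible shape for such discovery is sketched below with plain reflection. The annotation name, the nested classes, and the discover method are all hypothetical, and classpath scanning is left out: the candidate classes are passed in explicitly.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the annotation and all types are assumptions.
final class AnnotatedImplDiscoverySketch {

    // Marks a maker class and names the config type it handles.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ImplMakerFor {
        Class<?> value();
    }

    static final class LastValueConfig {}

    @ImplMakerFor(LastValueConfig.class)
    static final class LastValueMaker {}

    static final class UnannotatedMaker {}

    // Maps each config type to an instance of the maker annotated for it.
    static Map<Class<?>, Object> discover(Class<?>... candidates) {
        Map<Class<?>, Object> makers = new HashMap<>();

        for (Class<?> candidate : candidates) {
            ImplMakerFor annotation = candidate.getAnnotation(ImplMakerFor.class);

            if (annotation == null) {
                continue; // not a maker, ignore
            }

            try {
                makers.put(annotation.value(), candidate.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + candidate.getName(), e);
            }
        }

        return makers;
    }
}
```

Each discovered pair could then be passed to a call such as registry.extendWith(configType, maker), which is the manual step shown above.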

Upgrade dependencies

We should also try to get rid of the following deps:

commons-lang3
jackson-*

Support accumulative modifications

By modifications we mean MetricRegistry.preConfigure, MetricRegistry.postConfigure, PrometheusInstanceSampleSpecModsProvider, etc.

Currently, modifications are based on the original entity (Metric, MetricInstance) and overwrite the previous ones. We need to support accumulative modifications that take into account both the original entity and all the previous modifications.
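The difference between the two behaviors can be shown with a generic sketch in which a modification is modeled as a function from an entity to a modified entity. This models the idea only; it is not how MetricRegistry or PrometheusInstanceSampleSpecModsProvider represent modifications.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; modifications are modeled as plain functions.
final class AccumulativeModsSketch {

    // Overwriting: every modification sees the original, only the last result survives.
    static <T> T applyOverwriting(T original, List<UnaryOperator<T>> mods) {
        T result = original;

        for (UnaryOperator<T> mod : mods) {
            result = mod.apply(original);
        }

        return result;
    }

    // Accumulating: every modification sees the result of the previous ones.
    static <T> T applyAccumulating(T original, List<UnaryOperator<T>> mods) {
        T result = original;

        for (UnaryOperator<T> mod : mods) {
            result = mod.apply(result);
        }

        return result;
    }
}
```

For example, with the modifications "append _total" and "prepend app_" applied to the name "requests", overwriting yields "app_requests", while accumulating yields "app_requests_total".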

Add SystemMetricsProducer to spring autoconfiguration

SystemMetricsProducer isn't part of Spring autoconfiguration.

I suggest adding the ability to turn on its autoconfiguration by means of application properties.

For example,

# application.yml
...
management:
    metrics:
        mf:
            collect:
                system: true
...

In this case
org.springframework.boot.actuate.autoconfigure.metrics.JvmMetricsAutoConfiguration
and
org.springframework.boot.actuate.autoconfigure.metrics.SystemMetricsAutoConfiguration
should be excluded from autoconfiguration as they collect similar metrics.

Scale histogram: prevent overflow when calculating TOTAL_SUM and other measurables

// 1) Create registry
var registry = new DefaultMetricRegistry();

// 2) Define labels
var service = new Label("service");
var server = new Label("server");

// 3) Register metric
Histogram histogram = registry.histogram(withName("failover", "count", "histogram"), () -> withHistogram()
    .description("Failover count histogram")
    .labels(service, server)
    .measurables(TOTAL_SUM, MEAN)
    .impl(scale().with(linearScale().from(0).steps(1, 2).withInf())));

// 4) Update metric
histogram.update(4, forLabelValues(service.value("service-1"), server.value("server-1-1")));
histogram.update(5, forLabelValues(service.value("service-1"), server.value("server-1-1")));

// Metric instances are added asynchronously
sleep(100);

// 5) Create exporter
PrometheusMetricsExporter exporter = new PrometheusMetricsExporter(registry);

// 6) Export metrics
System.out.println(exporter.exportMetrics());

Output:

# HELP failover_count_histogram Failover count histogram
# TYPE failover_count_histogram summary
failover_count_histogram_sum{service="service-1",server="server-1-1",} -2.0
# HELP failover_count_histogram_mean Failover count histogram
# TYPE failover_count_histogram_mean gauge
failover_count_histogram_mean{service="service-1",server="server-1-1",} -1.0
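The reported sum of -2.0 is what Java long arithmetic produces for Long.MAX_VALUE + Long.MAX_VALUE, and the mean of -1.0 is that sum divided by two updates. This suggests, although the issue does not confirm it, that both updates fall into the infinity bucket and are represented by Long.MAX_VALUE. A minimal sketch of a saturating addition that avoids the wrap-around (the class and method names are hypothetical):

```java
// Hypothetical sketch; not the scale histogram implementation.
final class SaturatingSumSketch {

    // Plain long addition: silently wraps around on overflow.
    static long wrappingSum(long a, long b) {
        return a + b;
    }

    // Clamps the result to Long.MAX_VALUE or Long.MIN_VALUE instead of wrapping.
    static long saturatingSum(long a, long b) {
        long result = a + b;

        // Overflow happened iff both operands have the same sign
        // and the sign of the result differs from it.
        if (((a ^ result) & (b ^ result)) < 0) {
            return a > 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
        }

        return result;
    }
}
```

Whether clamping is the right policy for TOTAL_SUM, as opposed to using a floating-point or wider accumulator, is a design decision the issue leaves open.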

Rename MetricDimension/MetricDimensionValue to Label/LabelValue
