tokio-rs / tokio-metrics Goto Github PK

View Code? Open in Web Editor NEW

282.0 282.0 23.0 153 KB

Utilities for collecting metrics from a Tokio application

License: MIT License

Rust 100.00%

tokio-metrics's People

Contributors

Stargazers

Watchers

tokio-metrics's Issues

Emit task metrics for single invocations instead of interval samples

Hello,

This is a feature request for some way to get the TaskMetrics for the invocation of a single future. Something like:

let monitor = tokio_metrics::TaskMonitor::new();

let (metrics, other_return_value) = monitor.instrument_single(some_future()).await;

The API usage above is not intended to be the actual API, just illustrating the idea. I want this feature is so that I can record metrics for the overhead every single execution of the some_future() future.

The ultimate reason is that I'm trying to write a program that measures the latency of remote service calls, and I want to understand what kind of overhead I'm seeing as a result of using an async runtime, as opposed to a simple blocking thread application. I'd like to see this on a per-request basis so that I can confirm that requests with high latency are only the result of the remote system, not a result of a delay in scheduling the task.

Is it worth tracking/exposing `num_scheduled`?

I think num_scheduled is going to equal num_polls - num_tasks? Need to double-check this, but if so, it doesn't need to be a field in the Metrics struct; it could be computed in a method, instead.

Should it even be exposed? @carllerche points out that this metric matters much more at the runtime level, since there are multiple ways tasks may be scheduled. For task metrics, what matters more is time spent scheduled. At least internally, we need to account for num_scheduled so we can compute mean_time_scheduled, but maybe num_scheduled doesn't actually need to be exposed.

Metric integrity in long-running applications.

Is storing durations as u64 nanoseconds enough? It’s 584 years, but if you have 5000 tasks, you’ll burn through it in 42 days of uptime. That sounds doable. At minimum, we should make sure it doesn’t panic on overflow/underflow.

Compatibility with Prometheus and pull-based approach in general

Hi there!

I'd like to start exposing tokio runtime metrics as part of my application's prometheus metrics. Unfortunately, there is a number of conceptual differences, which make tokio-metrics not really suitable for this.

Prometheus usually scrapes applications' metrics by calling an HTTP endpoint in equal time intervals. In my practice I've encountered scrape intervals between 15 secs and 5 minutes, it is determined by a trade-off in resolution requirements and available storage resources. In any case, all metric changes between two scrapes are not observable via Prometheus, usually the best practice for that is to implement most metrics as non-decreasing counters and derive frequency properties from that.

Also, since each metric scrape is a network interaction, it can be failed and retried without guarantees that the request really made it through to the process or not. Due to that it's important for a metrics endpoint to be stateless, which is violated in how intervals iterator is implemented. Ideally, there would be no state change at all when retrieving the current state of metrics.

Do you think that tokio-metrics is a good place to implement that kind of stuff or do you believe it targets a different type of metrics here?

Crisper examples of runtime metrics.

For each task metric, it's fairly easy to write a crisp, self-contained example that reliably induces a change in a metric. For runtime metrics, it's currently not so easy to do this, because:

runtime metrics are buffered
some runtime metrics are dependent on scheduling pathologies that are finicky to induce

We could resolve the first obstacle to provide some mechanism to flush metrics on-demand. For the second obstacle, I'm not sure there's much we can do.

~/github.com/tokio-metrics:main@e66d2ff$ RUSTFLAGS="--cfg tokio_unstable" cargo test --all-features
    Finished test [unoptimized + debuginfo] target(s) in 0.15s
     Running unittests (target/debug/deps/tokio_metrics-ec134d5a58bb3238)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests tokio-metrics

running 40 tests
test src/task.rs - task::TaskMetrics::first_poll_count (line 607) ... ok
test src/task.rs - task::TaskMetrics::instrumented_count (line 538) ... ok
test src/task.rs - task::TaskMetrics::mean_poll_duration (line 2001) ... ok
test src/task.rs - task::TaskMetrics::dropped_count (line 568) ... ok
test src/task.rs - task::TaskMetrics::total_fast_poll_count (line 1089) ... ok
test src/task.rs - task::TaskMetrics::mean_slow_poll_duration (line 2222) ... ok
test src/task.rs - task::TaskMetrics::mean_fast_poll_duration (line 2131) ... ok
test src/task.rs - task::TaskMetrics::slow_poll_ratio (line 2046) ... ok
test src/task.rs - task::TaskMetrics::mean_idle_duration (line 1881) ... ok
test src/task.rs - task::TaskMetrics::total_fast_poll_duration (line 1144) ... ok
test src/task.rs - task::TaskMetrics::total_first_poll_delay (line 648) ... ok
test src/task.rs - task::TaskMetrics::total_first_poll_delay (line 697) ... ok
test src/task.rs - task::TaskMetrics::total_first_poll_delay (line 731) ... FAILED
test src/task.rs - task::TaskMetrics::total_idle_duration (line 811) ... ok
test src/task.rs - task::TaskMetrics::total_idled_count (line 770) ... ok
test src/task.rs - task::TaskMonitor (line 306) ... ignored
test src/task.rs - task::TaskMonitor (line 321) ... ignored
test src/task.rs - task::TaskMetrics::total_poll_count (line 989) ... ok
test src/task.rs - task::TaskMetrics::total_poll_duration (line 1054) ... ok
test src/task.rs - task::TaskMetrics::total_scheduled_count (line 850) ... ok
test src/task.rs - task::TaskMetrics::mean_first_poll_delay (line 1811) ... ok
test src/task.rs - task::TaskMetrics::total_slow_poll_count (line 1211) ... ok
test src/task.rs - task::TaskMetrics::total_slow_poll_duration (line 1269) ... ok
test src/task.rs - task::TaskMonitor (line 71) - compile ... ok
test src/task.rs - task::TaskMonitor (line 362) ... FAILED
test src/task.rs - task::TaskMonitor (line 388) ... FAILED
test src/task.rs - task::TaskMonitor (line 413) ... ok
test src/lib.rs - (line 12) ... ok
test src/task.rs - task::TaskMonitor::cumulative (line 1571) ... ok
test src/task.rs - task::TaskMonitor (line 452) ... ok
test src/task.rs - task::TaskMonitor::instrument (line 1488) ... ok
test src/task.rs - task::TaskMonitor::instrument (line 1510) ... ok
test src/task.rs - task::TaskMonitor::instrument (line 1530) ... ok
test src/task.rs - task::TaskMonitor (line 281) ... ok
test src/task.rs - task::TaskMetrics::total_scheduled_duration (line 920) ... ok
test src/task.rs - task::TaskMonitor::intervals (line 1632) ... ok
test src/task.rs - task::TaskMonitor::slow_poll_threshold (line 1467) ... ok
test src/task.rs - task::TaskMonitor::with_slow_poll_threshold (line 1406) ... ok
test src/task.rs - task::TaskMetrics::mean_scheduled_duration (line 1920) ... ok
test src/task.rs - task::TaskMonitor (line 24) ... ok

failures:

---- src/task.rs - task::TaskMetrics::total_first_poll_delay (line 731) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'overflow when adding duration to instant', library/std/src/time.rs:409:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


---- src/task.rs - task::TaskMonitor (line 362) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'overflow when adding duration to instant', library/std/src/time.rs:409:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


---- src/task.rs - task::TaskMonitor (line 388) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'overflow when adding duration to instant', library/std/src/time.rs:409:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace



failures:
    src/task.rs - task::TaskMetrics::total_first_poll_delay (line 731)
    src/task.rs - task::TaskMonitor (line 362)
    src/task.rs - task::TaskMonitor (line 388)

test result: FAILED. 35 passed; 3 failed; 2 ignored; 0 measured; 0 filtered out; finished in 7.52s

error: test failed, to rerun pass '--doc'

This is a Mac OSX environment.

Remove default-features = false from readme

The readme should stick to the default imo.

Fix based on changes to yield_now

Due to tokio-rs/tokio#5223, some metrics tests were broken. These need to be fixed.

failures:

---- src/task.rs - task::TaskMetrics::mean_scheduled_duration (line 1924) stdout ----
Test executable failed (exit status: 101).

stderr:
thread 'main' panicked at 'assertion failed: interval.mean_scheduled_duration() >= Duration::from_secs(1)', src/task.rs:34:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:111:5
   3: rust_out::main::{{closure}}
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
   5: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
   6: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
   7: tokio::runtime::scheduler::current_thread::Context::enter
   8: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
   9: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
  10: tokio::macros::scoped_tls::ScopedKey<T>::set
  11: tokio::runtime::scheduler::current_thread::CoreGuard::enter
  12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
  13: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
  14: tokio::runtime::runtime::Runtime::block_on
  15: rust_out::main
  16: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


---- src/task.rs - task::TaskMetrics::total_scheduled_duration (line 922) stdout ----
Test executable failed (exit status: 101).

stderr:
thread 'main' panicked at 'assertion failed: total_scheduled_duration >= Duration::from_millis(1000)', src/task.rs:30:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:111:5
   3: rust_out::main::{{closure}}
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
   5: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
   6: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
   7: tokio::runtime::scheduler::current_thread::Context::enter
   8: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
   9: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
  10: tokio::macros::scoped_tls::ScopedKey<T>::set
  11: tokio::runtime::scheduler::current_thread::CoreGuard::enter
  12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
  13: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
  14: tokio::runtime::runtime::Runtime::block_on
  15: rust_out::main
  16: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.



failures:
    src/task.rs - task::TaskMetrics::mean_scheduled_duration (line 1924)
    src/task.rs - task::TaskMetrics::total_scheduled_duration (line 922)

test result: FAILED. 56 passed; 2 failed; 2 ignored; 0 measured; 0 filtered out; finished in 35.93s

0.1 TODOs

clean up Cargo.toml, features
- should time be an optional feature?
proofread documentation
decide what the runtime metrics MVP is and fill the gaps
set up CI
update README
blog post

impl Debug for public types

It would be helpful to add Debug impl for all public types, like TaskMonitor.

compatibility with tokio

I want to print metrics of Tokio example with master HEAD, then I get below error:


error[E0308]: mismatched types
    --> examples/tinyhttp.rs:40:51
     |
40   |         let runtime_monitor = RuntimeMonitor::new(&handle);
     |                               ------------------- ^^^^^^^ expected struct `tokio::runtime::handle::Handle`, found struct `Handle`
     |                               |
     |                               arguments to this function are incorrect
     |
     = note: expected reference `&tokio::runtime::handle::Handle`
                found reference `&Handle`
     = note: perhaps two different versions of crate `tokio` are being used?
note: associated function defined here
    --> /root/github/tokio-metrics/src/runtime.rs:1015:12
     |
1015 |     pub fn new(runtime: &runtime::Handle) -> RuntimeMonitor {
     |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `examples` due to previous error

Full change in Tokio:

diff --git a/.cargo/config b/.cargo/config
index df885898..71097e3c 100644
--- a/.cargo/config
+++ b/.cargo/config
@@ -1,2 +1,5 @@
+[build]
+rustflags = ["--cfg", "tokio_unstable"]
+rustdocflags = ["--cfg", "tokio_unstable"]
 # [build]
-# rustflags = ["--cfg", "tokio_unstable"]
\ No newline at end of file
+# rustflags = ["--cfg", "tokio_unstable"]
diff --git a/examples/Cargo.toml b/examples/Cargo.toml
index b35c587b..e628ceb2 100644
--- a/examples/Cargo.toml
+++ b/examples/Cargo.toml
@@ -10,7 +10,7 @@ edition = "2018"
 tokio = { version = "1.0.0", path = "../tokio", features = ["full", "tracing"] }
 tokio-util = { version = "0.7.0", path = "../tokio-util", features = ["full"] }
 tokio-stream = { version = "0.1", path = "../tokio-stream" }
-
+tokio-metrics = { version = "0.1.0", path = "../../tokio-metrics" }
 tracing = "0.1"
 tracing-subscriber = { version = "0.3.1", default-features = false, features = ["fmt", "ansi", "env-filter", "tracing-log"] }
 bytes = "1.0.0"
@@ -24,6 +24,9 @@ httpdate = "1.0"
 once_cell = "1.5.2"
 rand = "0.8.3"

+
+
+
 [target.'cfg(windows)'.dev-dependencies.windows-sys]
 version = "0.42.0"

diff --git a/examples/tinyhttp.rs b/examples/tinyhttp.rs
index fa0bc669..0457406a 100644
--- a/examples/tinyhttp.rs
+++ b/examples/tinyhttp.rs
@@ -18,8 +18,10 @@ use futures::SinkExt;
 use http::{header::HeaderValue, Request, Response, StatusCode};
 #[macro_use]
 extern crate serde_derive;
+use std::time::Duration;
 use std::{env, error::Error, fmt, io};
 use tokio::net::{TcpListener, TcpStream};
+use tokio_metrics::RuntimeMonitor;
 use tokio_stream::StreamExt;
 use tokio_util::codec::{Decoder, Encoder, Framed};

@@ -33,6 +35,18 @@ async fn main() -> Result<(), Box<dyn Error>> {
     let server = TcpListener::bind(&addr).await?;
     println!("Listening on: {}", addr);

+    let handle = tokio::runtime::Handle::current();
+    {
+        let runtime_monitor = RuntimeMonitor::new(&handle);
+        tokio::spawn(async move {
+            for interval in runtime_monitor.intervals() {
+                // pretty-print the metric interval
+                println!("{:?}", interval);
+                // wait 500ms
+                tokio::time::sleep(Duration::from_secs(1)).await;
+            }
+        });
+    }
     loop {
         let (stream, _) = server.accept().await?;
         tokio::spawn(async move {

Command:

RUSTFLAGS="--cfg tokio_unstable" cargo run --example tinyhttp

tokio-rs / tokio-metrics Goto Github PK

tokio-metrics's People

Contributors

Stargazers

Watchers

Forkers

tokio-metrics's Issues

Recommend Projects

Recommend Topics

Recommend Org