
prometheus-haskell's Issues

Use Text instead of String

Text is generally much preferred to String, since it's a packed representation rather than a linked list of characters. Could I persuade you to OK a breaking change that updates these three projects to take & return Text?

I'd be happy to do the PR.
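Roughly, the change amounts to switching the user-facing String fields and arguments over to Text; an illustrative before/after (the field names here are made up for the example, not necessarily the library's actual record fields):

import Data.Text (Text)

-- before: name and help text carried as String
-- data Info = Info { infoName :: String, infoHelp :: String }

-- after: the same record over Text
data Info = Info { infoName :: Text, infoHelp :: Text }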

Registry implementation allows for multiple distinct registrations of the same metric

Currently, the [] implementation of Registry allows for multiple registrations of the same metric. I can't really think of a use-case where someone would want to have duplicate metrics (vs an idempotent register function). In fact, this behaviour being allowed can have negative consequences. Consider the following stripped down variant of Example/main.hs:

-- This example demonstrates how to use the prometheus-haskell libraries to
-- instrument a simple web app that allows users to vote for their favorite
-- color.
{-# LANGUAGE OverloadedStrings #-}
module Main
where

import           Network.HTTP.Types                (status200)
import           Network.HTTP.Types.Header         (hContentType)
import           Network.Wai.Handler.Warp          (run)

import qualified Data.ByteString.Lazy              as LBS
import qualified Network.Wai                       as Wai
import qualified Network.Wai.Middleware.Prometheus as P
import qualified Prometheus                        as P


{-# NOINLINE pageVisits #-}
pageVisits :: IO P.Counter
pageVisits = P.register
           $ P.counter
           -- Each metric provided by the base library takes an Info value that
           -- gives the name of the metric and a help string that describes the
           -- value that the metric represents.
           $ P.Info "page_visits" "The number of visits to the index page."

main :: IO ()
main = do
    let port = 3000
    putStrLn $ "Listening at http://localhost:" ++ show port ++ "/"
    -- Instrument the app with the prometheus middleware using the default
    -- `PrometheusSettings`. This will cause the app to dump the metrics when
    -- the /metrics endpoint is accessed.
    run port (P.prometheus P.def app)

app :: Wai.Application
app request respond = do
    response <- case Wai.pathInfo request of
        [] -> doIndex
        _  -> return $ mkResponse "404"
    respond response

mkResponse :: LBS.ByteString -> Wai.Response
mkResponse =  Wai.responseLBS status200 [(hContentType, "text/html")]

doIndex :: IO Wai.Response
doIndex = do
    pageVisitsMetric <- pageVisits
    P.incCounter pageVisitsMetric
    return $ mkResponse $ LBS.concat [
            "<a href='/metrics'>Metrics</a>"
        ,   "<br><br><br>"
        ]

Since I've made the unobvious 'mistake' of registering metrics in my handler, which is called per-request, the list of metrics grows every time my server handles a request:

# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
# TYPE http_request_duration_seconds histogram
... http stuff

One consequence of this is that a request to /metrics on the server may end up timing out because the metrics response size keeps on growing, so you lose visibility over the app until it restarts or the metrics are cleaned up. I think it would be valuable to make it difficult for consumers to do the 'wrong' thing inadvertently.

I can see two solutions to this, and I'm happy to work on a PR to implement either.

  1. Change the underlying data structure of Registry to use a Set or a Map, so that register becomes idempotent (a minimal sketch of this follows the list). If desired, variants of register can be provided that are either idempotent or crash when a metric is 're-registered'. I'm inclined to provide just the former, to be honest.
  2. Change the implementation of register to do an 'insert if not exists' sort of thing. This incurs a cost every time register is called, and doesn't stop other implementations of register from still allowing duplicates.
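A minimal sketch of option 1, assuming a registry keyed by metric name (the types below are simplified stand-ins, not prometheus-client's actual internals):

import           Data.IORef (IORef, atomicModifyIORef')
import           Data.Map   (Map)
import qualified Data.Map   as Map

-- Simplified stand-ins for the real types.
type MetricName = String
data SomeMetric = SomeMetric  -- placeholder for a registered metric

type Registry = IORef (Map MetricName SomeMetric)

-- | Idempotent registration: re-registering an existing name returns the
-- metric that is already there instead of appending a duplicate entry.
register :: Registry -> MetricName -> SomeMetric -> IO SomeMetric
register registry name metric =
    atomicModifyIORef' registry $ \metrics ->
        case Map.lookup name metrics of
            Just existing -> (metrics, existing)
            Nothing       -> (Map.insert name metric metrics, metric)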

Thoughts?

Don't use error for validation

I started doing this as part of #7 but the scope creep & compatibility changes got a bit too much so I thought I'd file a ticket to get an opinion before I wasted too much time.

Currently, validation of metric names doesn't actually happen (although there is a checkInfo function), and validation of label names happens at vector construction but calls error, which is not ideal in library code because it robs the caller of the ability to decide how to handle errors.

A good way of implementing this is to have dedicated newtypes that encode validity, e.g.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings          #-}

import           Control.Monad.Except (MonadError (..))  -- from mtl
import           Data.Semigroup       (Semigroup (..))
import           Data.Text            (Text)
import qualified Data.Text            as Text

-- | A metric name guaranteed to be valid.
newtype MetricName = MetricName { formatMetricName :: Text }
    deriving (Read, Show, Eq, Ord, Semigroup, Monoid)

-- | Parse a 'Text' into a valid 'MetricName', failing with a 'MetricNameError'.
toMetricName :: (MonadError MetricNameError m) => Text -> m MetricName
toMetricName name
    | Text.null name = throwError Empty
    | "__" `Text.isPrefixOf` name = throwError (StartsWithUnderscores name)
    | not (validStart (Text.head name)) = throwError (InvalidCharacters name)
    | Just _ <- Text.find (not . validRest) (Text.tail name) = throwError (InvalidCharacters name)
    | otherwise = return (MetricName name)
    where
        validStart c =  ('a' <= c && c <= 'z')
                     || ('A' <= c && c <= 'Z')
                     || c == '_'
                     || c == ':'

        validRest c =  validStart c || ('0' <= c && c <= '9')

-- | Possible errors when parsing metric names.
data MetricNameError
    = InvalidCharacters Text
    | StartsWithUnderscores Text
    | Empty
    deriving (Eq, Show)

-- | User-friendly error messages for invalid metric names.
formatMetricNameError :: MetricNameError -> Text
formatMetricNameError (InvalidCharacters name) = "The metric '" <> name <> "' contains invalid characters."
formatMetricNameError (StartsWithUnderscores name) = "The metric '" <> name <> "' cannot start with '__'."
formatMetricNameError Empty = "Empty metric names are not allowed."

Here, the MetricName type and the formatMetricName and toMetricName functions would be exported but not the MetricName constructor. This guarantees that any value with the MetricName type has a valid metric name.
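For instance, a caller can pick Either as the concrete MonadError instance and pattern match on the result (the output shown is what the derived Show instances from the sketch above would produce):

-- >>> toMetricName "page_visits" :: Either MetricNameError MetricName
-- Right (MetricName {formatMetricName = "page_visits"})
-- >>> toMetricName "__reserved" :: Either MetricNameError MetricName
-- Left (StartsWithUnderscores "__reserved")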

(Apologies if this is obvious to you, but since roundtrips are moderately expensive I thought it better to over-communicate.)

Anyway, this works for metric names but not so well for label names because the existing type class more or less assumes that the names of labels have the same type as the values of labels.

I have a bunch of half-formed ideas about what to do about that, but pretty much everything would rather seriously break backwards compatibility.

Thoughts?

Export process metrics

From https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors:

Process metrics

These exports should have the prefix process_. If a language or runtime
doesn't expose one of the variables it'd just not export it. All memory values
in bytes, all times in unixtime/seconds.

Metric name                     Help string                                               Unit
process_cpu_seconds_total      Total user and system CPU time spent in seconds.          seconds
process_open_fds               Number of open file descriptors.                          file descriptors
process_max_fds                Maximum number of open file descriptors.                  file descriptors
process_virtual_memory_bytes   Virtual memory size in bytes.                             bytes
process_resident_memory_bytes  Resident memory size in bytes.                            bytes
process_heap_bytes             Process heap size in bytes.                               bytes
process_start_time_seconds     Start time of the process since unix epoch in seconds.    seconds

It would be great to export these by default.
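As a rough, Linux-only illustration of where one of these could come from (not something these packages currently do), process_open_fds can be derived by counting the entries in /proc/self/fd; a minimal sketch assuming the directory package:

import System.Directory (listDirectory)

-- | Current number of open file descriptors for this process, read from the
-- /proc filesystem. (The directory handle used for the listing itself briefly
-- counts as one.) On platforms without /proc the metric would simply be skipped.
processOpenFds :: IO Int
processOpenFds = length <$> listDirectory "/proc/self/fd"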

Export metrics as protobufs

Exporting as text is fine for now, but I'd welcome protobuf export, mostly for bragging rights about Haskell, but also because it should probably mean less CPU for everyone.

MonadMonitor vs MonadIO

At first glance, the MonadMonitor typeclass looks very similar to MonadIO, but maybe I'm missing something? Could MonadIO replace MonadMonitor? It would reduce the impedance mismatch with other libraries, whose monads often come with MonadIO instances, but not MonadMonitor, thus requiring an orphan instance or newtype wrapper.
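For comparison, the two classes look roughly like this (the MonadMonitor definition is my recollection of prometheus-client's and may not match exactly):

-- MonadIO (from base): lifts any IO action and lets the caller use its result.
class Monad m => MonadIO m where
    liftIO :: IO a -> m a

-- MonadMonitor (approximately as defined in prometheus-client): runs an IO
-- action purely for its effects, with no way to hand a result back.
class Monad m => MonadMonitor m where
    doIO :: IO () -> m ()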

prometheus-metrics-ghc-0.3.0 build failure with ghc 8.4: GCStats not in scope

As seen on the Stackage build server:

[1 of 1] Compiling Prometheus.Metric.GHC ( src/Prometheus/Metric/GHC.hs, dist/build/Prometheus/Metric/GHC.o )

src/Prometheus/Metric/GHC.hs:131:54: error:
    Not in scope: type constructor or class ‘GCStats’
    |
131 |                => String -> String -> SampleType -> (GCStats -> a) -> IO [SampleGroup]
    |                                                      ^^^^^^^

I was able to reproduce this locally like so:

stack unpack prometheus-metrics-ghc-0.3.0 && cd prometheus-metrics-ghc-0.3.0
edit stack.yaml # add the following stack.yaml
stack build --test --bench --no-run-benchmarks
# stack.yaml
resolver: nightly-2018-07-04
extra-deps:
- prometheus-client-0.3.0
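For what it's worth, GCStats was deprecated in GHC 8.2 and removed from GHC.Stats in GHC 8.4, with RTSStats as its replacement, so the fix is to port the collector to the new record. A minimal sketch of reading the replacement API (the choice of field is just illustrative):

import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Total allocated bytes, or Nothing when the program was not started with
-- +RTS -T (i.e. RTS stats collection is disabled).
allocatedBytes :: IO (Maybe Word64)
allocatedBytes = do
    enabled <- getRTSStatsEnabled
    if enabled
        then Just . allocated_bytes <$> getRTSStats
        else return Nothing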

Query endpoints?

Hi, I'm not too familiar with the Prometheus ecosystem. I've just successfully set up a Grafana instance that gets data from my Haskell server via prometheus-node. Now I also wanted the GHC metrics, so I added those via wai-middleware-prometheus, but it appears the only endpoint this ever exposes is /metrics (which works). Grafana, however, uses the Prometheus query API when I add the server as a data source, and that API seems to be completely missing, so the data source doesn't work.

New release

Hi @fimad,

How are we looking on a new release? Anything I can do to help?

Summary metric lacks a decay window

I might be misunderstanding, but the Prometheus documentation talks about a sliding window or decay period for summary metrics; the current implementation, however, appears to aggregate over all time (since the metric was initialized), which seems less useful.

Unused settings?

In PrometheusSettings I see prometheusInstrumentApp and prometheusInstrumentPrometheus, but those flags seem to never be used.
I'd like to have a /metrics endpoint that responds only with the metrics I specify, but it looks like, out of the box, prometheus from Network.Wai.Middleware.Prometheus always instruments both the metrics endpoint and the app (which in my case would just be returning 404 for every request).
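Concretely, what I would like to be able to write is roughly the following (a sketch assuming the flags actually took effect, that PrometheusSettings has the Default instance behind P.def in the example further up, and the field names mentioned above):

import           Data.Default                      (def)
import qualified Network.Wai.Middleware.Prometheus as P

-- Hypothetical: serve /metrics only, and instrument neither the app nor the
-- metrics endpoint itself.
metricsOnlySettings :: P.PrometheusSettings
metricsOnlySettings = def { P.prometheusInstrumentApp        = False
                          , P.prometheusInstrumentPrometheus = False }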

Define Eq, Ord and NFData instances for Sample

The Sample data type does not have instances for Eq, Ord and NFData.

The former two type classes are generally useful and can be automatically derived by GHC.

The latter type class is primarily useful when writing tests and benchmarks. The entire definition of the instance fits on two lines:

instance NFData Sample where
  rnf (Sample name labelSet value) = rnf name `seq` rnf labelSet `seq` rnf value

Excessive GC usage for `Summary` metric in wai-middleware-prometheus

We have a WAI application in production that serves only around 5 reqs/s and we had serious performance issues as the server was GC'ing 95% of the time.

Before

You can see that we're spending almost all clock cycles on garbage collection, and that garbage collection happens often.

[graph: CPU time dominated by GC]

Profiler output

After profiling, the culprit turned out to be the summaries that are kept for each request in wai-middleware-prometheus:

	Tue Jul  4 11:17 2017 Time and Allocation Profiling Report  (Final)

	   warpmachine +RTS -N -T -p -RTS --proxy-host localhost --proxy-port 8000

	total time  =       17.73 secs   (17725 ticks @ 1000 us, 1 processor)
	total alloc = 10,233,152,016 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE                          SRC                                                   %time %alloc

invariant.fj                          Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:(197,9)-(198,58)      17.9   13.1
wrapCompile.\                         Text.Regex.Posix.Wrap           Text/Regex/Posix/Wrap.hsc:(480,49)-(484,42)             7.9    0.0
throwSocketErrorIfMinus1RetryMayBlock Network.Socket.Internal         Network/Socket/Internal.hsc:(161,1)-(162,53)            6.2    0.6
getAddrInfo.\.\.\.\                   Network.Socket                  Network/Socket.hsc:(1460,36)-(1470,43)                  5.4    0.0
connect.\.\.connectLoop               Network.Socket                  Network/Socket.hsc:(444,9)-(458,29)                     4.7    0.0
throwSocketErrorIfMinus1_             Network.Socket.Internal         Network/Socket/Internal.hsc:166:1-47                    3.0    0.0
invariant                             Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:(193,1)-(198,58)       2.4    6.0
wrapTest.\                            Text.Regex.Posix.Wrap           Text/Regex/Posix/Wrap.hsc:(489,45)-(495,40)             1.9    0.0
compress.compressPair.inv             Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:172:17-60              1.8    5.1
connect.\.\.connectBlocked            Network.Socket                  Network/Socket.hsc:(460,9)-(465,67)                     1.6    0.4
compress                              Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:(156,1)-(172,60)       1.3    3.9
decodeWithTable/fill                  Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:(171,62)-(194,68)    1.1    2.3
encode/fill                           Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:(61,55)-(69,52)      1.1    2.6
urlDecode                             Network.HTTP.Types.URI          Network/HTTP/Types/URI.hs:(209,1)-(228,34)              1.0    2.3
peek8_32                              Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:43:1-36              0.9    2.8
decodeWithTable.\.\.look              Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:(165,11)-(168,49)    0.8    2.8
asCString                             Text.Regex.Posix.ByteString     Text/Regex/Posix/ByteString.hs:(150,1)-(152,40)         0.5    1.1
memConstEqual.loop                    Data.Memory.PtrMethods          Data/Memory/PtrMethods.hs:(103,5)-(107,34)              0.5    1.7
toByteString                          Blaze.ByteString.Builder        Blaze/ByteString/Builder.hs:151:1-46                    0.3    1.4
urlDecode.go                          Network.HTTP.Types.URI          Network/HTTP/Types/URI.hs:(211,5)-(221,40)              0.2    1.5
toByteStringIOWith.getBuffer          Blaze.ByteString.Builder        Blaze/ByteString/Builder.hs:(187,5)-(204,40)            0.2   10.8
toByteStringIOWith                    Blaze.ByteString.Builder        Blaze/ByteString/Builder.hs:(184,1)-(204,40)            0.2   10.7
recv                                  Network.Socket.ByteString       Network/Socket/ByteString.hsc:(226,1)-(231,62)          0.1    3.3

After disabling wai-middleware-prometheus

We commented out the middleware in our application, and instead of using 400% CPU, we're back to 2% CPU, and the GC contribution is only 0.02 instead of 1.0. Also see the graph below. On the left is before removing the middleware, and on the right is after removing the middleware.

[graph: CPU usage before (left) and after (right) removing the middleware]

Why is there no endpoint info in wai-middleware-prometheus metrics?

It seems to me that the usefulness of wai-middleware-prometheus is quite limited: most applications serve multiple endpoints, and measuring an aggregate value across all of them (when the nature of those endpoints can be very different) seems odd at best.
Am I missing some way to get per-endpoint metrics, or should I create a PR adding an extra label with the endpoint information?
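If a PR goes that way, the label value could be derived from the request path; a small sketch of that part (independent of how the metric itself ends up being labelled):

import           Data.Text   (Text)
import qualified Data.Text   as T
import qualified Network.Wai as Wai

-- | A candidate "endpoint" label value built from the request path. A real
-- implementation would likely normalise path parameters so that the label
-- cardinality stays bounded.
endpointLabel :: Wai.Request -> Text
endpointLabel req = T.cons '/' (T.intercalate (T.pack "/") (Wai.pathInfo req))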

Fancier types for labels

It'd be nice to have a type class for labels such that I could define in my app how a certain domain-specific value got translated to a Prometheus label.

e.g.

class ToLabel a where
  toLabel :: a -> String

data MyStatus = Good | Bad | Indifferent

instance ToLabel MyStatus where
  toLabel Good = "good"
  toLabel Bad = "bad"
  toLabel Indifferent = "indifferent"

withLabel Good incCounter someMetric

But then, as I write this, I realise this probably devolves into wanting extensible records, such that you can declare the type of a label at metric creation and have the type system make sure you've got the right value in the right place, which is probably going overboard.

Move to Prometheus organization / governance?

@fimad What do you think of opening up commit & release access?

We've all got busy day jobs, but perhaps together we could release more often and merge PRs.

One way to do this would be to move this project to the Prometheus project & governance model. This would probably mean:

  • the repo would be at prometheus/prometheus-haskell
  • the existing Prometheus team members and maintainers would get commit access
  • we could add maintainers specific to prometheus-haskell (I'd suggest myself & @ocharles)

I asked on IRC, and this isn't an a priori terrible idea, but it is something that would have to be proposed and voted on by the Prometheus team members.

Other options would be adding collaborators, or creating a new organization and moving there.

Let me know what you think. If you want to try moving into the Prometheus project, I can draft a proposal and try to get a team member to propose it.

Enable configurable buckets for `http_request_duration_seconds`

The requestLatency function picks the default set of buckets, which is very rigid.
We have two situations: either our servers are talking to each other in the same datacenter, which means requests are blazing fast and we see things like:

http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.005"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.01"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.025"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.05"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.1"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.25"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.5"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="1.0"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="2.5"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="5.0"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="10.0"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="+Inf"} 1860
http_request_duration_seconds_sum{handler="prometheus",method="GET",status_code="200"} 7.270726499999998e-2
http_request_duration_seconds_count{handler="prometheus",method="GET",status_code="200"} 1860

Which is useless. The other situation we face is the dual one: we're handling long-running requests, and all of them end up in the +Inf bucket. I'd love to be able to configure the buckets for requestLatency so we can get useful information out of this data.
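A sketch of the kind of knob I'm after: a requestLatency-style histogram where the caller supplies the bucket list. P.histogram and P.exponentialBuckets are assumed to be available from prometheus-client, and the WAI middleware plumbing is left out:

{-# LANGUAGE OverloadedStrings #-}
import qualified Prometheus as P

-- Hypothetical: the bucket upper bounds become an argument instead of being
-- hard-coded to the defaults.
requestLatencyWith :: [Double] -> P.Metric P.Histogram
requestLatencyWith buckets =
    P.histogram
        (P.Info "http_request_duration_seconds"
                "The HTTP request latencies in seconds.")
        buckets

-- For fast intra-datacenter traffic: exponential buckets starting at 100µs.
fastBuckets :: [Double]
fastBuckets = P.exponentialBuckets 0.0001 2 12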

Does withLabel need to have the type it does?

The type of withLabel is:

withLabel :: (Label label, MonadMonitor m) => label -> (Metric metric -> IO ()) -> Metric (Vector label metric) -> m ()

This type implies that once the IO action that uses the labelled metric has finished, some cleanup must happen. Yet if we actually consult the source of withLabel, we have:

withLabel label f (Metric {handle = MkVector ioref}) = doIO $ do
    (gen, _) <- IORef.readIORef ioref
    newMetric <- gen
    metric <- Atomics.atomicModifyIORefCAS ioref $ \(_, metricMap) ->
        let maybeMetric = Map.lookup label metricMap
            updatedMap  = Map.insert label newMetric metricMap
        in  case maybeMetric of
                Nothing     -> ((gen, updatedMap), newMetric)
                Just metric -> ((gen, metricMap), metric)
    f metric

Nothing happens after f is called - there is no cleanup that has to happen. Thus it would be simpler to have

withLabel :: (Label label, MonadMonitor m) => label -> Metric (Vector label metric) -> m (Metric metric)

My guess is that we have the type we have because we are forced into it by doIO. I find doIO a tad suspicious anyway, and think that MonadMonitor would likely be better if it had specific operations added, rather than a catch-all liftIO-like operation.

What do you think?

Is it possible to add labels to ghcMetrics? If yes, how?

Sorry if this is a silly question, but I found no obvious way to add labels to the metrics exposed by ghcMetrics. I plan to aggregate the exposed Prometheus text of multiple service instances, so I'm looking for a way to distinguish the instances - I think Prometheus labels might be a good way to do so. Labels could perhaps be added during scraping, but I think the running application has much more context for adding reasonable labels/values.
