
prometheus-haskell's Issues

Use Text instead of String

Text is generally much preferred to String, since it's a packed representation rather than a linked list of characters. Could I persuade you to OK a breaking change that updates these three projects to take & return Text?

I'd be happy to do the PR.
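Roughly, the change amounts to switching the user-facing String fields and arguments over to Text; an illustrative before/after (the field names here are made up for the example, not necessarily the library's actual record fields):

import Data.Text (Text)

-- before: name and help text carried as String
-- data Info = Info { infoName :: String, infoHelp :: String }

-- after: the same record over Text
data Info = Info { infoName :: Text, infoHelp :: Text }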

Registry implementation allows for multiple distinct registrations of the same metric

Currently, the [] implementation of Registry allows for multiple registrations of the same metric. I can't really think of a use-case where someone would want to have duplicate metrics (vs an idempotent register function). In fact, this behaviour being allowed can have negative consequences. Consider the following stripped down variant of Example/main.hs:

-- This example demonstrates how to use the prometheus-haskell libraries to
-- instrument a simple web app that allows users to vote for their favorite
-- color.
{-# LANGUAGE OverloadedStrings #-}
module Main
where

import           Network.HTTP.Types                (status200)
import           Network.HTTP.Types.Header         (hContentType)
import           Network.Wai.Handler.Warp          (run)

import qualified Data.ByteString.Lazy              as LBS
import qualified Network.Wai                       as Wai
import qualified Network.Wai.Middleware.Prometheus as P
import qualified Prometheus                        as P


{-# NOINLINE pageVisits #-}
pageVisits :: IO P.Counter
pageVisits = P.register
           $ P.counter
           -- Each metric provided by the base library takes an Info value that
           -- gives the name of the metric and a help string that describes the
           -- value that the metric represents.
           $ P.Info "page_visits" "The number of visits to the index page."

main :: IO ()
main = do
    let port = 3000
    putStrLn $ "Listening at http://localhost:" ++ show port ++ "/"
    -- Instrument the app with the prometheus middleware using the default
    -- `PrometheusSettings`. This will cause the app to dump the metrics when
    -- the /metrics endpoint is accessed.
    run port (P.prometheus P.def app)

app :: Wai.Application
app request respond = do
    response <- case Wai.pathInfo request of
        [] -> doIndex
        _  -> return $ mkResponse "404"
    respond response

mkResponse :: LBS.ByteString -> Wai.Response
mkResponse =  Wai.responseLBS status200 [(hContentType, "text/html")]

doIndex :: IO Wai.Response
doIndex = do
    pageVisitsMetric <- pageVisits
    P.incCounter pageVisitsMetric
    return $ mkResponse $ LBS.concat [
            "<a href='/metrics'>Metrics</a>"
        ,   "<br><br><br>"
        ]

Since I've made the unobvious 'mistake' of registering metrics in my handler, which is called per-request, the list of metrics grows every time my server handles a request:

# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
# TYPE http_request_duration_seconds histogram
... http stuff

One consequence of this is that a request to /metrics on the server may end up timing out because the metrics response size keeps on growing, so you lose visibility over the app until it restarts or the metrics are cleaned up. I think it would be valuable to make it difficult for consumers to do the 'wrong' thing inadvertently.

I can see two solutions to this, and I'm happy to work on a PR to implement either.

  1. Change the underlying data structure of Registry to use a Set or a Map, so that register becomes idempotent (a minimal sketch of this follows the list). If desired, variants of register can be provided that are either idempotent or crash when a metric is 're-registered'. I'm inclined to provide just the former, to be honest.
  2. Change the implementation of register to do an 'insert if not exists' sort of thing. This incurs a cost every time register is called, and doesn't stop other implementations of register from still allowing duplicates.
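A minimal sketch of option 1, assuming a registry keyed by metric name (the types below are simplified stand-ins, not prometheus-client's actual internals):

import           Data.IORef (IORef, atomicModifyIORef')
import           Data.Map   (Map)
import qualified Data.Map   as Map

-- Simplified stand-ins for the real types.
type MetricName = String
data SomeMetric = SomeMetric  -- placeholder for a registered metric

type Registry = IORef (Map MetricName SomeMetric)

-- | Idempotent registration: re-registering an existing name returns the
-- metric that is already there instead of appending a duplicate entry.
register :: Registry -> MetricName -> SomeMetric -> IO SomeMetric
register registry name metric =
    atomicModifyIORef' registry $ \metrics ->
        case Map.lookup name metrics of
            Just existing -> (metrics, existing)
            Nothing       -> (Map.insert name metric metrics, metric)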

Thoughts?

Don't use error for validation

I started doing this as part of #7 but the scope creep & compatibility changes got a bit too much so I thought I'd file a ticket to get an opinion before I wasted too much time.

Currently, validation of metric names doesn't actually happen (although there is a checkInfo function), and validation of label names happens at vector construction but calls error, which is not ideal in library code because it robs the caller of the ability to decide how to handle errors.

A good way of implementing this is to have dedicated newtypes that encode validity, e.g.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings          #-}

import           Control.Monad.Except (MonadError (..))  -- from mtl
import           Data.Semigroup       (Semigroup (..))
import           Data.Text            (Text)
import qualified Data.Text            as Text

-- | A metric name guaranteed to be valid.
newtype MetricName = MetricName { formatMetricName :: Text }
    deriving (Read, Show, Eq, Ord, Semigroup, Monoid)

-- | Parse a 'Text' into a valid 'MetricName', failing with a 'MetricNameError'.
toMetricName :: (MonadError MetricNameError m) => Text -> m MetricName
toMetricName name
    | Text.null name = throwError Empty
    | "__" `Text.isPrefixOf` name = throwError (StartsWithUnderscores name)
    | not (validStart (Text.head name)) = throwError (InvalidCharacters name)
    | Just _ <- Text.find (not . validRest) (Text.tail name) = throwError (InvalidCharacters name)
    | otherwise = return (MetricName name)
    where
        validStart c =  ('a' <= c && c <= 'z')
                     || ('A' <= c && c <= 'Z')
                     || c == '_'
                     || c == ':'

        validRest c =  validStart c || ('0' <= c && c <= '9')

-- | Possible errors when parsing metric names.
data MetricNameError
    = InvalidCharacters Text
    | StartsWithUnderscores Text
    | Empty
    deriving (Eq, Show)

-- | User-friendly error messages for invalid metric names.
formatMetricNameError :: MetricNameError -> Text
formatMetricNameError (InvalidCharacters name) = "The metric '" <> name <> "' contains invalid characters."
formatMetricNameError (StartsWithUnderscores name) = "The metric '" <> name <> "' cannot start with '__'."
formatMetricNameError Empty = "Empty metric names are not allowed."

Here, the MetricName type and the formatMetricName and toMetricName functions would be exported but not the MetricName constructor. This guarantees that any value with the MetricName type has a valid metric name.
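For instance, a caller can pick Either as the concrete MonadError instance and pattern match on the result (the output shown is what the derived Show instances from the sketch above would produce):

-- >>> toMetricName "page_visits" :: Either MetricNameError MetricName
-- Right (MetricName {formatMetricName = "page_visits"})
-- >>> toMetricName "__reserved" :: Either MetricNameError MetricName
-- Left (StartsWithUnderscores "__reserved")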

(Apologies if this is obvious to you, but since roundtrips are moderately expensive I thought it better to over-communicate.)

Anyway, this works for metric names but not so well for label names because the existing type class more or less assumes that the names of labels have the same type as the values of labels.

I have a bunch of half-formed ideas about what to do about that, but pretty much everything would rather seriously break backwards compatibility.

Thoughts?

Export process metrics

From https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors:

Process metrics

These exports should have the prefix process_. If a language or runtime
doesn't expose one of the variables it'd just not export it. All memory values
in bytes, all times in unixtime/seconds.

Metric name                     Help string                                               Unit
process_cpu_seconds_total      Total user and system CPU time spent in seconds.          seconds
process_open_fds               Number of open file descriptors.                          file descriptors
process_max_fds                Maximum number of open file descriptors.                  file descriptors
process_virtual_memory_bytes   Virtual memory size in bytes.                             bytes
process_resident_memory_bytes  Resident memory size in bytes.                            bytes
process_heap_bytes             Process heap size in bytes.                               bytes
process_start_time_seconds     Start time of the process since unix epoch in seconds.    seconds

It would be great to export these by default.
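As a rough, Linux-only illustration of where one of these could come from (not something these packages currently do), process_open_fds can be derived by counting the entries in /proc/self/fd; a minimal sketch assuming the directory package:

import System.Directory (listDirectory)

-- | Current number of open file descriptors for this process, read from the
-- /proc filesystem. (The directory handle used for the listing itself briefly
-- counts as one.) On platforms without /proc the metric would simply be skipped.
processOpenFds :: IO Int
processOpenFds = length <$> listDirectory "/proc/self/fd"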

Export metrics as protobufs

Exporting as text is fine for now, but I'd welcome protobuf export, mostly for bragging rights about Haskell, but also because it should probably mean less CPU for everyone.

MonadMonitor vs MonadIO

At first glance, the MonadMonitor typeclass looks very similar to MonadIO, but maybe I'm missing something? Could MonadIO replace MonadMonitor? It would reduce the impedance mismatch with other libraries, whose monads often come with MonadIO instances, but not MonadMonitor, thus requiring an orphan instance or newtype wrapper.
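For comparison, the two classes look roughly like this (the MonadMonitor definition is my recollection of prometheus-client's and may not match exactly):

-- MonadIO (from base): lifts any IO action and lets the caller use its result.
class Monad m => MonadIO m where
    liftIO :: IO a -> m a

-- MonadMonitor (approximately as defined in prometheus-client): runs an IO
-- action purely for its effects, with no way to hand a result back.
class Monad m => MonadMonitor m where
    doIO :: IO () -> m ()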

prometheus-metrics-ghc-0.3.0 build failure with ghc 8.4: GCStats not in scope

As seen on the Stackage build server:

[1 of 1] Compiling Prometheus.Metric.GHC ( src/Prometheus/Metric/GHC.hs, dist/build/Prometheus/Metric/GHC.o )

src/Prometheus/Metric/GHC.hs:131:54: error:
    Not in scope: type constructor or class ‘GCStats’
    |
131 |                => String -> String -> SampleType -> (GCStats -> a) -> IO [SampleGroup]
    |                                                      ^^^^^^^

I was able to reproduce this locally like so:

stack unpack prometheus-metrics-ghc-0.3.0 && cd prometheus-metrics-ghc-0.3.0
edit stack.yaml # add the following stack.yaml
stack build --test --bench --no-run-benchmarks
# stack.yaml
resolver: nightly-2018-07-04
extra-deps:
- prometheus-client-0.3.0
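For what it's worth, GCStats was deprecated in GHC 8.2 and removed from GHC.Stats in GHC 8.4, with RTSStats as its replacement, so the fix is to port the collector to the new record. A minimal sketch of reading the replacement API (the choice of field is just illustrative):

import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Total allocated bytes, or Nothing when the program was not started with
-- +RTS -T (i.e. RTS stats collection is disabled).
allocatedBytes :: IO (Maybe Word64)
allocatedBytes = do
    enabled <- getRTSStatsEnabled
    if enabled
        then Just . allocated_bytes <$> getRTSStats
        else return Nothing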

Query endpoints?

Hi, I'm not too familiar with the Prometheus ecosystem. I've just successfully set up a Grafana instance that gets data from my Haskell server via prometheus-node. Now I also wanted the GHC metrics, so I added those via wai-middleware-prometheus, but it appears the only endpoint this ever exposes is /metrics (which works). Grafana, however, uses the Prometheus query API when I add the server as a data source, and that API seems to be completely missing, so the data source doesn't work.

New release

Hi @fimad,

How are we looking on a new release? Anything I can do to help?

Summary metric lacks a decay window

I might be misunderstanding, but the Prometheus documentation talks about a sliding window or decay period for summary metrics; the current implementation, however, appears to aggregate over all time (since the metric was initialized), which seems less useful.

Unused settings?

In PrometheusSettings I see prometheusInstrumentApp and prometheusInstrumentPrometheus, but those flags seem to never be used.
I'd like to have a /metrics endpoint that responds only with the metrics I specify, but it looks like, out of the box, prometheus from Network.Wai.Middleware.Prometheus always instruments both the metrics endpoint and the app (which in my case would just be returning 404 for every request).
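Concretely, what I would like to be able to write is roughly the following (a sketch assuming the flags actually took effect, that PrometheusSettings has the Default instance behind P.def in the example further up, and the field names mentioned above):

import           Data.Default                      (def)
import qualified Network.Wai.Middleware.Prometheus as P

-- Hypothetical: serve /metrics only, and instrument neither the app nor the
-- metrics endpoint itself.
metricsOnlySettings :: P.PrometheusSettings
metricsOnlySettings = def { P.prometheusInstrumentApp        = False
                          , P.prometheusInstrumentPrometheus = False }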

Define Eq, Ord and NFData instances for Sample

The Sample data type does not have instances for Eq, Ord and NFData.

The former two type classes are generally useful and can be automatically derived by GHC.

The latter type class is primarily useful when writing tests and benchmarks. The entire definition of the instance fits on two lines:

instance NFData Sample where
  rnf (Sample name labelSet value) = rnf name `seq` rnf labelSet `seq` rnf value

Excessive GC usage for `Summary` metric in wai-middleware-prometheus

We have a WAI application in production that serves only around 5 reqs/s and we had serious performance issues as the server was GC'ing 95% of the time.

Before

You can see that we're spending almost all clock cycles on garbage collection, and that garbage collection happens often.

[graph: CPU time dominated by GC]

Profiler output

After profiling, the culprit turned out to be the summaries that are kept for each request in wai-middleware-prometheus:

	Tue Jul  4 11:17 2017 Time and Allocation Profiling Report  (Final)

	   warpmachine +RTS -N -T -p -RTS --proxy-host localhost --proxy-port 8000

	total time  =       17.73 secs   (17725 ticks @ 1000 us, 1 processor)
	total alloc = 10,233,152,016 bytes  (excludes profiling overheads)

COST CENTRE                           MODULE                          SRC                                                   %time %alloc

invariant.fj                          Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:(197,9)-(198,58)      17.9   13.1
wrapCompile.\                         Text.Regex.Posix.Wrap           Text/Regex/Posix/Wrap.hsc:(480,49)-(484,42)             7.9    0.0
throwSocketErrorIfMinus1RetryMayBlock Network.Socket.Internal         Network/Socket/Internal.hsc:(161,1)-(162,53)            6.2    0.6
getAddrInfo.\.\.\.\                   Network.Socket                  Network/Socket.hsc:(1460,36)-(1470,43)                  5.4    0.0
connect.\.\.connectLoop               Network.Socket                  Network/Socket.hsc:(444,9)-(458,29)                     4.7    0.0
throwSocketErrorIfMinus1_             Network.Socket.Internal         Network/Socket/Internal.hsc:166:1-47                    3.0    0.0
invariant                             Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:(193,1)-(198,58)       2.4    6.0
wrapTest.\                            Text.Regex.Posix.Wrap           Text/Regex/Posix/Wrap.hsc:(489,45)-(495,40)             1.9    0.0
compress.compressPair.inv             Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:172:17-60              1.8    5.1
connect.\.\.connectBlocked            Network.Socket                  Network/Socket.hsc:(460,9)-(465,67)                     1.6    0.4
compress                              Prometheus.Metric.Summary       src/Prometheus/Metric/Summary.hs:(156,1)-(172,60)       1.3    3.9
decodeWithTable/fill                  Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:(171,62)-(194,68)    1.1    2.3
encode/fill                           Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:(61,55)-(69,52)      1.1    2.6
urlDecode                             Network.HTTP.Types.URI          Network/HTTP/Types/URI.hs:(209,1)-(228,34)              1.0    2.3
peek8_32                              Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:43:1-36              0.9    2.8
decodeWithTable.\.\.look              Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:(165,11)-(168,49)    0.8    2.8
asCString                             Text.Regex.Posix.ByteString     Text/Regex/Posix/ByteString.hs:(150,1)-(152,40)         0.5    1.1
memConstEqual.loop                    Data.Memory.PtrMethods          Data/Memory/PtrMethods.hs:(103,5)-(107,34)              0.5    1.7
toByteString                          Blaze.ByteString.Builder        Blaze/ByteString/Builder.hs:151:1-46                    0.3    1.4
urlDecode.go                          Network.HTTP.Types.URI          Network/HTTP/Types/URI.hs:(211,5)-(221,40)              0.2    1.5
toByteStringIOWith.getBuffer          Blaze.ByteString.Builder        Blaze/ByteString/Builder.hs:(187,5)-(204,40)            0.2   10.8
toByteStringIOWith                    Blaze.ByteString.Builder        Blaze/ByteString/Builder.hs:(184,1)-(204,40)            0.2   10.7
recv                                  Network.Socket.ByteString       Network/Socket/ByteString.hsc:(226,1)-(231,62)          0.1    3.3

After disabling wai-middleware-prometheus

We commented out the middleware in our application, and instead of using 400% CPU, we're back to 2% CPU, and the GC contribution is only 0.02 instead of 1.0. Also see the graph below. On the left is before removing the middleware, and on the right is after removing the middleware.

[graph: CPU usage before (left) and after (right) removing the middleware]

Why is there no endpoint info in wai-middleware-prometheus metrics?

It seems to me that the usefulness of wai-middleware-prometheus is quite limited: most applications serve multiple endpoints, and measuring an aggregate value across all of them (when the nature of those endpoints can be very different) seems odd at best.
Am I missing some way to get per-endpoint metrics, or should I create a PR adding an extra label with the endpoint information?
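If a PR goes that way, the label value could be derived from the request path; a small sketch of that part (independent of how the metric itself ends up being labelled):

import           Data.Text   (Text)
import qualified Data.Text   as T
import qualified Network.Wai as Wai

-- | A candidate "endpoint" label value built from the request path. A real
-- implementation would likely normalise path parameters so that the label
-- cardinality stays bounded.
endpointLabel :: Wai.Request -> Text
endpointLabel req = T.cons '/' (T.intercalate (T.pack "/") (Wai.pathInfo req))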

Fancier types for labels

It'd be nice to have a type class for labels such that I could define in my app how a certain domain-specific value got translated to a Prometheus label.

e.g.

class ToLabel a where
  toLabel :: a -> String

data MyStatus = Good | Bad | Indifferent

instance ToLabel MyStatus where
  toLabel Good = "good"
  toLabel Bad = "bad"
  toLabel Indifferent = "indifferent"

withLabel Good incCounter someMetric

But then, as I write this, I realise this probably devolves into wanting extensible records, such that you can declare the type of a label at metric creation and have the type system make sure you've got the right value in the right place, which is probably going overboard.

Move to Prometheus organization / governance?

@fimad What do you think of opening up commit & release access?

We've all got busy day jobs, but perhaps together we could release more often and merge PRs.

One way to do this would be to move this project to the Prometheus project & governance model. This would probably mean:

  • the repo would be at prometheus/prometheus-haskell
  • the existing Prometheus team members and maintainers would get commit access
  • we could add maintainers specific to prometheus-haskell (I'd suggest myself & @ocharles)

I asked on IRC, and this isn't an a priori terrible idea, but it is something that would have to be proposed and voted on by the Prometheus team members.

Other options would be adding collaborators, or creating a new organization and moving there.

Let me know what you think. If you want to try moving into the Prometheus project, I can draft a proposal and try to get a team member to propose it.

Enable configurable buckets for `http_request_duration_seconds`

The requestLatency function picks the default set of buckets, which is very rigid.
We have two situations: either our servers are talking to each other in the same datacenter, which means requests are blazing fast and we see things like:

http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.005"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.01"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.025"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.05"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.1"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.25"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="0.5"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="1.0"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="2.5"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="5.0"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="10.0"} 1860
http_request_duration_seconds_bucket{handler="prometheus",method="GET",status_code="200",le="+Inf"} 1860
http_request_duration_seconds_sum{handler="prometheus",method="GET",status_code="200"} 7.270726499999998e-2
http_request_duration_seconds_count{handler="prometheus",method="GET",status_code="200"} 1860

Which is useless. The other situation we face is the dual one: we're handling long-running requests, and all of them end up in the +Inf bucket. I'd love to be able to configure the buckets for requestLatency so we can get useful information out of this data.
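A sketch of the kind of knob I'm after: a requestLatency-style histogram where the caller supplies the bucket list. P.histogram and P.exponentialBuckets are assumed to be available from prometheus-client, and the WAI middleware plumbing is left out:

{-# LANGUAGE OverloadedStrings #-}
import qualified Prometheus as P

-- Hypothetical: the bucket upper bounds become an argument instead of being
-- hard-coded to the defaults.
requestLatencyWith :: [Double] -> P.Metric P.Histogram
requestLatencyWith buckets =
    P.histogram
        (P.Info "http_request_duration_seconds"
                "The HTTP request latencies in seconds.")
        buckets

-- For fast intra-datacenter traffic: exponential buckets starting at 100µs.
fastBuckets :: [Double]
fastBuckets = P.exponentialBuckets 0.0001 2 12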

Does withLabel need to have the type it does?

The type of withLabel is:

withLabel :: (Label label, MonadMonitor m) => label -> (Metric metric -> IO ()) -> Metric (Vector label metric) -> m ()

This type implies that once the IO action that uses the labelled metric has finished, some cleanup must happen. Yet if we actually consult the source of withLabel, we have:

withLabel label f (Metric {handle = MkVector ioref}) = doIO $ do
    (gen, _) <- IORef.readIORef ioref
    newMetric <- gen
    metric <- Atomics.atomicModifyIORefCAS ioref $ \(_, metricMap) ->
        let maybeMetric = Map.lookup label metricMap
            updatedMap  = Map.insert label newMetric metricMap
        in  case maybeMetric of
                Nothing     -> ((gen, updatedMap), newMetric)
                Just metric -> ((gen, metricMap), metric)
    f metric

Nothing happens after f is called - there is no cleanup that has to happen. Thus it would be simpler to have

withLabel :: (Label label, MonadMonitor m) => label -> Metric (Vector label metric) -> m (Metric metric)

My guess is that we have the type we have because we are forced into it by doIO. I find doIO a tad suspicious anyway, and think that MonadMonitor would likely be better if it had specific operations added, rather than a catch-all liftIO-like operation.

What do you think?

Is it possible to add labels to ghcMetrics? If yes, how?

Sorry if this is a silly question, but I found no obvious way to add labels to the metrics exposed by ghcMetrics. I plan to aggregate the exposed Prometheus text of multiple service instances, so I'm looking for a way to distinguish the instances - I think Prometheus labels might be a good way to do so. Labels could perhaps be added during scraping, but I think the running application has much more context for adding reasonable labels/values.
