Comments (9)

audax commented on June 3, 2024

Yes, we do, and yes, we will change it someday to let them aggregate their stats into one vassal per host.

analytik commented on June 3, 2024

OK, maybe this will help someone, here's what I did:

  • Converted a simple uWSGI Django application into a uWSGI Emperor with two apps: Django and Metrics.
  • Added Django middleware that increments local Redis values (since 1 uWSGI Emperor = 1 Docker container) wherever django-prometheus used a Counter, and pushes onto a Redis list wherever it used a Histogram.
  • The Metrics app reads all these Redis keys on each scrape. Instead of a Counter there is a Gauge, so it can be set to whatever is in Redis; for a Histogram, there's just a loop of observe() for every item popped from the Redis list.
  • The Django app exposes uWSGI stats too, on local port 1717.
  • Metrics (a tiny Falcon app) scrapes those and converts them to Prometheus metrics, added together with the Django metrics from Redis.

I'll be happy to provide more details if it would help someone; it's just that the code isn't tidied up.

analytik commented on June 3, 2024

@ge-fa - sure.

First, we run uWSGI with uwsgi --enable-threads --emperor /foo/bar/emperor/$ENV --disable-logging - we keep slightly different configurations for dev vs stage vs prod.

In each emperor/env folder, we keep two ini files - one for the app itself:

[uwsgi]
chdir           = /foo/bar
module          = wsgi
pidfile         = /tmp/uwsgi.pid
master          = true
http-socket     = 0.0.0.0:80
vacuum          = true
enable-threads  = true
processes       = 2
# keep lazy off here - see the warning about memory leaks below
lazy            = false
threads         = 4
post-buffering  = true
harakiri        = 30
max-requests    = 5000
buffer-size     = 65535
# the stats server that the metrics app scrapes
stats           = 127.0.0.1:1717
stats-http      = true

and one for the metrics service:

[uwsgi]
chdir           = /foo/bar
module          = metrics:api
pidfile         = /tmp/uwsgi-metrics.pid
http-socket     = 0.0.0.0:9090
vacuum          = true
enable-threads  = true
threads         = 1
processes       = 3
post-buffering  = true
harakiri        = 10
max-requests    = 10
buffer-size     = 65535
disable-logging = true

These can be adjusted of course, but do not turn on lazy mode! The app will start leaking memory horribly. Now you serve on three ports: 80 for Django, 1717 for uWSGI stats, and 9090 for Prometheus.

Now metrics.py should contain a simple app, something like this:

import falcon
from prometheus_client import generate_latest
from prometheus_client.core import REGISTRY

# custom_metrics is just a dictionary of optional business/app metrics; it can be empty
from your_app.metrics import metrics as custom_metrics
from prometheus_django_redis import metrics as django_metrics
from prometheus_django_utils import process_redis_stuff, startup_prometheus


class MetricsResource(object):
    def on_get(self, req, resp):
        # pull whatever the Django workers wrote into Redis into the local registry
        process_redis_stuff(django_metrics)
        process_redis_stuff(custom_metrics)
        resp.content_type = 'text/plain'
        resp.status = falcon.HTTP_200
        resp.body = generate_latest(REGISTRY)


api = startup_prometheus(MetricsResource, HealthzResource)  # I omitted HealthzResource here
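
HealthzResource is omitted above; a minimal sketch of one, assuming it just needs to report that the metrics app is alive (mirroring the PongResource further down), would be:

class HealthzResource(object):
    def on_get(self, req, resp):
        # minimal health check - just report that the metrics app is up
        resp.status = falcon.HTTP_200
        resp.content_type = 'text/plain'
        resp.body = 'OK'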

Now, the functionality in prometheus_django_redis is a bit hacky. I'm not sure if I can share the whole code, but the gist of it is this:

import time
from pickle import dumps

import redis
from prometheus_client import Gauge, Histogram

r = redis.Redis()

metrics = {
    'requests_total': Gauge(
        'django_http_requests_before_middlewares_total',
        'Total count of requests before middlewares run.'),
    # many others
}

def get_time():
    return time.time()


def time_since(t):
    return get_time() - t


def incr_with_labels(metric, labels, amount=1):
    # labeled metrics are stored as a Redis hash keyed by the pickled labels dict
    r.hincrby(metric, dumps(labels), amount)

# and then the middleware itself
class PrometheusBeforeMiddleware(object):
    """Monitoring middleware that should run before other middlewares."""

    def process_request(self, request):
        r.incr('requests_total')
        request.prometheus_before_middleware_event = get_time()

    def process_response(self, request, response):
        r.incr('responses_total')
        if hasattr(request, 'prometheus_before_middleware_event'):
            r.rpush('requests_latency_before', time_since(request.prometheus_before_middleware_event))
        else:
            r.incr('requests_unknown_latency_before')
        return response

And then the rules for writing to Redis, instead of directly to Prometheus, are as follows:

  • r.incr for Gauge
  • r.hincrby for Gauge with labels
  • r.rpush for Histogram
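
For example, a labeled Gauge would be written from middleware through the incr_with_labels() helper above. A sketch, with a hypothetical responses_by_status metric (it would be registered as a labeled Gauge in the metrics dict), which the hgetall branch below then reads back:

# inside some middleware, alongside PrometheusBeforeMiddleware
def process_response(self, request, response):
    # 'responses_by_status' is a hypothetical labeled Gauge;
    # the labels dict gets pickled into the Redis hash key
    incr_with_labels('responses_by_status', {'status': str(response.status_code)})
    return response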

To read them back, have a utility file like this:

import logging
import traceback
from collections import defaultdict
from pickle import loads

import falcon
import redis
import requests_unixsocket
from prometheus_client.core import GaugeMetricFamily, REGISTRY


r = redis.Redis()
session = requests_unixsocket.Session()
PREFIX = "uwsgi"
EXCLUDE_FIELDS = {"pid", "uid", "cwd", "vars"}
LABEL_VALUE_FIELDS = {"id", "name"}


def object_to_prometheus(prefix, stats_dict, labels, label_name=None):
    # use the "id" or "name" field, if present, as the label value for this level
    label_value = next((stats_dict[field] for field in LABEL_VALUE_FIELDS if field in stats_dict), None)
    if label_name is not None and label_value is not None:
        label_name = label_name.rstrip("s")  # crude singularization: "workers" -> "worker"
        labels = labels + [(label_name, str(label_value))]

    for name, value in stats_dict.items():
        name = name.replace(" ", "_")
        if name.isupper() or name in EXCLUDE_FIELDS:
            # uppercase names are request vars; no need to save them
            continue
        if isinstance(value, list):
            yield from list_to_prometheus("{}_{}".format(prefix, name), value, labels, name)
        elif name not in LABEL_VALUE_FIELDS and isinstance(value, (int, float)):
            yield "{}_{}".format(prefix, name), sorted(labels), value


def list_to_prometheus(prefix, stats_list, labels, label_name):
    for stats in stats_list:
        yield from object_to_prometheus(prefix, stats, labels, label_name)


def build_prometheus_stats(stats_addr):
    uwsgi_stats = get_stats(stats_addr)
    stats = object_to_prometheus(PREFIX, uwsgi_stats, [])
    grouped_stats = defaultdict(list)
    # group all values by metric name, otherwise Prometheus does not accept them
    for metric_name, labels, value in stats:
        grouped_stats[metric_name].append((labels, value))
    for metric_name, stats in grouped_stats.items():
        label_names = [name for name, _ in stats[0][0]]
        g = GaugeMetricFamily(metric_name, "", labels=label_names)
        for labels, value in stats:
            g.add_metric([label_value for _, label_value in labels], value)
        yield g


def get_stats_collector(stats_getter):
    class StatsCollector:
        def collect(self):
            yield from stats_getter()
    return StatsCollector()


def get_stats(stats_addr):
    resp = session.get(stats_addr)
    resp.raise_for_status()
    return resp.json()


def handle_error(e, req, resp, params):
    logging.error(traceback.format_exc())
    # re-raise falcon HTTP errors as-is, wrap everything else in a 500
    if isinstance(e, falcon.HTTPError):
        raise e
    raise falcon.HTTPInternalServerError('Internal Server Error', str(e))


class PongResource(object):
    def on_get(self, req, resp):
        resp.status = falcon.HTTP_200
        resp.content_type = 'text/plain'
        resp.body = 'PONG'



def startup_prometheus(MetricsResource, HealthzResource,
                       stats_address="http://127.0.0.1:1717"):
    REGISTRY.register(get_stats_collector(lambda: build_prometheus_stats(stats_address)))
    api = falcon.API()
    api.add_error_handler(Exception, handler=handle_error)
    api.add_route('/metrics', MetricsResource())
    api.add_route('/healthz/ping', PongResource())
    api.add_route('/healthz/', HealthzResource())
    return api



def process_redis_stuff(metrics):
    """ Read metrics saved by several processes/threads in Redis, and turn them into Prometheus metrics

    if type is Gauge, read and set
    if Gauge with labels, hgetall and set
    if Histogram, read and empty the list, observe values one by one
    """
    for (metric_name, metric) in metrics.items():
        metric_type = type(metric).__name__
        # logging.debug('Investigating metric %s typed %s' % (metric_name, metric_type))
        if metric_type == 'Gauge':
            value = r.get(metric_name) or 0
            # logging.debug('Setting %s to %s' % (metric_name, value))
            metric.set(value)
        elif metric_type == '_LabelWrapper':
            # for simplicity, assume all labeled metrics are Gauges - to change, check _wrappedClass
            labels_and_values = r.hgetall(metric_name)
            for (labels, value) in labels_and_values.items():
                value = float(value)
                clean_labels = {}
                for (lab, val) in loads(labels).items():
                    # Redis returns bytes; decode keys and values back to str
                    lab = lab.decode('utf-8') if isinstance(lab, bytes) else lab
                    val = val.decode('utf-8') if isinstance(val, bytes) else val
                    clean_labels[lab] = val
                # logging.debug('Setting %s to %s with labels %s' % (metric_name, value, clean_labels))
                metric.labels(clean_labels).set(value)
        elif metric_type == 'Histogram':
            # get all values in the list (Array)
            values = r.lrange(metric_name, 0, -1)
            # cut those values out from Redis
            r.ltrim(metric_name, len(values), -1)
            # logging.debug('Observing %s values for %s' % (len(values), metric_name))
            for val in values:
                metric.observe(float(val))

See? Simple!

Except... not at all. I mean, I'm sure there are better ways to do it, but I did whatever butchered way was easy enough to develop and deliver.

In other news, I am incredibly happy to develop in Node.js, where asynchronous programming is a breeze, I can start an infinite number of HTTP servers in a few lines, and I don't need nasty multi-threading / multiprocessing that eats gigabytes of memory to achieve all that. (That said, of course Python has its uses, but I no longer feel like HTTP servers should be one of them, at least not unless you do something special like stackless/httptools/uvloop.)

Hope it helps!

EDIT: I should also note that we run each instance as a Docker container / Kubernetes pod, so there isn't any problem with allocating the same ports for many different applications. Redis also runs locally in the pod, started simply with redis & - which I know is barbaric, but it has worked reliably so far.

korfuri commented on June 3, 2024

For WSGI servers (uwsgi, gunicorn, and others), exporting via urls.py doesn't work. It may work if you use a patched version of the prometheus client (see the multiproc branch: https://github.com/prometheus/client_python/tree/multiproc).

The simplest way is to configure the port-based exporter, so each worker process will export on a different port. See https://github.com/korfuri/django-prometheus/blob/master/documentation/exports.md#exporting-metrics-in-a-wsgi-application-with-multiple-processes on how to do that. If you have N workers you should have at least N ports in the range (you can have more). Then you need to configure each worker as a separate target in Prometheus (http://prometheus.io/docs/operating/configuration/#<target_group>), and use the rules language (http://prometheus.io/docs/querying/rules/) to aggregate data from multiple workers together.
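
For reference, the port-range exporter described in that doc boils down to a one-line Django setting (a sketch; exports.md is the authoritative reference):

# settings.py - each worker process binds a free port from this range at startup,
# so the range must be at least as large as the number of workers
PROMETHEUS_METRICS_EXPORT_PORT_RANGE = range(8001, 8005)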

audax commented on June 3, 2024

I've got the same problem. If I run the application in uwsgi as a single process, the /metrics export via urls.py works just fine. If I have more than one process, I get the usual problem of only ever hitting one worker.

If I run the exporter in an extra thread, it just doesn't collect any metrics. The "solution" for now is to run multiple uwsgi processes, each with a single Django worker. That is not nice.

audax commented on June 3, 2024

This is my uwsgi config which finally works:

[uwsgi]
processes = 4
master = true

# needed so that the processes are forked _before_ the exporter starts
lazy = true
# enable-threads as mentioned in the docs
enable-threads = true

# and the usual fluff
die-on-term = true
plugins = python3
home = <home>
chdir = <dir>
module = <module>
env = DJANGO_SETTINGS_MODULE=<settings>
socket = /tmp/django-foobar-uwsgi.sock
vacuum = true

The missing piece in the docs was the 'lazy' option, which apparently fixes my problem. Without it, only one exporter was started, and it only exported nonsense.

analytik commented on June 3, 2024

So, if I have uWSGI configured to use 8 processes and 8 threads, I need to expose metrics on 64 different ports? o_O

Even if I have to do this per process*thread, I would prefer to just use the uWSGI cache, aggregate the metrics in a separate uWSGI vassal, and expose them together on one address. So every Django thread would just need to do whatever the Python equivalent is of something like

setTimeout(function() { set_cache(pid_and_thread_number, exported_stats) }, 10000);
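
In Python that would presumably be something along these lines (a sketch, assuming uWSGI's cache API; exported_stats() is a hypothetical helper that collects this worker's counters):

import json
import threading

import uwsgi  # only importable inside a uWSGI worker

def push_stats():
    # key identifies the process/thread that wrote these stats
    key = 'worker-{}-thread-{}'.format(uwsgi.worker_id(), threading.get_ident())
    uwsgi.cache_update(key, json.dumps(exported_stats()).encode())
    threading.Timer(10.0, push_stats).start()  # reschedule, like setTimeout

threading.Timer(10.0, push_stats).start()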

Or am I going in a completely wrong direction? Are you people really scraping several ports per server to get accurate metrics?

geor-g commented on June 3, 2024

@analytik I'm really interested in this. Could you please share some more details / code? Thanks!

asherf commented on June 3, 2024

Closed the issue due to inactivity. Feel free to reopen if needed.
