
bahmanm / lemmy-meter


A web application to track Lemmy instances performance and represent the results visually

Home Page: https://lemmy-meter.info

License: GNU General Public License v3.0

Makefile 66.52% Jinja 1.48% Perl 31.48% Dockerfile 0.52%
fediverse lemmy observability

lemmy-meter's Introduction

1. lemmy-meter

A solution for Lemmy end-users, like me, to check the health of their favourite instance at 3 levels of detail.

This is the source repository which is used to build and deploy lemmy-meter.info.

2. Health Reports

lemmy-meter provides 3 levels of reports.

2.1 Overall Health

This is what you are, almost always, interested in.

| Colour | Meaning | Interpretation |
|---|---|---|
| 🟢 Green | none of the health checks are failing | 🙂 Your instance is healthy and doing well. |
| 🟠 Orange | some of the health checks are failing | 🫤 Your instance may be partially down; for example mobile APIs may not be working. |
| 🔴 Red | all health checks are failing | 🙁 Your instance may be completely down; for example during a planned maintenance. |
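
The colour rules above can be sketched as a small function. This is only an illustration of the stated mapping, not the project's actual implementation (the dashboards most likely derive it from Prometheus data); the names `Health` and `overall_health` are made up for this example.

```python
from enum import Enum
from typing import Iterable


class Health(Enum):
    GREEN = "green"
    ORANGE = "orange"
    RED = "red"


def overall_health(check_passed: Iterable[bool]) -> Health:
    """Map individual health check results to an overall colour.

    Each item is True if that check passed and False if it failed.
    """
    results = list(check_passed)
    if not results:
        raise ValueError("at least one health check result is required")
    failures = results.count(False)
    if failures == 0:
        return Health.GREEN   # none of the checks are failing
    if failures == len(results):
        return Health.RED     # all checks are failing
    return Health.ORANGE      # some, but not all, checks are failing
```

For instance, a working landing page combined with a failing API check would be classified as orange.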

2.2 Endpoint Health

A breakdown of overall health by a few, subjectively important, endpoints:

  • Landing page: the web page users see when they visit the instance.
  • Select API endpoints which are used by mobile (and desktop) applications:
    • getPosts
    • getComments
    • getCommunities

2.3 Endpoint Response Time - Rate

  • A visual representation of how much the average response time has changed over time.
  • A flat line indicates a consistent response time, regardless of being slow or fast.
  • Spikes or changes in elevation mean changes in the response time.

NB: It does not represent the actual response times but only the fluctuations.
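
To illustrate the difference between fluctuation and raw value, here is a minimal sketch that computes the change per second between consecutive average response times. It assumes samples arrive as `(timestamp_seconds, average_ms)` pairs; the function name and input shape are hypothetical, and the actual dashboards may compute the rate differently.

```python
def response_time_rate(samples: list[tuple[float, float]]) -> list[float]:
    """Return the change in average response time (ms per second)
    between each pair of consecutive samples.

    `samples` must be sorted by timestamp, with strictly increasing timestamps.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((v1 - v0) / (t1 - t0))
    return rates
```

A consistently slow endpoint, say 900 ms at every sample, yields only zeros (a flat line), while a jump from 100 ms to 700 ms shows up as a spike.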

2.4 Endpoint Response Time - Raw

  • The raw response time per endpoint as it happened.
  • Lower is better. Anything below 500ms is quite decent.
  • Don't read too much into the actual values.
    The server is currently located in Germany which means non-EU instances will always be slightly slower than you'd expect.

3. How To Run

The only dependency is bmakelib.

3.1 Locally

Simply run make up and make down to start the cluster and tear it down.

You can access Grafana at http://localhost:3000 (admin/admin)

3.2 Remote

Run make deploy to, well, deploy lemmy-meter to the remote server.

lemmy-meter's Issues

Externally embeddable gauges

Investigate if it is possible to embed the health indicator gauges for a given instance in another website, the way a usual health "badge" works.


Thanks @unruffled for bringing this up.

Configure Alertmanager

Configure Prometheus alerts and Alertmanager to notify instance admins/communities of outages/degraded performance, eg in a Matrix channel/chat or a Discord server.

Retire matrix-webhook

With Prometheus alerts in place, there's no more need for the Grafana-Matrix bridge and it can be safely retired.

Integrate Alertmanager with ntfy

  • Configure ntfy to run as a component in the cluster.
  • Write a webhook receiver which translates Alertmanager payload to ntfy model.
  • Configure Alertmanager to use the said receiver.
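
The translation step in the second bullet could look roughly like the sketch below. It relies on the Alertmanager webhook payload carrying an `alerts` list whose items have `status`, `labels` and `annotations`, and on ntfy accepting JSON messages with `topic`, `title`, `message`, `priority` and `tags`. The function name, the priority choices and the tag choices are assumptions made for illustration; the HTTP receiver and the actual POST to ntfy are left out.

```python
from typing import Any


def alertmanager_to_ntfy(payload: dict[str, Any], topic: str) -> list[dict[str, Any]]:
    """Translate an Alertmanager webhook payload into ntfy JSON messages,
    one message per alert in the payload."""
    messages = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        firing = alert.get("status") == "firing"
        name = labels.get("alertname", "alert")
        messages.append({
            "topic": topic,
            "title": f"[{'FIRING' if firing else 'RESOLVED'}] {name}",
            # Prefer the summary, fall back to the description, then the alert name.
            "message": annotations.get("summary")
                       or annotations.get("description")
                       or name,
            # Assumption: firing alerts get a higher ntfy priority than resolved ones.
            "priority": 4 if firing else 3,
            "tags": ["warning"] if firing else ["white_check_mark"],
        })
    return messages
```

Each returned dictionary could then be sent as the JSON body of a POST request to the ntfy server's root URL.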

Configure alerts for slow DNS resolution

There have already been a couple of incidents where Blackbox Exporter took a very long time (10s+) to finish the "resolve" phase.

One suspect is that the connection between the Docker daemon and the provider's nameserver can become stale (:man_shrugging:). I patched the configuration to always use DNS servers outside the internal network.

However, I'd like to be alerted the next time this happens so I can start investigating right away.

Investigate alerts and notifications

Explore whether it is possible for viewers to sign up for notifications when their favourite instance becomes (partially) unavailable.

This may be potentially helpful for admins as well.

For this to happen:

  1. There should be an un/subscribe form.
  2. lemmy-meter should be able to send e-mails - probably plenty of them.
  3. Reasonable alerts should be configured.

Configure alerts

It should be possible to subscribe to a particular instance's alerts and receive a notification (eg an e-mail) whenever the alert is triggered.

Endpoint to validate scheduled downtime file

It would be helpful to implement an endpoint to assist admins in validating scheduled-downtime.json.

For example:

$ curl -X GET 'https://lemmy-meter.info/.metadata/validate-json?instance=<INSTANCE>'
Invalid 
<detailed error message>

Try out Kamal instead of Compose

Kamal v1.0.0, which has just been released, seems to be an interesting alternative to Docker Compose. It's worth trying out while lemmy-meter is in its early stages.

Run matrix-webhook in the cluster

Currently, matrix-webhook, which is used for alert notifications, is run as a separate user. Move it to the same cluster as the other services to ensure fail-over and consistency.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Detected dependencies

cpanfile
cluster/downtime-processor/cpanfile
  • perl 5.39.9
  • Mojolicious 9.36
  • Net::Prometheus 0.12
  • Data::Dump 1.25
  • Schedule::Cron::Events 1.96
  • Text::CSV 2.04
  • Moose 2.2207
  • JSON 4.10
  • JSON::Validator 5.14
  • File::Slurper 0.014
  • Data::UUID 1.227
  • Log::Log4perl 1.57
docker-compose
cluster/docker-compose.yml
  • prom/prometheus v2.51.2
  • grafana/grafana 10.4.2
  • prom/blackbox-exporter v0.25.0
  • prometheuscommunity/json-exporter v0.6.0
  • nginx 1.26
  • postgres 16.2
  • prom/alertmanager v0.27.0
  • ixdotai/smtp v0.5.2
  • binwiederhier/ntfy v2.10.0
dockerfile
cluster/downtime-processor/Dockerfile
  • perl 5.39.9
pip_requirements
ansible/requirements.txt
  • ansible ==9.5.1
  • molecule ==6.0.3
  • molecule-plugins ==23.5.3
  • passlib ==1.7.4

Import/export Grafana dashboards w/ zero downtime

It should be possible to transfer the changes between local lemmy-meter and lemmy-meter.info w/o requiring the cluster to be stopped.

One workflow is

  1. Grab latest dashboards from remote
  2. Experiment and make changes locally
  3. Upload the changes to remote

Even better would be to store the relevant Grafana configuration, such as data sources, users and dashboards, so that it can be versioned in git.

Scrape downtime schedules off instances

Follow up on #22


It should be possible to scrape downtime schedules off predefined URLs from instances. For example, https://INSTANCE/.well-known/host-metadata.json or https://INSTANCE/.well-known/scheduled-downtime.json
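
As a rough sketch of the idea, the candidate URLs can be built per instance and tried in order. The two paths are the ones suggested above; the function names and the injectable `fetch` callable (which would wrap a real HTTP client and return the parsed JSON, or `None` when nothing usable is found) are hypothetical.

```python
from typing import Any, Callable, Optional

# The two locations proposed in this issue, tried in this order.
WELL_KNOWN_PATHS = (
    "/.well-known/host-metadata.json",
    "/.well-known/scheduled-downtime.json",
)


def downtime_schedule_urls(instance: str) -> list[str]:
    """Build the candidate schedule URLs for an instance such as 'lemmy.ml'."""
    host = instance.strip()
    host = host.removeprefix("https://").removeprefix("http://").rstrip("/")
    return [f"https://{host}{path}" for path in WELL_KNOWN_PATHS]


def first_schedule(
    instance: str,
    fetch: Callable[[str], Optional[dict[str, Any]]],
) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (url, document) for the first candidate URL that yields a
    document, or None if no candidate does."""
    for url in downtime_schedule_urls(instance):
        document = fetch(url)
        if document is not None:
            return url, document
    return None
```

Passing `fetch` in as a parameter keeps the lookup order testable without network access.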

Expose stats via APIs

It'd be useful to expose the health check results that lemmy-meter collects via some API to interested parties.

For example, uptime.lemmings.world could use such stats to generate uptime badges.


Things to note at the first pass:

  • The API shouldn't be public, at least not for now, as lemmy-meter simply hasn't got the infrastructure for that.
  • There are two types of data that lemmy-meter ingests and stores: snapshot and time-series. Again for infrastructural reasons, the focus should be on the snapshot data for the time being.

Thanks @RikudouSage for bringing this up.

Automate the rollout of a new version

The current process for deploying a new version is quite laborious and involves scp, wget and unzip, which is just not right 😅

Ideally, there should be one or more Ansible playbooks to automate all or most aspects of that:

  • Deploying a new version of lemmy-meter
  • Deploying Grafana dashboards
  • Restarting the cluster
  • Restarting a particular service

For the sake of simplicity, the task of deploying a cluster to a new machine can be skipped.
