o11y-demo

This is a self-contained demo of the Liatrio OpenTelemetry Collector and Grafana, using Prometheus as the data source. It shows how to use the collector to scrape data from GitHub and expose it to Grafana for visualization. This demo is not intended for production use.

There is also a bonus section that shows a local development workflow for making changes to the Liatrio OpenTelemetry Collector, using a locally built collector image with Grafana for visualization and Prometheus as the data store.

Architecture

stateDiagram-v2
    Github --> Collector: Pushes
    Collector --> Github: Scrapes
    Collector --> Prometheus: Pushes
    Collector --> Loki: Pushes
    Grafana --> Prometheus: Pulls
    Grafana --> Loki: Pulls
    

Pre-requisites

  • The Liatrio OpenTelemetry Collector images are hosted on ghcr.io; see the GitHub Container Registry documentation for instructions on how to log in
  • At runtime, the Collector requires a GitHub Personal Access Token (fine-grained or classic) with the following scopes:
    • repo:All
    • read:packages
    • read:org
    • read:user
    • read:enterprise
    • read:project

?> This is a classic token configuration; a fine-grained token might be easier.
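
For a classic token, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header of authenticated API calls, so the list above can be checked before starting the stack. The following is a minimal sketch and not part of this repo: the helper name `missing_scopes` is hypothetical, `repo:All` is assumed to correspond to the classic `repo` scope, and the comparison is literal, so it does not account for broader scopes that imply narrower ones (for example `admin:org` covering `read:org`).

```python
# Scopes listed in the pre-requisites above.
# Assumption: "repo:All" means the classic "repo" scope.
REQUIRED_SCOPES = {
    "repo",
    "read:packages",
    "read:org",
    "read:user",
    "read:enterprise",
    "read:project",
}


def missing_scopes(header_value, required=REQUIRED_SCOPES):
    """Return the required scopes that are absent from an X-OAuth-Scopes value."""
    # The header is a comma-separated list such as "repo, read:org".
    granted = {s.strip() for s in header_value.split(",") if s.strip()}
    return sorted(set(required) - granted)


if __name__ == "__main__":
    # Hypothetical header value for a token that lacks three scopes.
    print(missing_scopes("repo, read:org, read:user"))
```

Fine-grained tokens use a different permission model and do not report scopes this way, so this check only applies to classic tokens.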

Getting Started

  1. Clone this repo and navigate to the root directory
  2. Create a .collector.env file in the root of the repo by copying the example
  3. Make sure organization is set to liatrio, unless you want to specify another
  4. Run docker compose up
  5. Open Grafana by navigating to http://localhost:3000
  6. To view the demo dashboard in Grafana go to dashboards > o11y > demo

?> You will need to wait a moment for Grafana to receive data from Prometheus. Until enough data has been received, the page will look like this:

[Screenshot: No Data]

Once the data has been received, it should look like this:

[Screenshot: Lots of Data]
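
Instead of refreshing the dashboard, the wait can be scripted against the Prometheus HTTP API (`/api/v1/query`). This is a rough sketch under assumptions: the function names are hypothetical, and the default query `up` is only a placeholder, since the metric names the demo dashboard uses are not listed here. Substitute a metric that the dashboard actually queries.

```python
import json
import time
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # Prometheus address used by this demo


def has_samples(payload):
    """True if a Prometheus instant-query response contains at least one series."""
    return payload.get("status") == "success" and bool(
        payload.get("data", {}).get("result")
    )


def wait_for_data(query="up", timeout_s=120, interval_s=5):
    """Poll Prometheus until the query returns data or the timeout expires."""
    url = f"{PROMETHEUS_URL}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if has_samples(json.load(resp)):
                    return True
        except (OSError, ValueError):
            pass  # Prometheus not reachable or not answering with JSON yet
        time.sleep(interval_s)
    return False
```

Calling `wait_for_data()` after `docker compose up` returns `True` once the query yields at least one series, and `False` if nothing arrives within the timeout.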

Service                         Link
Grafana                         http://localhost:3000
Prometheus                      http://localhost:9090
Collector gRPC Receiver         http://localhost:4317
Collector HTTP Receiver         http://localhost:4318
Collector Prometheus Metrics    http://localhost:9464/metrics
Collector Health Check          http://localhost:8888/metrics
Webhook Event Receiver          http://localhost:8088/events
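
To confirm that the stack came up, each address above can be probed with a plain TCP connection. The sketch below is illustrative only; the helper names are hypothetical. It uses a TCP check rather than an HTTP request because the gRPC receiver on port 4317 does not answer ordinary HTTP GETs.

```python
import socket
from urllib.parse import urlsplit

# Addresses copied from the service list above.
SERVICES = {
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
    "Collector gRPC Receiver": "http://localhost:4317",
    "Collector HTTP Receiver": "http://localhost:4318",
    "Collector Prometheus Metrics": "http://localhost:9464/metrics",
    "Collector Health Check": "http://localhost:8888/metrics",
    "Webhook Event Receiver": "http://localhost:8088/events",
}


def host_port(url):
    """Extract (host, port) from one of the service URLs."""
    parts = urlsplit(url)
    return parts.hostname, parts.port


def is_listening(url, timeout_s=1.0):
    """True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout_s):
            return True
    except OSError:
        return False


def report():
    """Print one line per service with its reachability."""
    for name, url in SERVICES.items():
        state = "up" if is_listening(url) else "down"
        print(f"{name:30} {url:35} {state}")
```

Running `report()` lists every service as up or down. The webhook event receiver is expected to be down unless the DORA bonus section has been enabled.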

Bonus Section: Development Workflow

When you want to test pre-release or newly developed functionality in the Liatrio collector, you can use the following workflow to test your changes locally in a full-stack environment.

  1. Clone the liatrio-otel-collector repo
  2. Run make dockerbuild in the root of the liatrio-otel-collector repo
  3. Follow the Getting Started section, but uncomment the OTEL_COLLECTOR_IMAGE variable in the .env file
  4. Run docker compose up in the root of this repo
  5. For a more detailed developer workflow read the docs in the liatrio-otel-collector repo
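
Step 3 can be scripted if you switch between the published and the locally built image often. This is a small sketch, assuming the variable is commented out with a leading `#` in the form `# OTEL_COLLECTOR_IMAGE=...`. The function name and the image value in the example are hypothetical.

```python
import re


def uncomment_var(env_text, name):
    """Remove the leading '#' from lines of the form '# NAME=value'."""
    pattern = re.compile(rf"^[ \t]*#[ \t]*({re.escape(name)}=.*)$", re.MULTILINE)
    return pattern.sub(r"\1", env_text)


if __name__ == "__main__":
    # Hypothetical .env contents; the real file may differ.
    sample = "FOO=1\n# OTEL_COLLECTOR_IMAGE=local/collector:dev\n"
    print(uncomment_var(sample, "OTEL_COLLECTOR_IMAGE"))
```

Only lines for the named variable are touched; other comments in the file are left as they are.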

Bonus Section: DORA Metrics with GitHub Event Logs

  1. Create a GitHub App with permissions for Issues, Deployments, and Pull requests, then subscribe it to the Issues, Deployment status, and Pull request events while leaving webhooks disabled for now. This can be done by navigating to Settings > Developer Settings > GitHub Apps.

[Screenshots: GitHub App Permissions, GitHub App Event Subscriptions]

  2. Using Ngrok or another tool to forward traffic from your GitHub App to your local machine, set up forwarding to http://localhost:8088/, which is the endpoint of the webhook receiver when it runs locally.
  3. This gives you a web address to use as the webhook URL in your GitHub App. Be sure to append /events to that address, as that is the path where the webhook event receiver expects event logs by default.
  4. Uncomment the relevant code in collector-config.yaml and the docker-compose file to set up the tools required for this.
  5. You should now be all set to start ingesting GitHub event logs.
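
Before pointing the GitHub App at the forwarding address, the receiver can be exercised with a hand-built request. The sketch below only builds the request. It assumes a JSON body and the `X-GitHub-Event` header that GitHub sends with webhook deliveries; whether the receiver requires further headers or a signature is not covered here. The function names, the sample payload, and the example forwarding address are hypothetical.

```python
import json
import urllib.request


def events_url(base_url, path="/events"):
    """Append the receiver path to a forwarding address without doubling '/'."""
    return base_url.rstrip("/") + path


def build_test_event(base_url, event_type="issues", payload=None):
    """Build (but do not send) a POST that mimics a GitHub webhook delivery."""
    body = json.dumps(payload or {"action": "opened"}).encode("utf-8")
    return urllib.request.Request(
        events_url(base_url),
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-GitHub-Event": event_type,
        },
    )


if __name__ == "__main__":
    # Hypothetical forwarding address; use the one your tunnel prints.
    print(events_url("https://example.ngrok.app/"))
```

To send the request to a locally running receiver, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_test_event("http://localhost:8088"))`.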

Requirements

  1. Use GitHub Issues to track outages and interruptions in service caused by a recent deployment
  2. There should only be one issue open at a time, labeled incident, for all problems caused by the latest deployment
  3. For GitHub to keep track of your deployments, you must be using GitHub Environments inside the workflow that runs the deployment
  4. The workflow also has to use tooling, such as Terraform, that deploys your code from GitHub to an external platform or to GitHub itself

TODO

  • consider publishing collector image with arch amd64 as latest tag
    • Decision was made to just use amd64 arch in the compose file for now
  • add health check endpoint back in and update documentation
  • dashboard template variables
  • expand dashboard to include some more advanced expressions that show off what you can do with granular data


o11y-demo's Issues

Prototype Deployment Frequency data collection

Create a means to collect the deployment events, and store them.

Required data:

  • Deployments with timestamps
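
As an illustration of what can be derived from deployment timestamps alone, the sketch below counts deployments per calendar day. It is not the collector's implementation. The function name is hypothetical and the input is assumed to be ISO 8601 strings such as those GitHub returns.

```python
from collections import Counter
from datetime import datetime


def parse(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def deployments_per_day(timestamps):
    """Count deployments per calendar day, keyed by 'YYYY-MM-DD'."""
    return Counter(parse(ts).date().isoformat() for ts in timestamps)


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        "2024-01-01T10:00:00Z",
        "2024-01-01T15:00:00Z",
        "2024-01-02T09:00:00Z",
    ]
    print(dict(deployments_per_day(sample)))
```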


Create data pulling mechanism for DORA metrics

We will need to investigate how we are going to pull and store the data for the metrics.

We may be able to utilize some existing means, such as webhooks, or we may want to pull this data with our own GitHub app.

This should result in a means to pull the data, and store it for use in the dashboard.

Create DORA Dashboard

Create a dashboard to display DORA metrics in Grafana


Prototype Change Failure Rate data collection

Collect data for Change Failure Rate, and store it

Required data:

  • Deployments
  • incident issues with timestamps
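
One way these two inputs could be combined, sketched under assumptions: each incident issue is attributed to the most recent deployment before the issue was opened, and the rate is the share of deployments with at least one attributed incident. This attribution rule and the function names are illustrative, not the project's confirmed method.

```python
from bisect import bisect_right
from datetime import datetime


def parse(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def change_failure_rate(deployments, incidents):
    """Fraction of deployments that were followed by an incident issue."""
    deploy_times = sorted(parse(ts) for ts in deployments)
    if not deploy_times:
        return 0.0
    failed = set()
    for opened in incidents:
        # Index of the latest deployment at or before the incident was opened.
        idx = bisect_right(deploy_times, parse(opened)) - 1
        if idx >= 0:
            failed.add(idx)
    return len(failed) / len(deploy_times)
```

Incidents opened before the first known deployment are ignored, since there is no deployment to attribute them to.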


Prototype Time to Restore Service data collection

Collect the data for the Time to Restore Service metric.

Required data:

  • Issues tagged as incident with opening and closing timestamps
  • Deployment with timestamp that closes the issue
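
A minimal sketch of the calculation, assuming each incident is a dictionary with the `created_at` and `closed_at` fields that GitHub issues carry, and that the closing time stands in for the time of the restoring deployment. The function name is hypothetical, and incidents that are still open are skipped.

```python
from datetime import datetime, timedelta


def parse(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def mean_time_to_restore(incidents):
    """Average open-to-close duration of closed incident issues, or None."""
    durations = [
        parse(i["closed_at"]) - parse(i["created_at"])
        for i in incidents
        if i.get("closed_at")
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

The result is a `timedelta`, so it can be formatted directly or converted with `total_seconds()`.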


Prototype Lead Time for Changes data collection

Pull the data for Lead Time for Changes metric, and store it.

Required data:

  • Time of the first commit included in the deployment
  • Deployment time
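
Given those two timestamps per deployment, lead time is their difference. The sketch below reports the median in hours over several deployments. The function name and the input shape (pairs of first-commit time and deployment time) are assumptions made for illustration.

```python
from datetime import datetime
from statistics import median


def parse(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def median_lead_time_hours(changes):
    """Median hours from first commit to deployment, or None if no data."""
    hours = [
        (parse(deployed) - parse(first_commit)).total_seconds() / 3600
        for first_commit, deployed in changes
    ]
    return median(hours) if hours else None
```

The median is used here rather than the mean so that a single long-running change does not dominate the figure.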

