metric-store-release's Introduction

Metric Store: A Cloud-Native Time Series Database

Metric Store Release is a BOSH release for Metric Store. It provides a persistent storage layer for metrics sent through the Loggregator subsystem. It is multi-tenant aware (the auth proxy ensures that you only have access to metrics from your apps), easy to query (it is compatible with the Prometheus Query API, with the exceptions listed below), and has a powerful storage engine (the InfluxDB storage engine has built-in compression and a memory-efficient series index).

Deploying

Metric Store can be deployed within Cloud Foundry. It must be able to reach Loggregator, from which it reads metrics.

Cloud Config

Every BOSH deployment requires a cloud config. The Metric Store deployment manifest assumes the CF-Deployment cloud config has been uploaded.

Creating and Uploading a Release

The first step in deploying Metric Store is to create a release or download one from bosh.io. Final releases are preferable; however, dev releases are useful during development.

The following commands will create a dev release and upload it to an environment named testing.

bosh create-release --force
bosh -e testing upload-release --rebase

Cloud Foundry

Metric Store deployed within Cloud Foundry reads from the Loggregator system and registers with the GoRouter at metric-store.<system-domain>.

You can deploy Metric Store by using this operations file.

bosh -e testing -d cf \
    deploy cf-deployment.yml \
    -o add-metric-store-to-cfd.yml

Metric Store UAA Client

By default, Metric Store uses the doppler client included with cf-deployment.

If you would like to use a custom client, it requires the uaa.resource authority:

<custom_client_id>:
    authorities: uaa.resource
    override: true
    authorized-grant-types: client_credentials
    secret: <custom_client_secret>

Using Metric Store

Storing Metrics

Metric Store ingresses all metrics (discarding logs) from the Reverse Log Proxy on Loggregator. Any metric sent to a Loggregator Agent will travel downstream into Metric Store.

Accessing Metrics in Metric Store

Authorization and Authentication

When deployed in Cloud Foundry, Metric Store depends on the CF Auth Proxy job to convert your UAA-provided auth token into an authorized list of source IDs. In Cloud Foundry terms, a source ID represents either an application guid (e.g. the output of cf app <app-name> --guid) or a component name (e.g. doppler).

Each request must have the Authorization header set to a UAA-provided token. If the token contains the doppler.firehose scope, the request can read data from any source ID. If the source ID is an app guid, the Cloud Controller is consulted to verify that the provided token has access to that app.
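
As a rough sketch of such a request, the shell helpers below send a PromQL query through the GoRouter route with a UAA token attached. They assume a logged-in cf CLI whose cf oauth-token output is used as the Authorization header value. The function names source_id_selector and query_as_user are hypothetical, introduced only for illustration, and example.com, my-app, and http_latency are placeholders.

```shell
# Build an exact-match PromQL selector for a single source ID.
# $1 = metric name, $2 = source ID (app guid or component name)
source_id_selector() {
  printf '%s{source_id="%s"}' "$1" "$2"
}

# Send an instant query through the GoRouter route as the current cf user.
# $1 = system domain, $2 = PromQL query
query_as_user() {
  curl -sS -G "https://metric-store.$1/api/v1/query" \
    -H "Authorization: $(cf oauth-token)" \
    --data-urlencode "query=$2"
}

# Usage (placeholders; requires a logged-in cf CLI):
#   guid="$(cf app my-app --guid)"
#   query_as_user example.com "$(source_id_selector http_latency "$guid")"
```

A token without the doppler.firehose scope should only be able to read source IDs that the Cloud Controller confirms the user can access, so the same call may return data for one app guid and be rejected for another.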

PromQL via HTTP

Metric Store provides Prometheus Query Language (PromQL) compatible endpoints. Queries against Metric Store can be crafted with the help of the Prometheus API Documentation.

Example: GET /api/v1/query

This issues a PromQL query against Metric Store data.

curl -G "http://<metric-store-addr>:8080/api/v1/query" --data-urlencode 'query=metrics{source_id="source-id-1"}'
Response Body
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [{ "metric": {...}, "point": [...] }]
  }
}

See the official PromQL API documentation for more information.

Notes on PromQL

A valid PromQL metric name consists of the characters [a-zA-Z0-9], underscore, and colon. Names can begin with a letter, underscore, or colon, but not with a digit. To accommodate existing metrics that do not comply with this format, Metric Store converts metric names: any character outside the valid set is converted to an underscore before the metric is written to disk. For example, to match the metric name http.latency, use http_latency in your query.
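
The character conversion described above can be approximated in shell, which may help when working out the name to use in a query. This is a minimal sketch, not Metric Store's actual implementation: the function name sanitize_metric_name is hypothetical, and the sketch covers only the character replacement, not how names that begin with a digit are handled.

```shell
# Replace every character outside [a-zA-Z0-9_:] with an underscore.
# $1 = original metric name
sanitize_metric_name() {
  # LC_ALL=C keeps the bracket ranges limited to ASCII letters and digits.
  printf '%s' "$1" | LC_ALL=C sed 's/[^a-zA-Z0-9_:]/_/g'
}

# Usage:
#   sanitize_metric_name 'http.latency'
```

Names that are already valid pass through unchanged, so the helper can be applied to any metric name before it is placed in a query.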

Prometheus API Compatibility
  • /api/v1/query & /api/v1/query_range, fully supported except for regex matchers on __name__ (for everyone) or source_id (for non-admins)
  • /api/v1/series, /api/v1/labels, /api/v1/rules, /api/v1/alerts & /api/v1/alertmanagers, fully supported for admins
  • the remaining endpoints are not currently supported
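
For the range endpoint, a request might look like the following sketch. The query, start, end, and step parameters come from the Prometheus HTTP API. The helper names range_start and query_range are hypothetical, and the base URL and metric in the usage comment are placeholders. The example query uses an exact-match source_id selector because regex matchers are restricted as listed above.

```shell
# Compute the start of a window that ends at a given Unix timestamp.
# $1 = window length in minutes, $2 = end time (Unix seconds)
range_start() {
  echo "$(( $2 - $1 * 60 ))"
}

# Query a range that ends now.
# $1 = base URL, $2 = PromQL query, $3 = window in minutes, $4 = step (e.g. 15s)
query_range() {
  qr_end="$(date +%s)"
  qr_start="$(range_start "$3" "$qr_end")"
  curl -sS -G "$1/api/v1/query_range" \
    -H "Authorization: $(cf oauth-token)" \
    --data-urlencode "query=$2" \
    --data-urlencode "start=$qr_start" \
    --data-urlencode "end=$qr_end" \
    --data-urlencode "step=$4"
}

# Usage (placeholders; requires a logged-in cf CLI):
#   query_range "https://metric-store.example.com" \
#     'http_latency{source_id="source-id-1"}' 10 15s
```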

Golang Clients For BOSH-Deployed Components

You can interact with Metric Store directly, circumventing the GoRouter and CF Auth Proxy, by using our Go ingress client library or our Go egress client library. This requires a BOSH-deployed component that receives metric-store BOSH links for certificate sharing. The resulting client interaction has admin access.

Using Grafana to visualize metrics

See Set up Metric Store with Grafana in the docs directory.

Contributing

We'd love to hear feedback about your experiences with Metric Store. Please feel free to open up an issue, send us a pull request, or come chat with us on Cloud Foundry Slack.

metric-store-release's People

Contributors

aegershman, attack, bradylove, christopherclark, dependabot[bot], geofffranks, gggevorgyan, hdub2, hmanukyanvmw, jbooherl, johannaratliff, jpmcb, jtuchscherer, metric-store-ci, robertjsullivan, thepeterstone

metric-store-release's Issues

Metric store (and log-cache) purge specific metric after 5 minutes

Is this an Issue/Bug or a Feature Request?

[x] Issue or Bug
[ ] Feature Request

What is this issue about? Or why do you want this feature?

BBSMasterElected is a metric the bbs component of Diego emits (on event) when a different instance takes over as the active one. We are seeing this metric get purged from metric-store and log-cache what seems to be exactly 5 minutes after it is emitted.

What version of Metric Store are you using?

v1.2.2

Please provide output that helps describe the issue:

Targeting a CF with metric-store enabled:

In one terminal session, run:

while true; do date && LOG_CACHE_ADDR="https://metric-store.<system-domain>" cf query 'BBSMasterElected{source_id="bbs"}'; sleep 5; done

In another terminal session, run:

while true; do date && LOG_CACHE_ADDR="https://log-cache.<system-domain>" cf query 'BBSMasterElected{source_id="bbs"}'; sleep 5; done

Find the VM with the active bbs instance (usually diego-api/0):

bosh -d cf ssh diego-api/0 -c 'sudo -i -u root sh -c "cfdot locks | grep bbs | jq -r .owner"'

Restart the active bbs instance to cause metric to be emitted:

bosh -n -d cf restart diego-api/<instance-guid>

Monitor the output of the two cf query loops. Five minutes after the metric first shows up, it stops being returned.

Is there anything unique or special about your setup?

This environment is a cf-deployment with the provided metric-store opsfile and other various opsfiles specific to our CI. We can provide the additional opsfiles if they are deemed relevant.

THE BEST REPO

Is this an Issue/Bug or a Feature Request?

[x] Issue or Bug
[ ] Feature Request

What is this issue about? Or why do you want this feature?

It's too good.

api/v1/rules endpoint seems inconsistent across instances

Is this an Issue/Bug or a Feature Request?

[x] Issue or Bug
[ ] Feature Request

What is this issue about? Or why do you want this feature?

Rules definitions should be consistent no matter which node your request hits, but it seems that one node says the groups are empty, and another says the rules are uploaded correctly.

What version of Metric Store are you using?

1.4.2 tile for TAS

Please provide output that helps describe the issue:
grafana/8cfa72fa-8f66-4ffc-a319-fed1c6298cc5:/var/vcap/jobs/grafana/bin# curl -k https://10.0.4.46:8080/api/v1/rules --cert cert --key key | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    41  100    41    0     0    401      0 --:--:-- --:--:-- --:--:--   401
{
  "status": "success",
  "data": {
    "groups": []
  }
}
grafana/8cfa72fa-8f66-4ffc-a319-fed1c6298cc5:/var/vcap/jobs/grafana/bin# curl -k https://10.0.4.47:8080/api/v1/rules --cert cert --key key | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7625    0  7625    0     0  60968      0 --:--:-- --:--:-- --:--:-- 61491
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "heathwatchCalculatedMetrics",
        "file": "/var/vcap/store/metric-store/rule_managers/heathwatchCalculatedMetrics/rules.yml",
        "rules": [
          {
            "name": "Diego_AvailableFreeChunksDisk",
            "query": "sum(floor(CapacityRemainingDisk / 6144))",
            "labels": {
              "chunkSize": "6144",
...
Is there anything unique or special about your setup?

cc @fredwangwang

Docs request: Cannot run acceptance tests

Is this an Issue/Bug or a Feature Request?

[ ] Issue or Bug
[x] Feature Request

What is this issue about? Or why do you want this feature?

As an open source contributor with no knowledge of the team's internal workings, I would like to be able to run acceptance tests before filing a PR. Currently, scripts/test excludes acceptance.

Currently our TestConfig assumes a deployed environment to override the env vars, but we don't communicate that.

I would appreciate more documentation in the README about how to run the tests so I can be assured I wired everything up correctly.

Please provide output that helps describe the issue:

2019/05/09 15:02:54 failed to load metric store acceptance test config:
missing required environment variables: CA_PATH, CERT_PATH, CLIENT_ID, CLIENT_SECRET, KEY_PATH, METRIC_STORE_ADDR, METRIC_STORE_CF_AUTH_PROXY_URL, UAA_URL

Allow metric-store.service.internal alias to bypass the auth proxy

Allow bosh components to talk directly to the Metric Store with mTLS without an OAuth token.

The assumption is that service-to-service communication will have admin privileges. If a service needs permissions based on the user account, the service should use the OAuth flow.

Docs for Sizing VMs

It's hard to configure the Metric Store if we don't have any recommendations for how big or small the VMs should be configured. Adding to the difficulty, it's tough to give a recommendation without knowing expected metric throughput.

Recommendations like the Influx Hardware sizing guide would be useful. We could also include a reference Metric Store deployment on a largish foundation like on PWS (maybe Diego Cell / AI count for ballpark metric throughput).

Configuring GitBot is recommended

Pivotal provides the GitBot service to synchronize pull requests and/or issues made against public GitHub repos with Pivotal Tracker projects. This service does not track individual commits.

If you are a Pivotal employee, you can configure GitBot to sync your GitHub repo to your Pivotal Tracker project with a pull request. An ask+cf@ ticket is the fastest way to get write access if you get a 404 to the config repo.

If you do not want to have pull requests and/or issues copied from GitHub to Pivotal Tracker, you do not need to take any action.

If there are any questions, please reach out to [email protected].

MSATS not in sample manifest

Is this an Issue/Bug or a Feature Request?

[ ] Issue or Bug
[x] Feature Request

What is this issue about? Or why do you want this feature?

I would like for the acceptance test errand to be deployed when I use the sample deployment manifest

What version of Metric Store are you using?

This commit: 334834a

Please provide output that helps describe the issue:

bosh -d cf errands does not show the MSATS errand

Is there anything unique or special about your setup?

Issues not showing up in tracker

Is this an Issue/Bug or a Feature Request?

[ ] Issue or Bug
[ ] Feature Request

What is this issue about? Or why do you want this feature?

In your own words, describe the issue.
What steps/actions led to the issue?

What version of Metric Store are you using?
Please provide output that helps describe the issue:

It's helpful to include snippets of the error response or logs output

Is there anything unique or special about your setup?

Docs for Rules Manager API

As of release 1.4.0, we've included new API endpoints for managing recording rules and alerts via the new Rules Manager API, but we haven't documented how admins and developers building integrations with the Metric Store can take advantage of them.

  • HTTP methods and expected payload for each endpoint
  • Potential errors

Tests are skipped on Mac OS

Is this an Issue/Bug or a Feature Request?

[x] Issue or Bug
[ ] Feature Request

What is this issue about? Or why do you want this feature?

There are multiple places in the test suite where we make a runtime check and skip the test on Mac OS. This works around some flaky test behavior but lowers the utility of the suite and leads to build breaks due to false negatives.

  1. src/internal/acceptance/metric-store/metric_store_test.go#L521
  2. src/internal/acceptance/metric-store/metric_store_test.go#L538
  3. src/internal/acceptance/metric-store/metric_store_test.go#L560
  4. src/performance/performance_test.go#L30
  5. src/pkg/leanstreams/leanstreams_test.go#L229
  6. src/pkg/leanstreams/leanstreams_test.go#L254
  7. src/pkg/ingressclient/ingressclient_test.go#L61
What version of Metric Store are you using?

develop

Please provide output that helps describe the issue:

For example, the ingressclient:

IngressClient
  errors if writing fails
  /Users/pstone/workspace/metric-store-release/src/pkg/ingressclient/ingressclient_test.go:59

S [SKIPPING] [0.004 seconds]
IngressClient
/Users/pstone/workspace/metric-store-release/src/pkg/ingressclient/ingressclient_test.go:18
  errors if writing fails [It]
  /Users/pstone/workspace/metric-store-release/src/pkg/ingressclient/ingressclient_test.go:59

  doesn't work on Mac OS

  /Users/pstone/workspace/metric-store-release/src/pkg/ingressclient/ingressclient_test.go:61
Is there anything unique or special about your setup?

Running scripts/test on a Mac.

Unable to scrape prometheus rsocket proxy /metrics/connected endpoint when multiple apps send to proxy server

Is this an Issue/Bug or a Feature Request?

[x] Issue or Bug
[ ] Feature Request

What is this issue about? Or why do you want this feature?

We use the prometheus rsocket proxy to collect all of our app metrics, across many microservices, and expose them to prometheus. We would like to be able to ingest that endpoint into the metrics store, but when more than one app is sending metrics to the prometheus rsocket proxy, the /metrics/connected endpoint is no longer properly scraped by the metrics registrar. If the proxy is only receiving a single app's metrics, it works fine. Prometheus itself can ingest the proxy's endpoint without problems.

This feature is important because it means each of our microservices only has to open an rsocket connection to the proxy server instead of being configured individually to be scraped into the metrics store. Then the only endpoint the metrics store needs to scrape is the proxy's /metrics/connected endpoint, and it need not know about all the other applications.

What version of Metric Store are you using?

Unsure, as this is managed by our platform team but I can update when I know. I assume it is the latest release as we stay current. I will ask our platform team to comment on this issue.

Please provide output that helps describe the issue:

There does not appear to be any significant output with an error indicating why it cannot be ingested.

Is there anything unique or special about your setup?

Not that I am aware of. I will ask our platform team to comment on this issue.

Document how to use metric-store with Grafana

Is this an Issue/Bug or a Feature Request?

[ ] Issue or Bug
[x] Feature Request

What is this issue about? Or why do you want this feature?

I had to ask in CF Slack and do some Github code search in order to configure metric-store to work with Grafana

What version of Metric Store are you using?

1.1.2


I'm raising this issue to assign to myself. Where would be a good place to document this, other than this issue? I'll then raise a PR.


⬇️ Operation to create an UAA OAuth client for Grafana

- type: replace
  path: /variables/-
  value:
    name: grafana_client_secret
    type: password

- type: replace
  path: /instance_groups/name=uaa/jobs/name=uaa/properties/uaa/clients/grafana?
  value:
    authorities: ''
    scope: openid,uaa.resource,doppler.firehose,logs.admin,cloud_controller.read
    authorized-grant-types: authorization_code,refresh_token
    override: true
    secret: ((grafana_client_secret))
    redirect-uri: https://grafana.((system_domain))/login/generic_oauth

⬇️ Operation to create Grafana with the correct configuration

- type: replace
  path: /instance_groups/name=metric-store/jobs/-
  value:
    name: grafana
    release: grafana
    properties:
      grafana:
        root_url: https://grafana.((system_domain))
        users:
          auto_assign_organization_role: Admin
        auth:
          generic_oauth:
            name: GOV.UK PaaS
            enabled: true
            allow_sign_up: true
            client_id: grafana
            client_secret: ((grafana_client_secret))

            scopes:
              - openid
              - uaa.resource
              - doppler.firehose
              - logs.admin
              - cloud_controller.read

            auth_url: https://login.((system_domain))/oauth/authorize
            token_url: https://login.((system_domain))/oauth/token
            api_url: https://login.((system_domain))/userinfo
        datasources:
          - name: Prometheus
            url: https://metric-store.((system_domain))
            type: prometheus
            editable: false
            orgId: 1
            jsonData: '{"httpMethod":"GET","keepCookies":[],"oauthPassThru":true}'

Support for /api/v1/labels/__name__/values for non-admin users

Is this an Issue/Bug or a Feature Request?

[ ] Issue or Bug
[x] Feature Request

What is this issue about? Or why do you want this feature?

In order for Grafana autocomplete to show a user which metrics are available to them, the Prometheus API endpoint /api/v1/labels/__name__/values needs to be supported.

The current implementation only returns values if the user is an admin (has the scopes doppler.firehose or logs.admin).

As a user using Grafana to look at my Cloud Foundry metrics, I expect to be able to list the metrics available to me, so I can use the autocomplete feature in Grafana.

What version of Metric Store are you using?

1.1.2

Please provide output that helps describe the issue:
curl -H "Authorization: $(cf oauth-token)" https://metrics-store-url/api/v1/labels/__name__/values

Will return 404 when the user does not have scope doppler.firehose or logs.admin

Is there anything unique or special about your setup?

No

Add VictoriaMetrics as storage backend and PromQL executor

Is this an Issue/Bug or a Feature Request?

[ ] Issue or Bug
[x] Feature Request

Why do you want this feature?

metric-store uses InfluxDB as its storage backend. It would be great to add the ability to use VictoriaMetrics as a storage backend, since it requires less CPU, RAM, and storage space compared to InfluxDB. See the following articles for details:

Additionally, VictoriaMetrics provides a high-performance PromQL engine written from scratch. The engine provides extended functionality on top of standard PromQL. It would be great if the PromQL engine from VictoriaMetrics could be added to metric-store as an alternative to the existing PromQL engine.
