
ops-agent's Introduction

Ops Agents

This repository contains the Ops Agents (Linux and Windows) that are part of the Google Cloud Operations product suite (specifically Cloud Logging and Cloud Monitoring).

The agents gather logs and metrics from your Google Compute Engine instances and send them to Cloud Logging and Cloud Monitoring.

See documentation at:

License and Copyright

Copyright 2020 Google LLC. Apache License, Version 2.0

ops-agent's People

Contributors

alison0211, avilevy18, binaryfissiongames, braydonk, davidbtucker, dehaansa, dependabot[bot], djaglowski, ekund, evansimpson, franciscovalentecastro, hopkiw, hsmatulisgoogle, igorpeshansky, jefferbrecht, jkschulz, jonathanwamsley, lujieduan, martijnvans, qingling128, quentinmit, rafaelwestphal, ridwanmsharif, schmikei, shafinsiddique, sophieyfang, stackdriver-instrumentation-release, stefankurek, stevenycchou, subbarker

ops-agent's Issues

Memcached receiver collects metrics which are different from the document on cloud.google.com

For memcached, the doc in this repo lists the following metrics to be collected:

  • workload.googleapis.com/memcached.bytes
  • workload.googleapis.com/memcached.commands
  • workload.googleapis.com/memcached.connections.current
  • workload.googleapis.com/memcached.connections.total
  • workload.googleapis.com/memcached.cpu.usage
  • workload.googleapis.com/memcached.current_items
  • workload.googleapis.com/memcached.evictions
  • workload.googleapis.com/memcached.network
  • workload.googleapis.com/memcached.operations
  • workload.googleapis.com/memcached.threads

All of the above metrics have been verified in Cloud Monitoring's Metrics Explorer after installation of the receiver.

However, the Cloud docs list 3 different metrics (workload.googleapis.com/memcached.current_connections, workload.googleapis.com/memcached.rusage, and workload.googleapis.com/memcached.total_connections) plus 1 additional metric, workload.googleapis.com/memcached.operation_hit_ratio.

Please check which doc is the correct source. (Based on the results in Metrics Explorer, it seems what this repo shows is correct.)


Support the deletion of logs to protect sensitive values.

The existing legacy implementation of the logging agent supported log filtering for deleting log entries that matched a pattern.

A feature following this issue would allow for a similar method of log processing in the new Ops Agent.
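For context, a minimal sketch of what such pattern-based deletion could look like with the exclude_logs processor described in the Ops Agent configuration guide (the processor name, match expression, and pipeline wiring here are illustrative, not a confirmed design):

logging:
  processors:
    delete_sensitive:
      type: exclude_logs
      match_any: ['jsonPayload.message =~ "BEGIN PRIVATE KEY"']
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]
        processors: [delete_sensitive]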

Types parameter not available in parse_regex

Hi,

I'm trying to migrate a regex parser from the legacy Google agent to the new ops-agent.

In the legacy agent it was possible to define the types parameter to change the type of the captured values. I can see that this is also available in Fluent Bit, but I couldn't find this parameter in the ops-agent configuration.

Is the types parameter not available in the ops-agent for the regex_parser type? If that's the case, is there any workaround to get this configured?
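For reference, this is the kind of native Fluent Bit parser option being referred to; a minimal parsers.conf sketch (the parser name, regex, and field types are illustrative), which has no exposed equivalent in the ops-agent configuration today:

[PARSER]
    Name    my_regex
    Format  regex
    Regex   ^(?<status>\d+) (?<size>\d+)$
    Types   status:integer size:integer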

Thanks

Exclude metrics feature not working as intended

I'm trying to exclude logging and some metrics to save money by limiting the number of metrics ingested. See my config below:

logging:
  service:
    pipelines:
      default_pipeline:
        receivers: []
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern: 
        - agent.googleapis.com/agent/*
        - agent.googleapis.com/iis/*
        - agent.googleapis.com/interface/*
        - agent.googleapis.com/mssql/*
        - agent.googleapis.com/pagefile/*
        - agent.googleapis.com/swap/*
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]

With this config nothing is excluded; I can see everything in the Metrics Explorer.
When I keep only "- agent.googleapis.com/agent/*", it works and I no longer see agent metrics, but if I replace it with "- agent.googleapis.com/swap/*" I can still see swap metrics... I am successfully restarting the service every time.

Are some metrics impossible to deactivate, or am I not writing the patterns correctly?

Windows Server 2019 Datacenter - using google-cloud-ops-agent.x86_64.2.3.1@1

Support config.d style directories for config merging

My team is planning an Ops Agent rollout, where we will make use of the default system metrics and logging, including some custom metrics and custom logging rules, but we also anticipate that downstream teams will add their own custom metrics and logging rules.

This would require merging:

  • Default configuration
  • Platform team’s base configuration
    • Custom metrics (organisation-wide)
    • Custom logging (organisation-wide)
  • Service-specific configuration
    • Custom metrics (per-service)
    • Custom logging (per-service)

Since different teams would be responsible for different parts of the configuration, the preferred way would be to have separate configuration files which are merged at runtime.

Fluent Bit (which Ops Agent uses) supports an include syntax for configuration which suits this use case well.

But I don't see an equivalent for Ops Agent, which expects a single configuration file for all rules contained within it.

Ideally, Ops Agent would support a config.d style directory, where each file is merged structurally. I expect this to be functionally similar to how the default config is merged in confgenerator/confmerger.go, except supporting an arbitrary number of inputs.

Until that's implemented, we need to look at workarounds:

  • We use Puppet, which has a concat module available, so multiple teams could have their configs concatenated, but YAML cannot safely be merged in a text-based fashion due to differing whitespace.
  • We could implement our own YAML-native merging prior to deployment (Hiera has this natively, or we could write our own), as sketched below.
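A minimal sketch of such a pre-deployment merge using mikefarah's yq v4 (the file names are illustrative; the * operator deep-merges maps, with later files winning on conflicting scalars):

yq eval-all '. as $item ireduce ({}; . * $item)' base.yaml org.yaml service.yaml > /etc/google-cloud-ops-agent/config.yaml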

Include default log rotation rules for subagent logs

Subagent logs can grow quickly and fill up the disk. Currently the Ops Agent doesn't ship any logrotate rules for these log files, so they accumulate indefinitely.

For example, in one of the VMs I'm running, the logging-module.log file is 2.4 GB after 3 weeks. All installations will eventually run out of disk.

Users could create a rule under /etc/logrotate.d, but a reasonable default should be provided out of the box.
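Until a default ships, a minimal /etc/logrotate.d sketch along those lines (the rotation cadence and options are illustrative, not a recommended default):

/var/log/google-cloud-ops-agent/subagents/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}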

Cleanup jmx-gatherer.jar build and inclusion

Right now, the jmx-gatherer jar is pulled in as a git submodule and built as part of ops-agent. This has a few issues:

  1. The nebula plugin is unable to read git versioning information appropriately in a submodule. Instead we hijack its version-detection schema. Longer term this could lead to runtime version differences (if upstream begins to use nebula-plugin for more than just git tagging).
  2. We are re-building the same JAR repeatedly on different environments. This puts a dependency on the JVM across all of our build environments, rather than just integration tests.

An ideal solution would do one of the following:

  • Build the jmx-gatherer jar, publish "our" version, and consume from that source. This would allow us to carry our own patches if necessary. The Ops Agent build would simply point at a "released" jar.
  • Update the submodule to work as expected with the nebula plugin. Version should be inferred from latest commit-tag on the submodule, and all version strings we use to pull the JAR would come from that inferred tag.

mysql error

The following failed, 4 times in a row:

+ sudo apt-key adv --keyserver pgp.mit.edu --recv-keys 3A79BD29
Warning: apt-key output should not be parsed (stdout is not a terminal)
gpg: keyserver receive failed: No keyserver available

https://storage.cloud.google.com/ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/third_party_apps/presubmit_github/buster/232/20220314-092454/logs/%2Bbuild_and_test.txt

Not sure if it is a transient issue or something we need to fix, or both.

Custom metrics with gRPC receiver

Hello team,

Is it possible to push custom metrics via the OTel collector deployed by the Ops Agent?
It seems the OTLP gRPC receiver is not whitelisted …
It would be very nice to be able to publish metrics with the OpenTelemetry Python SDK.

Please lemme know :)
Thanks
Fred

Some metrics cannot be excluded

Hello, we are using this config

logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/messages
      - /var/log/syslog
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      - agent.googleapis.com/cpu/*
      - agent.googleapis.com/interface/*
      - agent.googleapis.com/network/*
      - agent.googleapis.com/memory/bytes_used/*
      - agent.googleapis.com/swap/*
      - agent.googleapis.com/disk/io_time/*
      - agent.googleapis.com/disk/merged_operations/*
      - agent.googleapis.com/pagefile/*
      - agent.googleapis.com/disk/weighted_io_time/*
      - agent.googleapis.com/disk/write_bytes_count/*
      - agent.googleapis.com/disk/read_bytes_count/*
      - agent.googleapis.com/processes/*
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]

The agent.googleapis.com/processes/* exclusion and the other ones with 2 path components (not counting the /*) work, but exclusions with 3 path components (like agent.googleapis.com/memory/bytes_used/*) do not.

Being excluded correctly:

  • agent.googleapis.com/cpu/*
  • agent.googleapis.com/interface/*
  • agent.googleapis.com/network/*
  • agent.googleapis.com/swap/*
  • agent.googleapis.com/pagefile/*
  • agent.googleapis.com/processes/*

Not being excluded:

  • agent.googleapis.com/memory/bytes_used/*
  • agent.googleapis.com/disk/io_time/*
  • agent.googleapis.com/disk/merged_operations/*
  • agent.googleapis.com/disk/weighted_io_time/*
  • agent.googleapis.com/disk/write_bytes_count/*
  • agent.googleapis.com/disk/read_bytes_count/*

I'm assuming the problem here is that metrics like agent.googleapis.com/disk/read_bytes_count/* do not branch out any further, so the trailing /* is a problem because it expects the metric to branch out more; removing the /* throws an error when starting google-cloud-ops-agent, as it's hardcoded to require it here.
Metrics like agent.googleapis.com/processes/* are excluded correctly because they branch out into agent.googleapis.com/processes/count_by_state, so the /* matches.

Logging splits up multiline log messages

Log entries spanning multiple lines, like stack traces, are split up and end up as multiple one-line entries in the Logs Explorer. Fluent Bit provides the "Multiline" filter to deal with that, so I propose to implement it and add a filters element to the configuration model.

This is not a duplicate of #239: that issue requests support for GCP Cloud Tracing, whereas I'm asking for proper treatment of stack traces in GCP Logging.
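For illustration, the native Fluent Bit filter being proposed looks roughly like this (the content key and the choice of the built-in java multiline parser are illustrative):

[FILTER]
    name                  multiline
    match                 *
    multiline.key_content log
    multiline.parser      java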

rabbitmq fail on debian11

+ curl -s https://packages.erlang-solutions.com/ubuntu/erlang_solutions.asc
+ sudo apt-key add -
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
+ source /etc/os-release
++ PRETTY_NAME='Debian GNU/Linux 11 (bullseye)'
++ NAME='Debian GNU/Linux'
++ VERSION_ID=11
++ VERSION='11 (bullseye)'
++ VERSION_CODENAME=bullseye
++ ID=debian
++ HOME_URL=https://www.debian.org/
++ SUPPORT_URL=https://www.debian.org/support
++ BUG_REPORT_URL=https://bugs.debian.org/
+ case $VERSION_ID in
+ echo -n 'unknown version'
+ exit 1

Ability to run integration tests quickly

Currently, kokoro has a monopoly on running integration tests. This presents two major challenges to developers who are adding new integrations to the ops agent.

  1. There is a lack of visibility into exactly how kokoro sets up and runs the tests. Much can be understood from documentation, the structure of integration test files, and from execution logs. However, there is much that is unclear about the process that would be helpful if made visible. Ideally, kokoro would execute a fully self-contained routine that is defined entirely within this repository. This would provide far more insight into issues that arise from incorrect assumptions about the test environment.
  2. The development loop is very slow. kokoro takes about 15 minutes to run. This causes significant friction and noise, even when working through fairly trivial code changes. Ideally, it would be possible for developers to run a fully self-contained routine locally and in a much shorter period of time. This would allow for faster development cycles and far less churn on pull requests. Aside from providing a mechanism to run locally, much time would be saved if unnecessary recompilations could be skipped, and if developers could target specific integrations and environments (e.g. run the nginx integration on debian).

Ability to include multiple configuration files

As a part of our deployments we have different components on each of our GCE instances. Instead of making changes to the configuration file /etc/google-cloud-ops-agent/config.yaml, I would like the ability to drop a configuration file for the receiver associated with each component into a directory, for example /etc/google-cloud-ops-agent/config.d/${COMPONENT_RECEIVER_YAML}

The existing structure of how configuration is built already combines the built-in configuration with the additional configuration stored in /etc/google-cloud-ops-agent/config.yaml, so I expect that the ability to merge a third and further additional configurations should not be a stretch.

I'll see if I can rustle up a merge request for it, but given my lack of exposure to golang, that'll definitely take a long while.

JSON support

From my understanding of the documentation, there should be support for JSON logs.

Logs from docker containers look like this:

{"log":"some log","stream":"stdout","time":"2021-09-20T14:06:10.422145949Z"}

My configuration file is the following:

---
logging:
  receivers:
    docker:
      type: "files"
      include_paths:
        - "/var/lib/docker/containers/*/*-json.log"
  processors:
    docker:
      type: "parse_json"
      field: "log"
      time_key: "time"
      time_format: "%Y-%m-%dT%H:%M:%S.%L%Z"
  service:
    pipelines:
      docker:
        receivers: [docker]
        processors: [docker]
metrics:
  service:
    pipelines:
      default_pipeline:
        receivers: []

I assumed this would produce either a structured log entry in jsonPayload, or would extract only the fields I put in the configuration (log and time), but all I see is my raw JSON:

Expected:

{
  "jsonPayload": {
    "message": "some log"
  },
  "timestamp": "2021-09-20T14:06:11.150181996Z"
}

OR

{
  "jsonPayload": {
    "log": "some log",
    "time": "2021-09-20T14:06:11.150181996Z",
    "stream":"stdout"
  }
}

Observed:

{
  "jsonPayload": {
    "message": "{\"log\":\"some log\",\"stream\":\"stdout\",\"time\":\"2021-09-20T14:06:10.422145949Z\"}"
  }
}

Have I missed something?
Is there any way to achieve what I want?

systemd_journald: Transform keys from journald JSON to Cloud Logging JSON

Hello!

I am looking at using the Ops Agent for my GCP project, and I'm especially interested in using the systemd_journald receiver. I am interested in this because journald is already collecting logs for me, and it would be great to take advantage of journald's structured-log format.

Even though the receiver is doing the work of pulling in logs from journald, the log entries coming from journald aren't in a form that Cloud Logging can process. So, my request is that the Ops Agent's systemd_journald receiver be enhanced to transform the JSON entries from systemd's schema to Cloud Logging's schema.

One key from the JSON needs a simple key change:

  • The key MESSAGE needs to change to message.

One key needs both a key change and a value change:

  • SYSLOG_PRIORITY is a numeric priority encoded as a string. The key name needs to change to severity, and the value needs to be mapped to an acceptable-to-Cloud-Logging string using the following mapping:
    7 maps to DEBUG
    6 maps to INFO
    5 maps to NOTICE
    4 maps to WARNING
    3 maps to ERROR
    2 maps to CRITICAL
    1 maps to ALERT
    0 maps to EMERGENCY

There are three keys which could be present and, if so, need to trigger the creation of the logging.googleapis.com/sourceLocation object:

  • CODE_FILE: If present, it should be added to the logging.googleapis.com/sourceLocation object, under key file.
  • CODE_LINE: If present, it should be added to the logging.googleapis.com/sourceLocation object, under key line.
  • CODE_FUNC: If present, it should be added to the logging.googleapis.com/sourceLocation object, under key function.

Finally, the timestamp object has to be created by applying some math to the __REALTIME_TIMESTAMP key:

For the timestampSeconds key: Take the value from ⌊__REALTIME_TIMESTAMP ÷ 1000000⌋.
For the timestampNanos key: Take the value from __REALTIME_TIMESTAMP mod 1000000 × 1000.
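As a worked example with a hypothetical value: __REALTIME_TIMESTAMP = 1646420538862293 (microseconds since the epoch) would yield timestampSeconds = 1646420538 and timestampNanos = 862293000.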

GCP Ops Agent | Jvm Monitoring | Multiple process | Java | Single VM |

I have the GCP Ops Agent set up for JVM monitoring on one of my VMs.

https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/third-party/jvm

This works fine with one JVM installed; the collected data is shown in the GCP Monitoring UI.

The following configuration is used in /etc/google-cloud-ops-agent/config.yaml

metrics:
  receivers:
    jvm_metrics:
      type: jvm
      endpoint: localhost:9999
      collection_interval: 60s
  service:
    pipelines:
      jvm_pipeline:
        receivers:
          - jvm_metrics

I have a use case where 2 JVM processes are running on a single machine, each exposing data on a different JMX port. How do I create a config in the .yml such that I see data for both and can differentiate between their JVM metrics?
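A sketch of the kind of configuration being attempted, with two receiver IDs pointing at different JMX ports (the receiver IDs and the second port are illustrative; whether the resulting metrics can be told apart is exactly the open question here):

metrics:
  receivers:
    jvm_metrics_app1:
      type: jvm
      endpoint: localhost:9999
      collection_interval: 60s
    jvm_metrics_app2:
      type: jvm
      endpoint: localhost:9998
      collection_interval: 60s
  service:
    pipelines:
      jvm_pipeline:
        receivers: [jvm_metrics_app1, jvm_metrics_app2]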

I tried different variations in the .yml file; I either get an invalid configuration error or the other JVM's data doesn't show up in the GCP Monitoring UI.

Any leads are highly appreciated.

Feature request: statsd and prometheus receivers

The statsd and prometheus receivers are available in the opentelemetry-collector-contrib repository. We're currently using the legacy agent for statsd integration on GCP and are looking for a way to get some other metrics, exposed in the Prometheus format, into Stackdriver as well.

Support for Google Container-Optimized OS

It would be nice if the Google Ops Agent supported Google Container-Optimized OS (https://cloud.google.com/container-optimized-os/docs).

Setting up a Google Container-Optimized OS Compute Engine instance with the legacy logging agent is extremely easy and just requires passing google-logging-enabled=true as metadata to the instance (https://cloud.google.com/container-optimized-os/docs/how-to/logging#creating_a_new_instance_with_the_logging_agent_enabled). I am unable to find instructions on how to enable the new Ops Agent.

nginx third_party_apps_test fails when run against a fresh project

The error looks like:

2022/01/26 16:31:27 Attempt 1 of debian-10 test of nginx finished with err=WaitForMetric(metric="workload.googleapis.com/nginx.requests", extraFilters=[]): rpc error: code = NotFound desc = Can not find metric resource_container_ids: [REDACTED] request_context: REQUEST_CONTEXT_READ user_name: [REDACTED] filter: "type = "workload.googleapis.com/nginx.requests"", retryable=false

I think I know what is causing this and I have two fixes in mind:

  • a sleep in the post install script for nginx
  • improved retrying for this kind of error

Elasticsearch logs receiver segfaults fluentbit (debian-11)

Version: 2.10.0~debian11

When installing and setting up the elasticsearch logging receiver on debian-11, it seems that the configuration causes a segfault in fluentbit.

Output from journalctl for the fluentbit process:

Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: Fluent Bit v1.8.12
Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: * Copyright (C) 2019-2021 The Fluent Bit Authors
Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: * Copyright (C) 2015-2018 Treasure Data
Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: * https://fluentbit.io
Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: [2022/02/14 14:05:17] [engine] caught signal (SIGSEGV)
Feb 14 14:05:17 brandon-testing-elasticsearch fluent-bit[65736]: ERROR: no debug info in ELF executable (-1)ERROR: no debug info in ELF executable (-1)ERROR: no debug info in ELF executable (-1)ERROR: no debu>

I've found that this is related to the multiline config for Elasticsearch. It seems like it works just fine with the previous commit of fluentbit we were pointing to, but with fluentbit v1.8.12 this segfault occurs.

This probably affects other receivers that have similar multiline configs (e.g. zookeeper), but I haven't tested those.

Golden error tests break with addition of new integrations

Several golden error tests include a hardcoded list of valid receiver types. When integrating a new receiver type, these tests must each be updated accordingly.

Of course this is not difficult to fix, but ideally the integration process would not require interaction with unrelated tests.

Example of running tests after adding a new receiver type, but before updating such a test.

> go test -mod=mod github.com/GoogleCloudPlatform/ops-agent/confgenerator
--- FAIL: TestGenerateConfigsWithInvalidInput (0.00s)
    --- FAIL: TestGenerateConfigsWithInvalidInput/windows (0.00s)
        --- FAIL: TestGenerateConfigsWithInvalidInput/windows/metrics-receiver_unsupported_type (0.00s)
            confgenerator_test.go:255: test "metrics-receiver_unsupported_type": golden file at testdata/invalid/windows/metrics-receiver_unsupported_type/golden_error mismatch (-want +got):
                  strings.Join({
                        "the agent config file is not valid. detailed error: metrics rece",
                        `iver with type "unsupported_type" is not supported. Supported me`,
                        "trics receiver types: [hostmetrics, iis, mssql",
                +       ", nginx",
                        "].",
                  }, "")
FAIL
FAIL    github.com/GoogleCloudPlatform/ops-agent/confgenerator  0.312s
FAIL

Add support to customize port number for agent metrics

Any chance we will be able to change the port or change it to something that is not conflicting?
https://github.com/GoogleCloudPlatform/ops-agent/blob/master/confgenerator/agentmetrics.go#L36

One of the Druid components uses port 8888, and it's conflicting with the ops-agent:

Mar 10 13:08:21 otelopscol: 2022-03-10T13:08:21.717Z#011error#011service/collector.go:153#011Asynchronous error received, terminating process#011{"error": "listen tcp :8888: bind: address already in use"}

Agent does not start when exclude_logs processor is used

We would like to filter out all syslog logs that are not relevant for our service.

We followed this guide and below is our configuration:
https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/configuration#logging-processors

The process starts with this configuration:

logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/syslog
  processors:
    parser_syslog:
      type: exclude_logs
      match_any: ['jsonPayload.message =~ "otelopscol"']
  service:
    pipelines:
      default_pipeline:
        receivers: []
      pipeline-syslog:
        receivers: [syslog]
        #processors: [parser_syslog]

However, whenever we uncomment this line "processors: [parser_syslog]", the process fails to start.

logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/syslog
  processors:
    parser_syslog:
      type: exclude_logs
      match_any: ['jsonPayload.message =~ "otelopscol"']
  service:
    pipelines:
      default_pipeline:
        receivers: []
      pipeline-syslog:
        receivers: [syslog]
        processors: [parser_syslog]

Not sure what is wrong.


Version of Agent:

nginx:~$ dpkg-query --show --showformat \
    '${Package} ${Version} ${Architecture} ${Status}\n' \
     google-cloud-ops-agent
google-cloud-ops-agent 2.10.0~ubuntu20.04 amd64 install ok installed

centos + couchdb 3rd party app test is failing

output is like:

    + sudo yum-config-manager --add-repo https://couchdb.apache.org/repo/couchdb.repo
    + sudo yum install -y couchdb
    Importing GPG key 0x7A00258D:
     Userid     : "The Apache Software Foundation (Package repository signing key) <[email protected]>"
     Fingerprint: 390e f70b b1ea 12b2 7739 6295 0ee6 2fb3 7a00 258d
     From       : https://couchdb.apache.org/repo/keys.asc
    warning: /var/cache/yum/x86_64/7/couchdb/packages/couch-js-1.8.5-21.el7.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 9657a78e: NOKEY
    warning: /var/cache/yum/x86_64/7/couchdb/packages/couchdb-3.2.1-2.el7.x86_64.rpm: Header V4 RSA/SHA1 Signature, key ID 232ef177: NOKEY
    Importing GPG key 0x7A00258D:
     Userid     : "The Apache Software Foundation (Package repository signing key) <[email protected]>"
     Fingerprint: 390e f70b b1ea 12b2 7739 6295 0ee6 2fb3 7a00 258d
     From       : https://couchdb.apache.org/repo/keys.asc
    Importing GPG key 0x9657A78E:
     Userid     : "Nicolae Vatamaniuc <[email protected]>"
     Fingerprint: 0bd7 a984 99c4 ab41 c910 ee65 fc04 dfbc 9657 a78e
     From       : https://couchdb.apache.org/repo/rpm-package-key.asc
    
    
    Public key for couchdb-3.2.1-2.el7.x86_64.rpm is not installed
    
    
     Failing package is: couchdb-3.2.1-2.el7.x86_64
     GPG Keys are configured as: https://couchdb.apache.org/repo/keys.asc, https://couchdb.apache.org/repo/rpm-package-key.asc

Full log: https://storage.cloud.google.com/ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/third_party_apps/presubmit_github/centos7/141/20220301-113150/logs/%2Bbuild_and_test.txt

jvm.memory.pool.used emits values in Double then Integer

Description
jvm.memory.pool.used emits Double values first and then Integer values. The metric descriptor is automatically created the first time the API receives the time series; in this case the initial points have value 0.00, which creates the metric descriptor with valueType DOUBLE (see below). The points that follow are integers, which are rejected by the GCM API with the error messages below:

Nov  2 14:36:39 app-admin-01 otelopscol[5791]: 2021-11-02T14:36:39.337Z#011info#011exporterhelper/queued_retry.go:231#011Exporting failed. Will retry the request after interval.#011{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older end time than the most recent point.: timeSeries[0-31]; Value type for metric workload.googleapis.com/jvm.memory.pool.used must be DOUBLE, but is INT64.: timeSeries[32-37]\nerror details: name = Unknown  desc = total_point_count:38 errors:{status:{code:3} point_count:38}", "interval": "22.291635104s"}

Note: I captured all points before they were sent out; see Additional context for the behavior of initial 0.00 points followed by integer values.

Metrics Descriptor - jvm.memory.pool.used

{
  "name": "projects/sophieyfang-test/metricDescriptors/workload.googleapis.com/jvm.memory.pool.used",
  "labels": [
    {
      "key": "name"
    }
  ],
  "metricKind": "GAUGE",
  "valueType": "DOUBLE",
  "type": "workload.googleapis.com/jvm.memory.pool.used",
  "monitoredResourceTypes": [
    "anthos_l4lb",
    "apigee.googleapis.com/Environment",
    "apigee.googleapis.com/Proxy",
    "apigee.googleapis.com/ProxyV2",
    "apigee.googleapis.com/TargetV2",
    "aws_alb_load_balancer",
    "aws_cloudfront_distribution",
    ........
}

Steps to reproduce

Inside the GCE VM

sophieyfang_google_com@ubuntu-1804-jvm:~$ sudo apt update
sophieyfang_google_com@ubuntu-1804-jvm:~$ sudo apt install -y default-jdk
sophieyfang_google_com@ubuntu-1804-jvm:~$ cat <<EOF > hello.java
> class HelloWorld {
>   public static void main(String args[]) throws InterruptedException {
>     while (true) {
>       Thread.sleep(1000);
>     }
>   }
> }
> EOF
sophieyfang_google_com@ubuntu-1804-jvm:~$ javac hello.java
sophieyfang_google_com@ubuntu-1804-jvm-clean:~$ java -ea -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=127.0.0.1 HelloWorld > /dev/null 2>&1 &
sophieyfang_google_com@ubuntu-1804-jvm:~$ curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sophieyfang_google_com@ubuntu-1804-jvm:~$ sudo bash add-google-cloud-ops-agent-repo.sh --also-install  --version=2.7.0
sophieyfang_google_com@ubuntu-1804-jvm:~$ sudo systemctl stop google-cloud-ops-agent.service
sophieyfang_google_com@debian10-meow:~$ /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --add-instance-id=false --config=otel.yaml 

Expectation
It should only generate jvm.memory.pool.used with integer values.

What applicable config did you use?
Config: (e.g. the yaml config file)

sophieyfang@sophieyfang:~$ cat /tmp/otel.yaml 
exporters:
  logging:
    loglevel: debug
processors:
  filter/jvm:
    metrics:
      include:
        match_type: strict
        metric_names:
        - workload.googleapis.com/jvm.memory.pool.used
  metricstransform/jvm__pipeline_jvm__metrics_1:
    transforms:
    - action: update
      include: ^(.*)$$
      match_type: regexp
      new_name: workload.googleapis.com/$${1}
  normalizesums/jvm__pipeline_jvm__metrics_0: {}
receivers:
  jmx/jvm__pipeline_jvm__metrics:
    collection_interval: 30s
    endpoint: service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi
    jar_path: /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/opentelemetry-java-contrib-jmx-metrics.jar
    target_system: jvm
service:
  pipelines:
    metrics/jvm__pipeline_jvm__metrics:
      exporters:
      - logging
      processors:
      - normalizesums/jvm__pipeline_jvm__metrics_0
      - metricstransform/jvm__pipeline_jvm__metrics_1
      - filter/jvm
      receivers:
      - jmx/jvm__pipeline_jvm__metrics

Relevant Environment Information
sophieyfang_google_com@debian10-meow:~$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Additional context

sophieyfang_google_com@debian10-meow:~$ /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --add-instance-id=false --config=otel.yaml 
2021-12-09T22:37:55.137Z	info	service/collector.go:174	Applying configuration...
2021-12-09T22:37:55.138Z	info	builder/exporters_builder.go:259	Exporter was built.	{"kind": "exporter", "name": "logging"}
2021-12-09T22:37:55.138Z	info	[email protected]/filter_processor.go:73	Metric filter configured	{"kind": "processor", "name": "filter/jvm", "include match_type": "strict", "include expressions": [], "include metric names": ["workload.googleapis.com/jvm.memory.pool.used"], "include metrics with resource attributes": null, "exclude match_type": "", "exclude expressions": [], "exclude metric names": [], "exclude metrics with resource attributes": null}
2021-12-09T22:37:55.138Z	info	builder/pipelines_builder.go:220	Pipeline was built.	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-09T22:37:55.139Z	info	builder/receivers_builder.go:228	Receiver was built.	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics", "datatype": "metrics"}
2021-12-09T22:37:55.139Z	info	service/service.go:86	Starting extensions...
2021-12-09T22:37:55.139Z	info	service/service.go:91	Starting exporters...
2021-12-09T22:37:55.139Z	info	builder/exporters_builder.go:40	Exporter is starting...	{"kind": "exporter", "name": "logging"}
2021-12-09T22:37:55.140Z	info	builder/exporters_builder.go:48	Exporter started.	{"kind": "exporter", "name": "logging"}
2021-12-09T22:37:55.140Z	info	service/service.go:96	Starting processors...
2021-12-09T22:37:55.140Z	info	builder/pipelines_builder.go:52	Pipeline is starting...	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-09T22:37:55.140Z	info	builder/pipelines_builder.go:63	Pipeline is started.	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-09T22:37:55.141Z	info	service/service.go:101	Starting receivers...
2021-12-09T22:37:55.141Z	info	builder/receivers_builder.go:68	Receiver is starting...	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics"}
2021-12-09T22:37:55.141Z	info	otlpreceiver/otlp.go:68	Starting GRPC server on endpoint 0.0.0.0:41229	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics"}
2021-12-09T22:37:55.142Z	info	builder/receivers_builder.go:73	Receiver started.	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics"}
2021-12-09T22:37:55.143Z	info	service/telemetry.go:92	Setting up own telemetry...
2021-12-09T22:37:55.144Z	info	service/telemetry.go:116	Serving Prometheus metrics	{"address": ":8888", "level": "basic", "service.instance.id": "", "service.version": "latest"}
2021-12-09T22:37:55.144Z	info	service/collector.go:230	Starting google-cloud-metrics-agent...	{"Version": "latest", "NumCPU": 1}
2021-12-09T22:37:55.146Z	info	service/collector.go:132	Everything is ready. Begin running and processing data.
2021-12-09T22:38:00.200Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-09T22:38:00.200Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.version: STRING(1.7.0)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
InstrumentationLibraryMetrics #0
InstrumentationLibrary  
Metric #0
Descriptor:
     -> Name: workload.googleapis.com/jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:37:59.731 +0000 UTC
Value: 0.000000
2021-12-09T22:38:25.922Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-09T22:38:25.922Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.version: STRING(1.7.0)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
InstrumentationLibraryMetrics #0
InstrumentationLibrary  
Metric #0
Descriptor:
     -> Name: workload.googleapis.com/jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:38:25.913 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:38:25.913 +0000 UTC
Value: 0.000000
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:38:25.913 +0000 UTC
Value: 0.000000
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:38:25.913 +0000 UTC
Value: 0.000000
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:38:25.913 +0000 UTC
Value: 0.000000
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:38:25.913 +0000 UTC
Value: 0.000000
......

2021-12-09T22:44:40.955Z	info	service/collector.go:146	Received signal from OS	{"signal": "interrupt"}
2021-12-09T22:44:40.955Z	info	service/collector.go:242	Starting shutdown...
2021-12-09T22:44:40.955Z	info	service/service.go:121	Stopping receivers...
2021-12-09T22:44:40.978Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-09T22:44:40.978Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.version: STRING(1.7.0)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
InstrumentationLibraryMetrics #0
InstrumentationLibrary  
Metric #0
Descriptor:
     -> Name: workload.googleapis.com/jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 2420528
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 2686720
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 6660568
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 1966080
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 504768
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 597504
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-09 22:44:40.963 +0000 UTC
Value: 761120

AlmaLinux OpsAgent Installation

Trying to install Ops Agent on AlmaLinux/CloudLinux but getting:

Traceback (most recent call last):
  File "mass-provision-google-cloud-ops-agents.py", line 17, in <module>
    import dataclasses
ModuleNotFoundError: No module named 'dataclasses'

I'm aware of the supported OS list, but often what works on Rocky Linux (and other binary-compatible distros) also works on AlmaLinux.

Has anyone succeeded in getting this to work with AlmaLinux/CL?
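A possible workaround sketch, assuming the traceback comes from the system Python being 3.6 (dataclasses joined the standard library in 3.7); installing the PyPI backport before re-running the provisioning script may be enough:

sudo python3 -m pip install dataclasses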

postgresqlreceiver doesn't have all metrics

The ops-agent GitHub doc says it has 7 metrics, while in ops-agent 2.9.0 only 4 are ingested.

The missing metrics are:

postgresql.blocks_read
postgresql.rows
postgresql.operations

I can see those metrics names came from:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.41.0/receiver/postgresqlreceiver/metadata.yaml
Since I don't see us excluding any metrics, it might be an upstream issue.

Also, we may want to test the complete metric names in https://github.com/GoogleCloudPlatform/ops-agent/blob/master/integration_test/third_party_apps_data/applications/postgresql/metric_name.txt @qingling128 @martijnvans

Integration tests require Metrics integration, fail with Logs only

Discovered in #213, the kokoro integration test scripts appear to require metric_name.txt to exist for integration tests for an application to pass.

2021/09/23 10:43:53 Attempt 1 of debian-10 test of apache finished with err=error finding metric name for apache: could not read metric_name.txt: open /tmpfs/src/github/unified_agents/integration_test/third_party_apps_data/applications/apache/metric_name.txt: no such file or directory, retryable=false
    third_party_apps_test.go:380: Non-retryable error: error finding metric name for apache: could not read metric_name.txt: open /tmpfs/src/github/unified_agents/integration_test/third_party_apps_data/applications/apache/metric_name.txt: no such file or directory

See log here: https://console.cloud.google.com/storage/browser/_details/ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/third_party_apps/presubmit_github/245/20210923-102914/logs/build_and_test.log;tab=live_object?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))

bug: hostmetrics receiver generating excessive log spam

Currently, the hostmetrics receiver generates a lot of log spam because of a permissions issue preventing the collector from accessing the path of a process owned by another user. An upstream bug has been reported at:

This has been marked as non-harmful log spam which is technically true, but is making logs very hard to use since these messages constitute a majority of my logs. I was wondering if it would be possible to set a filter for these types of logs at source? I am currently filtering out all logs from the opentelemetry agent to make my logs usable. This isn't ideal but feels like the best option until it can be addressed upstream.

I think it would be great if this could be added to the Fluent Bit config, or if it were possible to specify exclude filters for logs through configuration.

[FILTER]
    name    grep
    match    *
    exclude msg otelopscol

Since there isn't an easy way to specify the above filter as a configuration option, I'm manually adding the following filter to my rsyslog configuration.

/etc/rsyslog.d/99-exclude-otel.conf:

if ($programname contains "otelopscol") then {                   
   action(type="omfile" file="/var/log/dropped-msgs.log")
   stop
}

kafka test broken due to URL 404

I'm seeing this error. I think it happens across all distros:

+ sudo curl https://dlcdn.apache.org/kafka/3.0.0/kafka_2.12-3.0.0.tgz -o /opt/kafka/stage/kafka.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   196  100   196    0     0   2969      0 --:--:-- --:--:-- --:--:--  2969
+ sudo tar -xvzf /opt/kafka/stage/kafka.tgz -C /opt/kafka --strip 1

gzip: stdin: not in gzip format

That URL leads to a 404 for me; apparently curl doesn't care and downloads the HTML of the 404 page...
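A small hardening sketch for the test script using curl's standard --fail flag, which makes curl exit non-zero on an HTTP 404 instead of saving the error page:

sudo curl --fail https://dlcdn.apache.org/kafka/3.0.0/kafka_2.12-3.0.0.tgz -o /opt/kafka/stage/kafka.tgz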

docs: runtime service and account/permission requirements?

What are the base runtime requirements (google config, services) for this agent?

  • Does it require the default GCP service account? If so, are there minimal instructions to set this up for this agent?
  • Does it need other Google agents/services to be running?
  • Does it require preconfigured files on the filesystem?

Should it be possible to run this agent without any "default" large preconfigured VM image?

I found some issues here: #334

Couldn't set log file full path as receiver ID

I was trying to set a log file's full path, "/var/log/abc.log", as the receiver ID in config.yaml, and I got the following error:
[2022/03/03 22:57:39] [ warn] [output:stackdriver:stackdriver.0] error { "error": { "code": 400, "message": "Received unexpected value parsing name \"projects/qd-mar3/logs//var/log/abc.log\": abc.log. Expected the form projects/[PROJECT_ID]/logs/[ID]", "status": "INVALID_ARGUMENT" } }.
The AWS CloudWatch Agent allows users to set the file's full path as the log stream name / log name. I'm wondering if users can set a log file's full path as the log name / receiver ID with the Ops Agent.

Fedora CoreOS installation experiment

I tried to build and install ops-agent on the currently unsupported Fedora CoreOS (https://getfedora.org/en/coreos, https://docs.fedoraproject.org/en-US/fedora-coreos).

  • Are there special installation requirements for the Google APIs to allow running this agent?
  • Does it only require the default GCP service account?
  • Does anyone have an idea how to solve the systemd service errors?

Here are the logs of this experiment:

GCP VM OS

Fedora CoreOS 35.20220103.3.0

ops-agent version

master: f516cab

Fedora 35 build commands:

FROM fedora:35 AS fedora35-build

RUN set -x; yum -y update && \
    dnf -y install 'dnf-command(config-manager)' && \
    yum -y install git systemd \
    autoconf libtool libcurl-devel libtool-ltdl-devel openssl-devel yajl-devel \
    gcc gcc-c++ make cmake bison flex file systemd-devel zlib-devel gtest-devel rpm-build systemd-rpm-macros \
    expect rpm-sign

ADD https://golang.org/dl/go1.17.linux-amd64.tar.gz /tmp/go1.17.linux-amd64.tar.gz
RUN set -xe; \
    tar -xf /tmp/go1.17.linux-amd64.tar.gz -C /usr/local

COPY . /work
WORKDIR /work

RPM spec

https://github.com/GoogleCloudPlatform/ops-agent/blob/master/pkg/rpm/google-cloud-ops-agent.spec

systemd services errors:

google-cloud-ops-agent:

The agent config file is not valid. Detailed error: failed to create directory for "conf/debug/built-in-config.yaml": mkdir conf: operation not permitted

google-cloud-ops-agent-opentelemetry-collector:

otelopscol[964]: application run finished with error: cannot build exporters: error creating googlecloud exporter: cannot configure Google Cloud metric exporter: stackdriver: no project found with application default credentials
otelopscol[964]: Error: cannot build exporters: error creating googlecloud exporter: cannot configure Google Cloud metric exporter: stackdriver: no project found with application default credentials

google-cloud-ops-agent-fluent-bit:

fluent-bit[929]: [error] [lib] backend failed
fluent-bit[929]: * https://fluentbit.io
fluent-bit[929]: * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
fluent-bit[929]: * Copyright (C) 2015-2018 Treasure Data
fluent-bit[929]: * Copyright (C) 2019-2021 The Fluent Bit Authors
fluent-bit[929]: Fluent Bit v1.8.11

GCP VM Terraform resource

resource "google_compute_instance" "main" {
  name = var.instance_name
  zone         = var.zone
  machine_type = var.machine_type

  metadata = {
    user-data = data.ct_config.main-ignition.rendered
    ssh-keys = "[insert yours here]"
    serial-port-enable = "TRUE"
  }

  service_account {
    email  = "[insert yours here]"
    scopes = ["cloud-platform"]
  }

  boot_disk {
    auto_delete = true

    initialize_params {
      image = data.google_compute_image.fedora-coreos.self_link
      size  = var.disk_size
    }
  }

  network_interface {
    network = google_compute_network.network.name
  }

  can_ip_forward = true
  tags           = [var.instance_name, "http-server", "https-server"]

}

Partial ignitionfile

variant: fcos
version: 1.2.0
storage:
  files:
    - path: /etc/google-cloud-ops-agent/config.yaml
      mode: 0600
      contents:
        inline: |
          logging:
            receivers:
              syslog:
                type: systemd_journald
            service:
              pipelines:
                default_pipeline:
                  receivers: [syslog]
          metrics:
            receivers:
              hostmetrics:
                type: hostmetrics
                collection_interval: 60s
            processors:
              metrics_filter:
                type: exclude_metrics
                metrics_pattern: []
            service:
              pipelines:
                default_pipeline:
                  receivers: [hostmetrics]
                  processors: [metrics_filter]

Debian 11 (Bullseye) Support

On installation using the guide it fails to find an apt release for Debian Bullseye as Debian 11 is yet to be an officially supported OS.

Is there a specific blocker or workaround for this? We're deploying new machines, and Compute offers Debian 11 without providing a custom image now that it's stable.

Should we revert back to 10 for the time being?

Err:9 https://packages.cloud.google.com/apt google-cloud-ops-agent-bullseye-all Release
  404  Not Found [IP: 142.250.187.238 443]
Reading package lists... Done
E: The repository 'https://packages.cloud.google.com/apt google-cloud-ops-agent-bullseye-all Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
[2021-09-11T23:31:45+0000] Could not refresh the google-cloud-ops-agent apt repositories.
Please check your network connectivity and make sure you are running a supported
debian distribution. See https://cloud.google.com/stackdriver/docs/solutions/ops-agent/#supported_operating_systems
for a list of supported platforms.

apache fail on debian11

apache

The userAgent changed from curl/7.64.0 to curl/7.74.0, so we should not match it exactly. We should change this line to match a regex instead.

{
  "insertId": "1fvx3cxg1rlw5sv",
  "jsonPayload": {},
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "/forbidden.html",
    "status": 403,
    "responseSize": "435",
    "userAgent": "curl/7.74.0",
    "remoteIp": "::1",
    "protocol": "HTTP/1.1"
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "project_id": "stackdriver-test-143416",
      "zone": "us-central1-b",
      "instance_id": "257764733175503475"
    }
  },
  "timestamp": "2022-03-03T20:29:18Z",
  "logName": "projects/stackdriver-test-143416/logs/apache_access",
  "receiveTimestamp": "2022-03-03T20:29:19.315357676Z"
}

Logging doesn’t support renaming of values

When parsing input data like {"severity": "warn"}, the value can't be renamed to "WARNING" for GCP Logging to properly handle it. As in #245, I propose implementing Fluent Bit's Modify filter and adding a filters element to the configuration model; with that it's as easy as:

condition key_value_equals severity warn
set severity WARNING

logName in Stackdriver to contain file name of log

We're trying to configure the Ops Agent to include the file name of the log in the logName field in Stackdriver Logging.

For example, we want to read any logs in /var/log/webapps/*/*.log, where the first wildcard is the domain and the second wildcard denotes the type of log entry (it's a legacy system, so we can't really swap it for JSON).

This worked great with the old logging agent, as it would ingest and include the file name in the logName field:


We can't figure out a way to replicate this in the new Ops Agent, although StackOverflow points to a Path_Key in the native Fluent-bit config.

We'll update here if we figure it out, but in the meantime, does anyone know if this is actually supported?
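For reference, the native Fluent Bit option mentioned on StackOverflow looks roughly like this (the record key name is illustrative); note that it adds the file path as a field on the record rather than changing logName, and the Ops Agent config does not currently expose it:

[INPUT]
    Name      tail
    Path      /var/log/webapps/*/*.log
    Path_Key  filePath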

Logging doesn’t support renaming of keys

When parsing input data like

{
  "log": "some message",
  "time": "some timestamp"
}

with the parse_json processor, the configuration model doesn't allow renaming a key like "log" to "message" so that GCP Logging understands it; instead, the resulting Logs Explorer entry is the literal {"log":"some message"}.
One could simply implement a message_key option, as exists for the timestamp, but that's rather Byzantine and useless for other or custom keys. Instead I propose implementing Fluent Bit's Modify filter and adding a filters element to the configuration model, as in #246.
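For illustration, the proposed Fluent Bit Modify filter would express the rename roughly like this (the match pattern is illustrative):

[FILTER]
    Name    modify
    Match   *
    Rename  log message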

2.11.0 ubuntu - error writing data from tail - log filled up the disk

Ref fluent/fluent-bit#4766

The same happened on Ubuntu 20.04.4, with 2 VMs.

Example from one of them:

google-cloud-ops-agent version 2.11.0~ubuntu20.04

A huge log file was created:

ls -alh /var/log/google-cloud-ops-agent/subagents/logging-module.log
-rw-r--r-- 1 root root 117G Mar  7 14:52 /var/log/google-cloud-ops-agent/subagents/logging-module.log


Inside the log file:

[2022/03/05 11:48:03] [error] [storage] format check failed: tail.2/870-1646420538.862293556.flb
[2022/03/05 11:48:03] [error] [storage] format check failed: tail.2/870-1646420538.862293556.flb
[2022/03/05 11:48:03] [error] [storage] [cio file] file is not mmap()ed: tail.2:870-1646420538.862293556.flb
[2022/03/05 11:48:03] [error] [input chunk] error writing data from tail.2 instance
[2022/03/05 11:48:03] [ info] [task] re-schedule retry=0x7fea501439e0 77 in the next 7 seconds
[2022/03/05 11:48:03] [ info] [task] re-schedule retry=0x7fea500b2160 51 in the next 11 seconds
[2022/03/05 11:48:04] [ info] [task] re-schedule retry=0x7fea501538d0 79 in the next 7 seconds
[2022/03/05 11:48:04] [ info] [task] re-schedule retry=0x7fea50150900 84 in the next 21 seconds
[2022/03/05 11:48:04] [ info] [task] re-schedule retry=0x7fea50146900 33 in the next 15 seconds
[2022/03/05 11:48:04] [error] [storage] format check failed: tail.2/870-1646420538.862293556.flb
[2022/03/05 11:48:04] [error] [storage] format check failed: tail.2/870-1646420538.862293556.flb
[2022/03/05 11:48:04] [error] [storage] [cio file] file is not mmap()ed: tail.2:870-1646420538.862293556.flb
[2022/03/05 11:48:04] [error] [input chunk] error writing data from tail.2 instance
[2022/03/05 11:48:04] [ info] [task] re-schedule retry=0x7fea5013f9c0 76 in the next 19 seconds
[2022/03/05 11:48:04] [error] [storage] format check failed: tail.2/870-1646420538.862293556.flb
[2022/03/05 11:48:04] [error] [storage] format check failed: tail.2/870-1646420538.862293556.flb
[2022/03/05 11:48:04] [error] [storage] [cio file] file is not mmap()ed: tail.2:870-1646420538.862293556.flb
[2022/03/05 11:48:04] [error] [input chunk] error writing data from tail.2 instance
[2022/03/05 11:48:05] [error] [storage] format check failed: tail.1/870-1646420636.219474538.flb
[2022/03/05 11:48:05] [error] [storage] format check failed: tail.1/870-1646420636.219474538.flb
[2022/03/05 11:48:05] [error] [storage] [cio file] file is not mmap()ed: tail.1:870-1646420636.219474538.flb
[2022/03/05 11:48:05] [error] [input chunk] error writing data from tail.1 instance

Same scenario with the second VM; the disk filled up with these logs in 7 hours:
[screenshot: disk usage on the second VM]

Any suggestions?
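
Not a fix for the root cause, but a possible cleanup once the disk is full — a sketch only; the buffer directory is the assumed default for the Ops Agent, so verify the paths on your system before deleting anything:

sudo systemctl stop google-cloud-ops-agent
# remove the corrupted buffer chunks named in the error messages (path assumed)
sudo rm -f /var/lib/google-cloud-ops-agent/fluent-bit/buffers/tail.2/870-1646420538.862293556.flb
# reclaim the space taken by the module log
sudo truncate -s 0 /var/log/google-cloud-ops-agent/subagents/logging-module.log
sudo systemctl start google-cloud-ops-agent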

JVM metrics do not respect the scraping interval

Description
JVM metrics are being sent out every 5 seconds, resulting in requests being denied by the Cloud Monitoring API:

2021-12-17T01:44:55.104Z	info	exporterhelper/queued_retry.go:215	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: timeSeries[0-7]\nerror details: name = Unknown  desc = total_point_count:8  errors:{status:{code:9}  point_count:8}", "interval": "5.52330144s"}
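
For context, the 30s interval in the generated otel.yaml below comes from the user-facing Ops Agent configuration; a minimal sketch of such a config (the JMX endpoint and pipeline name are assumptions taken from the generated config) would be:

metrics:
  receivers:
    jvm:
      type: jvm
      endpoint: 127.0.0.1:9010
      collection_interval: 30s
  service:
    pipelines:
      jvm_pipeline:
        receivers: [jvm]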

To Reproduce

sophieyfang_google_com@debian10-meow:~$ cat otel.yaml 
exporters:
  googlecloud:
    metric:
      prefix: ""
    user_agent: Google-Cloud-Ops-Agent-Metrics/latest (BuildDistro=build_distro;Platform=linux;ShortName=linux_platform;ShortVersion=linux_platform_version)
  logging:
    loglevel: debug
processors:
  filter/jvm:
    metrics:
      include:
        match_type: strict
        metric_names:
        - jvm.memory.pool.used
  resourcedetection/_global_0:
    detectors:
    - gce
  metricstransform/jvm__pipeline_jvm__metrics_1:
    transforms:
    - action: update
      include: ^(.*)$$
      match_type: regexp
      new_name: workload.googleapis.com/$${1}
  normalizesums/jvm__pipeline_jvm__metrics_0: {}
receivers:
  jmx/jvm__pipeline_jvm__metrics:
    collection_interval: 30s
    endpoint: service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi
    jar_path: /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/opentelemetry-java-contrib-jmx-metrics.jar
    target_system: jvm
service:
  pipelines:
    metrics/jvm__pipeline_jvm__metrics:
      exporters:
      - googlecloud
      - logging
      processors:
      - filter/jvm
      receivers:
      - jmx/jvm__pipeline_jvm__metrics
sophieyfang_google_com@debian10-meow:~$ /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --add-instance-id=false --config=otel.yaml
2021-12-17T01:43:47.870Z	info	service/collector.go:174	Applying configuration...
2021-12-17T01:43:47.878Z	info	builder/exporters_builder.go:259	Exporter was built.	{"kind": "exporter", "name": "googlecloud"}
2021-12-17T01:43:47.878Z	info	builder/exporters_builder.go:259	Exporter was built.	{"kind": "exporter", "name": "logging"}
2021-12-17T01:43:47.879Z	info	[email protected]/filter_processor.go:73	Metric filter configured	{"kind": "processor", "name": "filter/jvm", "include match_type": "strict", "include expressions": [], "include metric names": ["jvm.memory.pool.used"], "include metrics with resource attributes": null, "exclude match_type": "", "exclude expressions": [], "exclude metric names": [], "exclude metrics with resource attributes": null}
2021-12-17T01:43:47.879Z	info	builder/pipelines_builder.go:220	Pipeline was built.	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-17T01:43:47.879Z	info	builder/receivers_builder.go:228	Receiver was built.	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics", "datatype": "metrics"}
2021-12-17T01:43:47.880Z	info	service/service.go:86	Starting extensions...
2021-12-17T01:43:47.880Z	info	service/service.go:91	Starting exporters...
2021-12-17T01:43:47.880Z	info	builder/exporters_builder.go:40	Exporter is starting...	{"kind": "exporter", "name": "googlecloud"}
2021-12-17T01:43:47.882Z	info	builder/exporters_builder.go:48	Exporter started.	{"kind": "exporter", "name": "googlecloud"}
2021-12-17T01:43:47.882Z	info	builder/exporters_builder.go:40	Exporter is starting...	{"kind": "exporter", "name": "logging"}
2021-12-17T01:43:47.882Z	info	builder/exporters_builder.go:48	Exporter started.	{"kind": "exporter", "name": "logging"}
2021-12-17T01:43:47.882Z	info	service/service.go:96	Starting processors...
2021-12-17T01:43:47.882Z	info	builder/pipelines_builder.go:52	Pipeline is starting...	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-17T01:43:47.883Z	info	builder/pipelines_builder.go:63	Pipeline is started.	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-17T01:43:47.883Z	info	service/service.go:101	Starting receivers...
2021-12-17T01:43:47.883Z	info	builder/receivers_builder.go:68	Receiver is starting...	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics"}
2021-12-17T01:43:47.884Z	info	otlpreceiver/otlp.go:68	Starting GRPC server on endpoint 0.0.0.0:40973	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics"}
2021-12-17T01:43:47.885Z	info	builder/receivers_builder.go:73	Receiver started.	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics"}
2021-12-17T01:43:47.885Z	info	service/telemetry.go:92	Setting up own telemetry...
2021-12-17T01:43:47.886Z	info	service/telemetry.go:116	Serving Prometheus metrics	{"address": ":8888", "level": "basic", "service.instance.id": "", "service.version": "latest"}
2021-12-17T01:43:47.886Z	info	service/collector.go:230	Starting google-cloud-metrics-agent...	{"Version": "latest", "NumCPU": 1}
2021-12-17T01:43:47.886Z	info	service/collector.go:132	Everything is ready. Begin running and processing data.
2021-12-17T01:43:52.897Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-17T01:43:52.900Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary io.opentelemetry.contrib.jmxmetrics 1.6.0
Metric #0
Descriptor:
     -> Name: jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 16259944
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 4356992
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 7129432
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 20392
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 2503816
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 1880320
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:43:52.319 +0000 UTC
Value: 770600

2021-12-17T01:44:18.838Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-17T01:44:18.838Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary io.opentelemetry.contrib.jmxmetrics 1.6.0
Metric #0
Descriptor:
     -> Name: jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 16259944
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 4356992
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 7129432
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 20392
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 2503816
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 1880320
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:18.829 +0000 UTC
Value: 770600

2021-12-17T01:44:23.043Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-17T01:44:23.043Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary io.opentelemetry.contrib.jmxmetrics 1.6.0
Metric #0
Descriptor:
     -> Name: jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 497824
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 4361856
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 7129432
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 47816
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 2503816
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 1880320
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:23.033 +0000 UTC
Value: 770600

2021-12-17T01:44:48.835Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-17T01:44:48.835Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary io.opentelemetry.contrib.jmxmetrics 1.6.0
Metric #0
Descriptor:
     -> Name: jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 497824
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 4361856
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 7129432
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 47816
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 2503816
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 1880320
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:48.828 +0000 UTC
Value: 770600

2021-12-17T01:44:53.144Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-17T01:44:53.144Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary io.opentelemetry.contrib.jmxmetrics 1.6.0
Metric #0
Descriptor:
     -> Name: jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 954968
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 4361856
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 7129432
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 47816
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 2503816
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 1880320
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:53.136 +0000 UTC
Value: 770600

^C2021-12-17T01:44:55.042Z	info	service/collector.go:146	Received signal from OS	{"signal": "interrupt"}
2021-12-17T01:44:55.042Z	info	service/collector.go:242	Starting shutdown...
2021-12-17T01:44:55.042Z	info	service/service.go:121	Stopping receivers...
2021-12-17T01:44:55.069Z	INFO	loggingexporter/logging_exporter.go:56	MetricsExporter	{"#metrics": 1}
2021-12-17T01:44:55.070Z	DEBUG	loggingexporter/logging_exporter.go:66	ResourceMetrics #0
Resource labels:
     -> service.name: STRING(unknown_service:java)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.0)
InstrumentationLibraryMetrics #0
InstrumentationLibrary io.opentelemetry.contrib.jmxmetrics 1.6.0
Metric #0
Descriptor:
     -> Name: jvm.memory.pool.used
     -> Description: current memory pool usage
     -> Unit: by
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> name: STRING(Eden Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 954968
NumberDataPoints #1
Data point attributes:
     -> name: STRING(CodeHeap 'profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 4361856
NumberDataPoints #2
Data point attributes:
     -> name: STRING(Metaspace)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 7129432
NumberDataPoints #3
Data point attributes:
     -> name: STRING(Survivor Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 47816
NumberDataPoints #4
Data point attributes:
     -> name: STRING(Tenured Gen)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 2503816
NumberDataPoints #5
Data point attributes:
     -> name: STRING(CodeHeap 'non-nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 1212928
NumberDataPoints #6
Data point attributes:
     -> name: STRING(CodeHeap 'non-profiled nmethods')
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 1880320
NumberDataPoints #7
Data point attributes:
     -> name: STRING(Compressed Class Space)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2021-12-17 01:44:55.053 +0000 UTC
Value: 770600

2021-12-17T01:44:55.089Z	error	subprocess/subprocess.go:249	subprocess died	{"kind": "receiver", "name": "jmx/jvm__pipeline_jvm__metrics", "error": "unexpected shutdown: exit status 130"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jmxreceiver/internal/subprocess.(*Subprocess).run
	/root/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/internal/subprocess/subprocess.go:249
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jmxreceiver/internal/subprocess.(*Subprocess).Start.func1
	/root/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/internal/subprocess/subprocess.go:127
2021-12-17T01:44:55.089Z	info	service/service.go:126	Stopping processors...
2021-12-17T01:44:55.089Z	info	builder/pipelines_builder.go:71	Pipeline is shutting down...	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-17T01:44:55.089Z	info	builder/pipelines_builder.go:75	Pipeline is shutdown.	{"pipeline_name": "metrics/jvm__pipeline_jvm__metrics", "pipeline_datatype": "metrics"}
2021-12-17T01:44:55.089Z	info	service/service.go:131	Stopping exporters...
2021-12-17T01:44:55.104Z	info	exporterhelper/queued_retry.go:215	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: timeSeries[0-7]\nerror details: name = Unknown  desc = total_point_count:8  errors:{status:{code:9}  point_count:8}", "interval": "5.52330144s"}
2021-12-17T01:44:55.105Z	info	service/service.go:136	Stopping extensions...
2021-12-17T01:44:55.106Z	info	service/collector.go:258


missing metrics: api_request_count

We lost the GA metric api_request_count in ops-agent versions > 2.4.0, and some customers are asking for this metric back. Details about the metric: https://cloud.google.com/monitoring/api/metrics_opsagent#agent-agent

This PR open-telemetry/opentelemetry-collector@fb95c88 seems to be the one that broke our agent's api_request_count: ocgrpc is replaced with otelgrpc, which doesn't support these metrics as far as I can tell from this doc: https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/instrumentation#instrumentation-packages

Unable to split syslog into different log names

Hi,
I'm trying to replace the google-fluentd agent with this new one for logging from a custom RHEL VM into Cloud Logging.

Logs are sent with rsyslog to a local socket that is managed by the ops-agent.

cat /etc/google-cloud-ops-agent/config.yaml
logging:
  receivers:
    syslog:
      type: syslog
      transport_protocol: udp
      listen_host: 127.0.0.1
      listen_port: 1514
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]

The old agent configuration allowed us to split the logs out of the box into different target log names based on facility and severity:

{
insertId: "irv8zjee248rhz8o4"
labels: {
compute.googleapis.com/resource_name: "XXXXXXXXXXXXXXXXX"
}
logName: "projects/pXXXXXXXXXXXXXXXXX/logs/syslog.user.notice"
receiveTimestamp: "2022-01-26T15:11:41.338528447Z"
resource: {
labels: {
instance_id: "1063345352058620069"
project_id: "XXXXXXXXXXXXXXXXX"
zone: "europe-west3-b"
}
type: "gce_instance"
}
textPayload: "Jan 26 16:11:37 XXXXXXXXXXXXXXXXX subscription-manager[2592720]: Added subscription for 'Content Access' contract 'None'"
timestamp: "2022-01-26T15:11:37.662227345Z"
}

The new agent is able to send all the data, but messages lose the facility information and use a default log name (syslog) in Cloud Logging:

{
insertId: "e5fwsug1qxxo7v"
jsonPayload: {
message: "<14>Jan 26 14:49:32 YYYYYYYYYYYYYYY root[285447]: testmessagetakke"
}
logName: "projects/YYYYYYYYYYYYYYY /logs/syslog"
receiveTimestamp: "2022-01-26T13:49:33.487331005Z"
resource: {
labels: {
instance_id: "1412765332707494911"
project_id: "YYYYYYYYYYYYYYY "
zone: "europe-west3-a"
}
type: "gce_instance"
}
timestamp: "2022-01-26T13:49:32.854706758Z"
}

Is there a way to split them?
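
One possible workaround until facility-based splitting is supported — a sketch only; the ports, receiver names, and facility list are assumptions — is to define one syslog receiver per facility, since the receiver ID is used as the log name, and have rsyslog route each facility to the matching port (for example user.* @127.0.0.1:1514 and daemon.* @127.0.0.1:1515 in a file under /etc/rsyslog.d/):

logging:
  receivers:
    syslog_user:
      type: syslog
      transport_protocol: udp
      listen_host: 127.0.0.1
      listen_port: 1514
    syslog_daemon:
      type: syslog
      transport_protocol: udp
      listen_host: 127.0.0.1
      listen_port: 1515
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog_user, syslog_daemon]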

Improve test coverage for optional metrics

There is an opportunity to improve the test coverage for optional metrics introduced in #456. The current behaviour is that metrics with optional: true are completely skipped during the test. It would be better if we could take a best-effort approach to validate any optional metrics that happen to show up during the test.
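
For context, an optional metric is flagged in the integration's expected-metrics metadata roughly like this (a sketch; the metric name is illustrative and the surrounding fields are abbreviated, so exact field names may differ):

- type: workload.googleapis.com/example.metric
  optional: true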
