aws-observability / aws-otel-java-instrumentation

AWS Distro for OpenTelemetry Java Instrumentation Library

Home Page: https://aws-otel.github.io/

License: Apache License 2.0

Languages: Java 97.62%, Rust 1.63%, Dockerfile 0.51%, Kotlin 0.24%
Topics: observability, opensource, opentelemetry, opentelemetry-api

aws-otel-java-instrumentation's Introduction

AWS Distro for OpenTelemetry - Instrumentation for Java

Introduction

This project is a redistribution of the OpenTelemetry Agent for Java, preconfigured for use with AWS services. Please check out that project as well to better understand the underlying internals. You won't see much code in this repository, since we only apply a few small configuration changes; our OpenTelemetry friends take care of the rest.

We provide a Java agent JAR that can be attached to any Java 8+ application and dynamically injects bytecode to capture telemetry from a number of popular libraries and frameworks. The telemetry data can be exported in a variety of formats, and the agent and exporter can be configured via command-line arguments or environment variables. The net result is the ability to gather telemetry data from a Java application without any code changes.

Getting Started

Check out the getting started documentation.

Supported Java libraries and frameworks

For the complete list of supported frameworks, please refer to the OpenTelemetry for Java documentation.

How it works

The OpenTelemetry Java SDK exposes configuration hooks through the Java Service Provider Interface (SPI). This includes the ability to replace the IdsGenerator, which we need in order to produce X-Ray-compatible trace IDs. Because the SDK uses SPI, it is sufficient for the custom implementation to be on the classpath to be recognized. The AWS distribution of the OpenTelemetry Java Agent repackages the upstream agent, simply adding our SPI implementation that reconfigures the ID generator. In addition, it includes AWS resource providers by default, and it sets a system property that configures the agent to use multiple trace ID propagators, defaulting to maximum interoperability.

Other than that, the distribution is identical to the upstream agent and all configuration can be used as is.
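The reason the default ID generator must be replaced can be shown with a minimal sketch. This is not the distribution's actual SPI class (the real implementation is registered through the SDK's service loader); it only illustrates the constraint: X-Ray requires the first 32 bits of the 128-bit trace ID to encode the epoch-seconds timestamp, whereas the default generator is fully random.

```java
import java.security.SecureRandom;

// Illustrative sketch of an X-Ray-compatible trace ID: 8 hex chars of
// epoch-seconds timestamp followed by 24 hex chars of randomness.
public class XrayCompatibleTraceIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    public static String generateTraceId(long epochSeconds) {
        // Mask to 32 bits so %08x emits exactly 8 hex characters.
        long hiRandom = RANDOM.nextLong() & 0xFFFFFFFFL;
        long low = RANDOM.nextLong();
        // 8 + 8 + 16 = 32 hex characters, i.e. a 128-bit OTel trace ID.
        return String.format("%08x%08x%016x", epochSeconds, hiRandom, low);
    }
}
```

A trace ID generated this way can be split back into the `1-<timestamp>-<random>` form that X-Ray expects on its wire format.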

Standardized Sample Applications

In addition to the sample apps in this repository, there is also a set of standardized sample applications that can be used. You can find the standardized Java sample app here.

Support

Please note that, as per policy, we provide support via GitHub on a best-effort basis. However, if you have AWS Enterprise Support, you can create a ticket and we will provide direct support within the respective SLAs.

Security issue notifications

If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our vulnerability reporting page. Please do not create a public GitHub issue.

aws-otel-java-instrumentation's People

Contributors

anuraaga, asakermohd, asmadsen, atshaw43, awssandra, bhautikpip, bjrara, bluelort, bryan-aguilar, danielzolty, dependabot[bot], harrryr, humivo, jerry-shao, jinwoov, majanjua-amzn, mxiamxia, nathanielrn, paurushgarg, rapphil, ruthvik17, srprash, thpierce, upsidedownsmile, vasireddy99, wangzlei, willarmiros, wytrivail, xinranzhaws


aws-otel-java-instrumentation's Issues

Performance Threshold breached AFTER Soak Tests completed for the (springboot, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit d1b010b of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

Performance Threshold breached AFTER Soak Tests completed for the (spark-awssdkv1, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit 1e5ce35 of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

Support for SQS

Is your feature request related to a problem? Please describe.
So actually I have a question: is SQS fully supported by the agent?
I currently have a queue between my services, and with the Java agent the service map shows only one SQS queue, with no in- or outflow connecting it to my actual services. In the Pet Clinic sample app and elsewhere, there is a nice representation that shows the queue exactly between the two services that push to or receive messages from it.

Describe the solution you'd like
Full support for SQS, with the same visualization and representation of the queue between services.

Thanks in advance

Performance Threshold breached DURING Soak Tests execution for the (springboot, auto) Sample App

Description

During Soak Tests execution, a performance degradation was revealed for commit 5d89f4c of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image


More "intelligent" sampler option(s)

Is your feature request related to a problem? Please describe.
We've just migrated from AppD (javaagent) to X-Ray using the OTEL agent, v1.1. We're currently using traceidratio sampling with a 1/200 ratio.

This causes high frequency service calls to be over-represented in the sample set and low frequency services to be under-represented, or simply not present at all.

Describe the solution you'd like
I'd like a more sophisticated sampler that will collect data from all spans every time period, regardless of traffic.

See jaegertracing/jaeger#365 for problem description and possible inspiration.

This seems like a common and basic need for instrumentation. Is anything planned upstream?

I should also add that we're running a monolithic application, so we can't configure each service with a different sample rate, as they all run in the same JVM / javaagent. But even in a microservice architecture, this problem would exist.
We do use Spring AOP to create custom spans for the Spring @Service business services.

I'd rather not have to roll my own OTEL java agent to address this issue... Are there any ways to address this from user-space?
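As a rough illustration of what such a sampler could do, here is a hypothetical token-bucket sketch (not part of the agent, and not the X-Ray remote sampler, which provides a production-grade version of this idea): it guarantees up to a fixed number of sampled traces per second, regardless of traffic volume, so low-frequency services are not starved by a fixed ratio.

```java
// Hypothetical rate-limited sampler sketch: admits at most
// `reservoirPerSecond` traces in any given wall-clock second.
public class RateLimitedSampler {
    private final int reservoirPerSecond;
    private long currentSecond = -1;
    private int usedThisSecond = 0;

    public RateLimitedSampler(int reservoirPerSecond) {
        this.reservoirPerSecond = reservoirPerSecond;
    }

    // The timestamp is passed in (rather than read from the clock)
    // to keep the sketch deterministic and testable.
    public synchronized boolean shouldSample(long epochMillis) {
        long second = epochMillis / 1000;
        if (second != currentSecond) {
            // New second: refill the reservoir.
            currentSecond = second;
            usedThisSecond = 0;
        }
        if (usedThisSecond < reservoirPerSecond) {
            usedThisSecond++;
            return true;
        }
        return false;
    }
}
```

In practice one instance of such a sampler would be kept per operation or per service name, so every endpoint gets at least a few traces per period.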

Issue with serviceLens map and the services based on EKS [SCALA]

Describe the bug
We are using EKS with several namespaces. The problem is that the services are not being recognized correctly, and I think the proper attributes are not being added to the traces.
I'm using the ADOT Java agent (1.1.0) + ADOT agent as a DaemonSet (v0.11.0) + ADOT collector (v0.11.0).

Firstly, while all of the services have different names and a service.name attribute, they are not represented; only the connections from those services are shown, for example the connection from the service to S3 or DocumentDB, like below:
client ---> S3
while the expected behaviour is: client ---> my service (pod) name ---> S3

Secondly, I expected to get some extra information and attributes from the ADOT Java agent or ADOT collector, but this does not seem to work. For instance, since the pods were being treated as EC2 instances, I tried adding k8s.cluster.name=the-cluster-name manually to the Java agent. This led to some changes in the representation, as follows:

without setting manually the k8s cluster attribute:
EKS_without_tag

After adding the attribute: (it seems only the representation icon changed, and it now appears twice, since I added the attribute to just one of the namespaces this image runs in; the other namespaces still lack the attribute and are represented as a single EC2 instance. The same image runs in different namespaces.)
EKS_with_tag
cluster-tag-in-dashboard

I thought maybe it was because I'm running the collector as a DaemonSet rather than a sidecar, so I tested it with a sidecar, but that didn't improve the situation. I would be glad to hear your ideas about these issues while I keep trying to solve them.

I created this issue so it also serves as documentation; maybe I'm missing some parts, or there are some compulsory attributes that must be set manually when using multiple namespaces on EKS. In any case, I will keep my open issues as up to date as possible with my latest findings.

Thank you in advance for your help.

Performance Threshold breached AFTER Soak Tests completed for the (spark-awssdkv1, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit 1e5ce35 of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

Very long "cold run" with AWS OT Java Agent

Describe the bug
It takes a very long time to run an AWS Lambda Java function instrumented with the AWS OTEL Java Agent. For a very simple function that only makes HTTP GET requests, a cold run takes around 40-45 seconds, which is causing some Lambda HTTP response timeouts; afterwards it's smooth.

Steps to reproduce
aws-opentelemetry-agent.jar set as -javaagent
lambda function code

package example;

import static java.util.stream.Collectors.toMap;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class HelloLambdaHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request, Context context) {
        APIGatewayProxyResponseEvent response = new APIGatewayProxyResponseEvent();
        response.setStatusCode(200);
        try {
            response.setHeaders(request.getHeaders().entrySet()
                                       .stream()
                                       .map(e -> Map.entry("received-" + e.getKey(), e.getValue()))
                                       .collect(toMap(Map.Entry::getKey, Map.Entry::getValue)));
        } catch (Exception e) {}

        HttpClient httpclient = HttpClients.createDefault();
        for(int i = 0; i < 3; i++) {
            try {
                HttpResponse httpResponse = httpclient.execute(new HttpGet("http://httpbin.org/"));
                httpResponse.getEntity().getContent().readAllBytes();
            } catch (Exception e) {
            }
        }

        Throwable t = new Throwable();
        StringWriter writer = new StringWriter();
        t.printStackTrace(new PrintWriter(writer));
        response.setBody("I'm lambda!\n" + writer.toString());

        return response;
    }
}

What did you expect to see?
I expected the Lambda to start much faster. With the upstream OpenTelemetry Java Agent it is 9 seconds.

What did you see instead?
(screenshot attached in the original issue)

Additional context

Create release-notes folder

In order to better structure our release history, we should create a running release-notes folder. This would allow us to:

  • Better structure our release history by creating a release-specific changelog file.

  • Focus more easily on release changes, since we can reference specific time-stamped release-notes files. A less "monolithic" document also creates more logical space for specific examples and comments in a reader-friendly structure.

  • Maintain the history more easily, with new files per commit.

  • Use the folder in conjunction with a changelog generator tool to compile a skeleton that includes all PRs and commits.

AWS Open Source recommends tools such as the github-changelog-generator, but we can discuss options :)

Performance Threshold breached DURING Soak Tests execution for the (spark-awssdkv1, auto) Sample App

Description

During Soak Tests execution, a performance degradation was revealed for commit 5d89f4c of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

Performance Threshold breached AFTER Soak Tests completed for the (spark, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit df05909 of the refs/heads/main branch for the (spark, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

Performance Threshold breached DURING Soak Tests execution for the (springboot, auto) Sample App

Description

During Soak Tests execution, a performance degradation was revealed for commit e7dc6a7 of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

Performance Threshold breached DURING Soak Tests execution for the (springboot, auto) Sample App

Description

During Soak Tests execution, a performance degradation was revealed for commit a49aa48 of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

Version 1.5.0 throws NoSuchMethodError

Describe the bug
We updated from 1.4.1 of the AWS-flavoured agent to 1.5.0 and started to see exceptions in the log like this:

Caused by: java.lang.NoSuchMethodError: 'io.opentelemetry.sdk.resources.Resource io.opentelemetry.sdk.autoconfigure.OpenTelemetrySdkAutoConfiguration.getResource()'
    at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
    at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.premain(AwsAgentBootstrap.java:24)
    at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.agentmain(AwsAgentBootstrap.java:28)
    at io.opentelemetry.javaagent.OpenTelemetryAgent.agentmain(OpenTelemetryAgent.java:51)
    at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:40)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Additional context
We also updated the opentelemetry BOM version to 1.5.0 as suggested, but we're only using the AWS extensions for injecting the trace ID into our HTTP responses.

Any ideas what's going on?
Our application works great with the AWS agent 1.4.1 and OpenTelemetry 1.4.0.

AWS OTEL java agent 1.7 giving "java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0"

Hello, is anyone else seeing this error with the AWS OTEL Java agent 1.7 (aws-opentelemetry-agent.jar)? It does not appear with Java agent 1.6.

Stacktrace below:
java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:372)
at java.base/java.util.ArrayList.set(ArrayList.java:473)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpClientRequest.headersToList(ApacheHttpClientRequest.java:51)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpClientRequest.getHeader(ApacheHttpClientRequest.java:41)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpAsyncClientHttpAttributesExtractor.requestHeader(ApacheHttpAsyncClientHttpAttributesExtractor.java:31)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpAsyncClientHttpAttributesExtractor.requestHeader(ApacheHttpAsyncClientHttpAttributesExtractor.java:16)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.http.HttpCommonAttributesExtractor.userAgent(HttpCommonAttributesExtractor.java:92)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.http.HttpCommonAttributesExtractor.onStart(HttpCommonAttributesExtractor.java:36)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.http.HttpClientAttributesExtractor.onStart(HttpClientAttributesExtractor.java:44)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.Instrumenter.start(Instrumenter.java:147)
at io.opentelemetry.javaagent.shaded.instrumentation.api.instrumenter.ClientInstrumenter.start(ClientInstrumenter.java:26)
at io.opentelemetry.javaagent.instrumentation.apachehttpasyncclient.ApacheHttpAsyncClientInstrumentation$DelegatingRequestProducer.generateRequest(ApacheHttpAsyncClientInstrumentation.java:109)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.start(DefaultClientExchangeHandlerImpl.java:123)
at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:141)
at com.livevox.proxy.core.GenericRequestProcessor.forward(GenericRequestProcessor.java:145)
at com.livevox.proxy.core.GenericRequestProcessor.run(GenericRequestProcessor.java:90)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

Upgrade to version 0.10.1

Are there plans to upgrade to open-telemetry agent v0.10.1 anytime soon? I need it to fix open-telemetry/opentelemetry-java#2052. Thank you!

AWS-XRAY-TRACE-ID missing in augmented MDC

My issue looks exactly like #66, but we're using version 1.5.1/1.6.0.

{
  "@timestamp": "2021-09-23T12:03:26.518+02:00",
  "@version": "1",
  "message": "my message",
  "logger_name": "myloggerr",
  "thread_name": "grpc-default-executor-11",
  "level": "TRACE",
  "level_value": 5000,
  "trace_id": "614c50d5522ead27e473e03f06c469c5",
  "trace_flags": "01",
  "span_id": "ab12a7b5a9a0001c"
}
 

If I use version 1.4.1, it's there:

{
  "@timestamp": "2021-09-23T12:11:15.644+02:00",
  "@version": "1",
  "message": "mymessage",
  "logger_name": "mylogger",
  "thread_name": "grpc-default-executor-7",
  "level": "TRACE",
  "level_value": 5000,
  "trace_id": "614c52c132ce415b4ab8604020840209",
  "trace_flags": "01",
  "span_id": "c5e5f7bf633a1f60",
  "AWS-XRAY-TRACE-ID": "1-614c52c1-32ce415b4ab8604020840209@c5e5f7bf633a1f60"
}

This seems to be a regression.
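The missing MDC value concatenates the X-Ray formatted trace id and the span id. Its relationship to the raw OpenTelemetry ids already present in the log output can be sketched as follows (hypothetical helper, not the agent's actual class):

```java
// Sketch: rebuild the "AWS-XRAY-TRACE-ID" MDC value from the raw
// 32-hex-char OTel trace id and the 16-hex-char span id. X-Ray format
// is "1-<8 hex chars of epoch seconds>-<24 hex chars of randomness>".
public class XrayTraceIdFormat {
    public static String toXray(String otelTraceId, String spanId) {
        String xrayTraceId =
            "1-" + otelTraceId.substring(0, 8) + "-" + otelTraceId.substring(8);
        return xrayTraceId + "@" + spanId;
    }
}
```

Applied to the 1.4.1 log entry above, `toXray("614c52c132ce415b4ab8604020840209", "c5e5f7bf633a1f60")` reproduces the value `1-614c52c1-32ce415b4ab8604020840209@c5e5f7bf633a1f60`.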

Trace id added to MDC is misleading

Hello,

Describe the bug
We use the Java agent with one of our Spring Boot apps. The agent adds a traceId like 60108d573a343aec0c7a35f4ede1e064 to our log output, but it should have been something like 1-60108d57-3a343aec0c7a35f4ede1e064.

What did you expect to see?
Maybe the proper traceId, copy-pasteable into X-Ray? I'm not quite sure what the expected behavior is here.

Additional context
v.0.6.0 sidecar and v0.12.1-aws.1 agent

Is this expected / a work in progress, or am I missing something?

Thanks

Performance Threshold breached AFTER Soak Tests completed for the (spark-awssdkv1, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit df05909 of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

Performance/Throughput Impact with auto instrumentation in Spring 5 applications

Describe the bug
We are seeing more than 50% performance degradation after instrumenting with the OTel agent. Our instrumented application runs on an EKS cluster. The OTel Collector, running as a DaemonSet in the same EKS cluster, collects traces and ingests the data into AWS X-Ray.

Steps to reproduce
This is a Spring 5 project with WebFlux and Spring Cloud Stream support, interacting with SQS, DynamoDB and AWS MSK.

What did you expect to see?
Without the OTel agent, the application could reach up to 250 requests per second with 2Gi of memory.

What did you see instead?
With the OTel agent, we are seeing ~65 requests per second with the same settings. I was expecting some degradation in throughput, but this is more than 50%.

Additional context
We are using aws-opentelemetry-agent-1.1.0 with default settings for the BSP; sampling is set to 100% and the metrics exporter is set to logging.

Performance Threshold breached AFTER Soak Tests completed for the (springboot, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit df05909 of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

Performance Threshold breached AFTER Soak Tests completed for the (spark-awssdkv1, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit c258373 of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

No trace_id in logs with log4j2 + Spring Boot

Describe the bug
The trace_id, span_id and trace_flags are not logged with log4j2 + spring boot.

With the javaagent provided by this repository (1.6.0):

  • log4j2: no logging context set at all
  • logback: log context set properly

With the javaagent from open-telemetry (1.6.2)

  • log4j2: logging in place
  • logback: logging in place

Steps to reproduce
Here is a repository that reproduces the described issue.
Note that there are no tests in the repository, but the application logs show the faulty behaviour.

What did you expect to see?

  • The aws otel javaagent should behave the same way as the original distribution.

What did you see instead?

  • The aws otel javaagent is unable to set the correct log4j2 context.

Additional context
The same application works when using the native javaagent from the open-telemetry org.
(screenshot attached in the original issue)
gradle-q-dependencis.txt

Performance Threshold breached DURING Soak Tests execution for the (spark-awssdkv1, auto) Sample App

Description

During Soak Tests execution, a performance degradation was revealed for commit 5d89f4c of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

Not logging trace id after AWS Auto Instrumentation Agent upgrade to 1.2.0 or 1.4.0

Describe the bug
We are tracing our Spring Boot / Spring Cloud Gateway application with the AWS Auto Instrumentation Agent. After changing from Release v1.1.0 to Release v1.4.0, the traceId shows empty, and the key AWS-XRAY-TRACE-ID has also disappeared from the MDC context map.

Steps to reproduce
java -Dspring.profiles.active=local -javaagent:/aotel/aws-opentelemetry-agent-1.4.0.jar one.jar
"one.jar" is a SpringBoot application, configuring by logback pattern:
%d{yyyy-MM-dd HH:mm:ss.SSS}\t[%X{AWS-XRAY-TRACE-ID}] - %msg%n

What did you expect to see?
Before upgrade, using agent 1.1.0:
java -Dspring.profiles.active=local -javaagent:/aotel/aws-opentelemetry-agent-1.1.0.jar one.jar
the log is showing like this:
2021-07-29 18:50:09.618 [1-610287e1-4bad020c021ab9b074d04217@a01edea91aca582c] origin request body: ...

What did you see instead?
After upgrade to agent 1.2.0 or 1.4.0, logging like this:
2021-07-29 18:50:09.618 [] origin request body: ...
There is no AWS-XRAY-TRACE-ID in the MDC context when inspecting in debug mode:
"mdc": { "appVersion": "", "interface io.opentelemetry.javaagent.shaded.io.opentelemetry.api.trace.Span": "{opentelemetry-trace-span-key=RecordEventsReadableSpan{traceId=610399e3815dc59e58765f785125a19c, spanId=52817ef4208de551, parentSpanContext=ImmutableSpanContext{traceId=610399e3815dc59e58765f785125a19c, spanId=0877b939030ee172, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, name=FilteringWebHandler.handle, kind=INTERNAL, attributes=AttributesMap{data={thread.name=reactor-http-nio-4, thread.id=43}, capacity=128, totalAddedValues=2}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1627625955919426700, endEpochNanos=0}, opentelemetry-traces-server-span-key=RecordEventsReadableSpan{traceId=610399e3815dc59e58765f785125a19c, spanId=0877b939030ee172, parentSpanContext=ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}, name=HTTP POST, kind=SERVER, attributes=AttributesMap{data={net.peer.port=51264, schoolCode=415, http.method=POST, clientType=, http.user_agent=PostmanRuntime/7.28.2, platformVersion=, http.url=http://localhost:8080/v1/aggregate/user/info, platform=, studentId=7592518, http.client_ip=0:0:0:0:0:0:0:1, net.peer.ip=0:0:0:0:0:0:0:1, thread.name=reactor-http-nio-4, deviceName=, appVersion=, thread.id=43, http.flavor=1.1, appName=}, capacity=128, totalAddedValues=17}, status=ImmutableStatusData{statusCode=UNSET, description=}, totalRecordedEvents=0, totalRecordedLinks=0, startEpochNanos=1627625955919001800, endEpochNanos=0}}", "appName": "", "requestMethod": "POST", "reactor.onDiscard.local": "reactor.core.publisher.Operators$$Lambda$1192/593986789@222b5946", "requestUri": "/v1/aggregate/user/info", "deviceName": "", "platform": "", "studentId": "7592518", "hostname": "ZT-081201", "clientType": "", "platformVersion": "", 
"schoolCode": "415", "remoteAddr": "0:0:0:0:0:0:0:1", "trace_id": "610399e3815dc59e58765f785125a19c", "trace_flags": "01", "span_id": "0877b939030ee172" },

I don't know whether the usage has changed or there is a bug here; any suggestions or code would be appreciated.

Performance Threshold breached AFTER Soak Tests completed for the (spark-awssdkv1, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit 1e5ce35 of the refs/heads/main branch for the (spark-awssdkv1, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

AWS Resource Attributes gathering

Is it possible for the AWS javaagent to capture AWS-specific resource attributes such as region, availability zone, ECS ARN, and container memory and CPU limits, as per the OTel Cloud Semantic Conventions? At the moment we're using the OTel javaagent and using AOP to inject these attributes as span attributes, but they are truly resource attributes and should be captured as such.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/cloud.md
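Until the agent captures these automatically, one stopgap is to pass static values as proper resource attributes (rather than span attributes) through the SDK's standard environment variable; a sketch with placeholder values:

```shell
# Placeholder values for illustration only; real deployments would derive
# these from instance/task metadata.
export OTEL_RESOURCE_ATTRIBUTES="cloud.provider=aws,cloud.region=us-east-1,cloud.availability_zone=us-east-1a"
```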

Mismatch between trace ids reported by AWS ALB and application

Describe the bug
I'm not quite sure whether this ever worked with AWS ADOT, but it did with the old X-Ray instrumentation.
We're running a quite simple and typical setup: an AWS ALB in front of an ECS Fargate task that runs our application.

We use the latest version of the AWS-flavoured Java agent (1.7) and version 0.13.0 of the AWS-flavoured OTel collector as a sidecar.
We turned on LB access logs (S3) in order to have more insight into requests. It appears that not a single trace id mentioned in the LB logs is visible in our application logs.

With the former X-Ray setup we could basically search for a trace id in our log frontend and get both the application logs and the corresponding LB log entry.

Apart from the Java agent, we only use the AwsXrayPropagator to inject the trace id back into our response, along these lines:

import io.opentelemetry.context.Context;
import io.opentelemetry.extension.aws.AwsXrayPropagator;
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.filter.OncePerRequestFilter;

public class XrayTraceIdHeaderFilter extends OncePerRequestFilter {
  @Override
  protected void doFilterInternal(
      HttpServletRequest request, HttpServletResponse response, FilterChain chain)
      throws ServletException, IOException {

    AwsXrayPropagator.getInstance()
        .inject(Context.current(), response, HttpServletResponse::setHeader);

    chain.doFilter(request, response);
  }
}

Is there something else I have to do in order to get this working again?
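One thing worth checking is that the X-Ray propagator is actually active in the agent so the incoming ALB header is extracted; the propagator list can be set explicitly (a sketch; these are propagator names accepted by the SDK's otel.propagators / OTEL_PROPAGATORS option):

```shell
# Enable X-Ray header propagation alongside W3C trace context.
export OTEL_PROPAGATORS=xray,tracecontext,b3
```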

Additional notes:

  • app is a Spring Boot service (Spring Boot 2.5.5)
  • java agent 1.7.0
  • otel collector 0.13.0

aws-otel-java-instrumentation : aws-xray not showing connected traces with kafka

Describe the bug
Two Spring Java applications have a Kafka integration: one produces messages and the other consumes them. Running them with the AWS OpenTelemetry javaagent, with the collector installed, traces are generated and I can see them in the AWS X-Ray console, but the producer-consumer graph and traces are not connected.

When I printed the headers I could see the X-Amzn-Trace-Id header in both producer and consumer. The trace id value is the same in this header, but the trace_id is different when printed in the logs.

e.g : TRACES from my application

[9090] 2021-10-07 18:49:18 - c.n.n.k.i.KafkaProducerInterceptor - HEADERS IN PRODUCER**-----for TOPIC XXXX , {traceparent=00-615ef3d419ea6e5838f81274b50e4dde-8816e8faeacd8c8a-01, X-Amzn-Trace-Id=Root=1-615ef3d4-19ea6e5838f81274b50e4dde;Parent=8816e8faeacd8c8a;Sampled=1} trace_id=615ef3d419ea6e5838f81274b50e4dde span_id=8816e8faeacd8c8a trace_flags=01

[9090] 2021-10-07 18:49:18 - c.n.n.k.i.KafkaConsumerInterceptor - HEADERS IN CONSUMER-----for TOPIC XXXX , {nn-api-key-id=null, nn-timestamp=null, traceparent=00-615ef3d419ea6e5838f81274b50e4dde-8816e8faeacd8c8a-01, nn-app-name=null, nn-device-id=null, X-Amzn-Trace-Id=Root=1-615ef3d4-19ea6e5838f81274b50e4dde;Parent=8816e8faeacd8c8a;Sampled=1, nn-trans-id=9090} trace_id=615ef3d63e20955caaa62e7a3bd423e9 span_id=1633f0c818ad6cf1 trace_flags=01
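For cross-referencing the two formats in logs like these: the X-Amzn-Trace-Id Root value is the same 128-bit trace id as the W3C/OTel trace_id, with the first 8 hex characters (an epoch timestamp) split off by a dash. A small helper (hypothetical, plain Java) to convert between them:

```java
public class XrayTraceId {
  // W3C/OTel 32-char hex trace id -> X-Ray Root format ("1-<8 hex>-<24 hex>")
  static String toXrayRoot(String otelTraceId) {
    return "1-" + otelTraceId.substring(0, 8) + "-" + otelTraceId.substring(8);
  }

  // X-Ray Root format -> W3C/OTel 32-char hex trace id
  static String toOtelTraceId(String xrayRoot) {
    String[] parts = xrayRoot.split("-"); // ["1", timestamp, random]
    return parts[1] + parts[2];
  }

  public static void main(String[] args) {
    // Value taken from the producer log line above.
    System.out.println(toXrayRoot("615ef3d419ea6e5838f81274b50e4dde"));
    // -> 1-615ef3d4-19ea6e5838f81274b50e4dde
  }
}
```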

Steps to reproduce
Two microservices: one receives a request and sends a message to Kafka, the other consumes the same message on the same topic.

What did you expect to see?
Traces from the controller to the producer and on to the consumer should be connected in the AWS X-Ray console.

What did you see instead?

  1. One set of traces for controller to producer
  2. then other set of traces with different trace-id

Additional context
I have tested with two sets of applications; the result is the same, not connecting:

  1. two microservices have simple kafka implementation
  2. Another two microservices have kafka and kafka streams implementation and also have Spring's DeferredResult

OpenTelemetry agent with version 1.4.x report metrics without values

After switching from agent version 1.2.0 to the new OpenTelemetry agent version 1.4.1, metrics exported to New Relic are always 0. I can see the labels that are added to my custom metrics, and I can see that agent-reported metrics are sent, yet they always have a value of 0.

Params that I use:
-javaagent:/...somePaths.../aws-opentelemetry-agent.jar -Dotel.javaagent.debug=true -Dotel.metrics.exporter=otlp -Dotel.instrumentation.runtime-metrics.enabled=true

And ENV Variables that I use:
OTEL_RESOURCE_ATTRIBUTES=service.name=someappName,service.namespace=somenamespace,service.instance.id=mymachine.local
OTEL_EXPORTER_OTLP_ENDPOINT=http://collectorIp:4317
OTEL_PROPAGATORS=xray

After replacing the path with the agent jar in version 1.2.0, everything works just fine.
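To narrow down whether the zero values originate in the agent or further downstream, it may help to temporarily switch the metrics exporter to the SDK's logging exporter so that values are printed to stdout (a debugging sketch; assumes the logging exporter name supported by the autoconfigure module):

```shell
# Same agent, but print metrics locally instead of exporting over OTLP.
java -javaagent:/...somePaths.../aws-opentelemetry-agent.jar \
  -Dotel.metrics.exporter=logging \
  -Dotel.instrumentation.runtime-metrics.enabled=true \
  -jar app.jar
```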

Trace Id (x-amzn-trace-id) missing in Spring Boot HTTP response

Hello

Describe the bug
We used the old X-Ray SDK before switching to the otel-collector/javaagent this month. Right now the "x-amzn-trace-id" header is missing from our API responses.

What did you expect to see?
x-amzn-trace-id should be present in http responses.

Additional context
v0.6.0 sidecar and v0.12.1-aws.1 agent

Is this expected/work in progress or do I miss something?

Thanks

Repackage OpenTelemetry Java Agent to 1.4.0

Any estimate of when we are going to have the base OTel agent updated? Currently it's based on v1.2.0, which includes the out-of-memory issue.

That problem was solved in OTel agent v1.3.0 (1), but the PR and the merge for upgrading the agent failed here.

Recently v1.4.0 was released, so looking forward to this one :)

Thanks in advance

Performance Threshold breached AFTER Soak Tests completed for the (springboot, auto) Sample App

Description

After the Soak Tests completed, a performance degradation was revealed for commit c258373 of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

The threshold violation should also be noticeable on our graph of Soak Test average results per commit.

How do I log ServiceLens/X-Ray "User" with the OTEL agent?

Is your feature request related to a problem? Please describe.
I am trying to log the end-user username to X-Ray with the OTEL agent using the attribute "enduser.id".
I've verified that the code works locally, and that the attribute is added.

The metadata does not contain the attribute when in X-Ray:

{
    "default": {
        "enduser.scope": "ROLE_USER USE_PREMIUM_TTS ROLE_STUDENT",
        "thread.name": "http-nio-8180-exec-21",
        "thread.id": ""
    }
}

Code:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import java.util.stream.Collectors;
import org.aspectj.lang.ProceedingJoinPoint;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;

public abstract class AbstractSpringOpenTelemetryInterceptor {

    // see io.opentelemetry.semconv.trace.attributes.SemanticAttributes
    public static final String ENDUSER_ID = "enduser.id";
    public static final String ENDUSER_SCOPE = "enduser.scope";

    public static final String INSTRUMENTATION_APP_PACKAGE = "se.nomp";
    private final Tracer tracer = GlobalOpenTelemetry.getTracer(INSTRUMENTATION_APP_PACKAGE);

    protected Object wrapWithSpan(ProceedingJoinPoint pjp) throws Throwable {
        Span span = tracer.spanBuilder(generateSubsegmentName(pjp)).startSpan();
        try (Scope scope = span.makeCurrent()) {
            var principalName = getPrincipalName();
            if (principalName != null) {
                span.setAttribute(ENDUSER_ID, principalName);
                var authorities = getAuthorities();
                if (authorities != null) {
                    span.setAttribute(ENDUSER_SCOPE, authorities);
                }
            }
            return conditionalProceed(pjp);
        } finally {
            span.end();
        }
    }

    protected String generateSubsegmentName(ProceedingJoinPoint pjp) {
        return pjp.getSignature().getDeclaringType().getSimpleName() + "." + pjp.getSignature().getName();
    }

    private static Object conditionalProceed(ProceedingJoinPoint pjp) throws Throwable {
        return pjp.getArgs().length == 0 ? pjp.proceed() : pjp.proceed(pjp.getArgs());
    }

    public static String getPrincipalName() {
        Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
        if (authentication != null) {
            return authentication.getName();
        }
        return null;
    }

    public static String getAuthorities() {
        Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
        if (authentication != null) {
            var authorities = authentication.getAuthorities();
            if (authorities != null) {
                return authorities.stream().map(a -> a.getAuthority()).collect(Collectors.joining(" "));
            }
        }
        return null;
    }
}

Describe the solution you'd like
I'd like to know what I should be doing instead.

Flaky smoke test

Describe the bug
Smoke test is failing intermittently

Steps to reproduce

What did you expect to see?

What did you see instead?

SpringBootSmokeTest > hello() FAILED
    java.lang.AssertionError: 
    Expecting any element of:
      <[trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "\001\005\330\037\367\231u\237"
    parent_span_id: "\"?\035@\372S\3259"
    name: "AppController.backend"
    kind: SPAN_KIND_INTERNAL
    start_time_unix_nano: 1607097575118242100
    end_time_unix_nano: 1607097575155671900
    attributes {
      key: "thread.id"
      value {
        int_value: 24
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-4"
      }
    }
    status {
    }
    ,
        trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "5e\347F\240S\345}"
    parent_span_id: "\367\275\001\230\224\355P\201"
    name: "AppController.hello"
    kind: SPAN_KIND_INTERNAL
    start_time_unix_nano: 1607097575004034200
    end_time_unix_nano: 1607097575194917200
    attributes {
      key: "thread.id"
      value {
        int_value: 23
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-3"
      }
    }
    status {
    }
    ,
        trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "\'\354\243\3516\032\267\212"
    parent_span_id: "5e\347F\240S\345}"
    name: "HTTP GET"
    kind: SPAN_KIND_CLIENT
    start_time_unix_nano: 1607097575093631200
    end_time_unix_nano: 1607097575180852600
    attributes {
      key: "thread.id"
      value {
        int_value: 23
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-3"
      }
    }
    attributes {
      key: "net.transport"
      value {
        string_value: "IP.TCP"
      }
    }
    attributes {
      key: "http.method"
      value {
        string_value: "GET"
      }
    }
    attributes {
      key: "http.flavor"
      value {
        string_value: "1.1"
      }
    }
    attributes {
      key: "net.peer.name"
      value {
        string_value: "localhost"
      }
    }
    attributes {
      key: "net.peer.port"
      value {
        int_value: 8080
      }
    }
    attributes {
      key: "http.url"
      value {
        string_value: "http://localhost:8080/backend"
      }
    }
    attributes {
      key: "http.status_code"
      value {
        int_value: 200
      }
    }
    status {
    }
    ,
        trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "\"?\035@\372S\3259"
    parent_span_id: "\'\354\243\3516\032\267\212"
    name: "/backend"
    kind: SPAN_KIND_SERVER
    start_time_unix_nano: 1607097575117008100
    end_time_unix_nano: 1607097575160650900
    attributes {
      key: "thread.id"
      value {
        int_value: 24
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-4"
      }
    }
    attributes {
      key: "net.peer.ip"
      value {
        string_value: "127.0.0.1"
      }
    }
    attributes {
      key: "net.peer.port"
      value {
        int_value: 47104
      }
    }
    attributes {
      key: "http.method"
      value {
        string_value: "GET"
      }
    }
    attributes {
      key: "http.user_agent"
      value {
        string_value: "Java/11.0.8"
      }
    }
    attributes {
      key: "http.url"
      value {
        string_value: "http://localhost:8080/backend"
      }
    }
    attributes {
      key: "http.flavor"
      value {
        string_value: "HTTP/1.1"
      }
    }
    attributes {
      key: "http.client_ip"
      value {
        string_value: "127.0.0.1"
      }
    }
    attributes {
      key: "http.status_code"
      value {
        int_value: 200
      }
    }
    status {
    }
    ]>
    to satisfy the given assertions requirements but none did:

      <trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "\001\005\330\037\367\231u\237"
    parent_span_id: "\"?\035@\372S\3259"
    name: "AppController.backend"
    kind: SPAN_KIND_INTERNAL
    start_time_unix_nano: 1607097575118242100
    end_time_unix_nano: 1607097575155671900
    attributes {
      key: "thread.id"
      value {
        int_value: 24
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-4"
      }
    }
    status {
    }
    > error: 
    Expecting:
     <SPAN_KIND_INTERNAL>
    to be equal to:
     <SPAN_KIND_SERVER>
    but was not.

      <trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "5e\347F\240S\345}"
    parent_span_id: "\367\275\001\230\224\355P\201"
    name: "AppController.hello"
    kind: SPAN_KIND_INTERNAL
    start_time_unix_nano: 1607097575004034200
    end_time_unix_nano: 1607097575194917200
    attributes {
      key: "thread.id"
      value {
        int_value: 23
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-3"
      }
    }
    status {
    }
    > error: 
    Expecting:
     <SPAN_KIND_INTERNAL>
    to be equal to:
     <SPAN_KIND_SERVER>
    but was not.

      <trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "\'\354\243\3516\032\267\212"
    parent_span_id: "5e\347F\240S\345}"
    name: "HTTP GET"
    kind: SPAN_KIND_CLIENT
    start_time_unix_nano: 1607097575093631200
    end_time_unix_nano: 1607097575180852600
    attributes {
      key: "thread.id"
      value {
        int_value: 23
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-3"
      }
    }
    attributes {
      key: "net.transport"
      value {
        string_value: "IP.TCP"
      }
    }
    attributes {
      key: "http.method"
      value {
        string_value: "GET"
      }
    }
    attributes {
      key: "http.flavor"
      value {
        string_value: "1.1"
      }
    }
    attributes {
      key: "net.peer.name"
      value {
        string_value: "localhost"
      }
    }
    attributes {
      key: "net.peer.port"
      value {
        int_value: 8080
      }
    }
    attributes {
      key: "http.url"
      value {
        string_value: "http://localhost:8080/backend"
      }
    }
    attributes {
      key: "http.status_code"
      value {
        int_value: 200
      }
    }
    status {
    }
    > error: 
    Expecting:
     <SPAN_KIND_CLIENT>
    to be equal to:
     <SPAN_KIND_SERVER>
    but was not.

      <trace_id: "_\312\\\346M\357;\244\266z\310\0054\332\233\307"
    span_id: "\"?\035@\372S\3259"
    parent_span_id: "\'\354\243\3516\032\267\212"
    name: "/backend"
    kind: SPAN_KIND_SERVER
    start_time_unix_nano: 1607097575117008100
    end_time_unix_nano: 1607097575160650900
    attributes {
      key: "thread.id"
      value {
        int_value: 24
      }
    }
    attributes {
      key: "thread.name"
      value {
        string_value: "http-nio-8080-exec-4"
      }
    }
    attributes {
      key: "net.peer.ip"
      value {
        string_value: "127.0.0.1"
      }
    }
    attributes {
      key: "net.peer.port"
      value {
        int_value: 47104
      }
    }
    attributes {
      key: "http.method"
      value {
        string_value: "GET"
      }
    }
    attributes {
      key: "http.user_agent"
      value {
        string_value: "Java/11.0.8"
      }
    }
    attributes {
      key: "http.url"
      value {
        string_value: "http://localhost:8080/backend"
      }
    }
    attributes {
      key: "http.flavor"
      value {
        string_value: "HTTP/1.1"
      }
    }
    attributes {
      key: "http.client_ip"
      value {
        string_value: "127.0.0.1"
      }
    }
    attributes {
      key: "http.status_code"
      value {
        int_value: 200
      }
    }
    status {
    }
    > error: 
    Expecting:
     <"/backend">
    to be equal to:
     <"/hello">
    but was not.
        at io.awsobservability.instrumentation.smoketests.runner.SpringBootSmokeTest.hello(SpringBootSmokeTest.java:154)



Null Pointer Exception with latest release

We updated to the latest release (0.17) of the aws-otel-java-instrumentation (from 0.15 previously) and now we see exceptions on startup and traces are not reported.

ERROR io.opentelemetry.javaagent.OpenTelemetryAgent
java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at io.opentelemetry.javaagent.OpenTelemetryAgent.agentmain(OpenTelemetryAgent.java:64)
at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.agentmain(AwsAgentBootstrap.java:28)
at software.amazon.opentelemetry.javaagent.bootstrap.AwsAgentBootstrap.premain(AwsAgentBootstrap.java:24)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(Unknown Source)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.startAgent(AgentInitializer.java:44)
at io.opentelemetry.javaagent.bootstrap.AgentInitializer.initialize(AgentInitializer.java:30)
... 13 more
Caused by: java.lang.ExceptionInInitializerError
at io.opentelemetry.sdk.extension.aws.resource.Ec2ResourceProvider.createResource(Ec2ResourceProvider.java:16)
at io.opentelemetry.sdk.autoconfigure.OpenTelemetrySdkAutoConfiguration.buildResource(OpenTelemetrySdkAutoConfiguration.java:77)
at io.opentelemetry.sdk.autoconfigure.OpenTelemetrySdkAutoConfiguration.&lt;clinit&gt;(OpenTelemetrySdkAutoConfiguration.java:25)
at io.opentelemetry.javaagent.tooling.OpenTelemetryInstaller.installAgentTracer(OpenTelemetryInstaller.java:36)
at io.opentelemetry.javaagent.tooling.OpenTelemetryInstaller.beforeByteBuddyAgent(OpenTelemetryInstaller.java:27)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installComponentsBeforeByteBuddy(AgentInstaller.java:168)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:102)
at io.opentelemetry.javaagent.tooling.AgentInstaller.installBytebuddyAgent(AgentInstaller.java:86)
... 19 more
Caused by: java.lang.NullPointerException
at io.opentelemetry.sdk.extension.aws.resource.Ec2Resource.buildResource(Ec2Resource.java:83)
at io.opentelemetry.sdk.extension.aws.resource.Ec2Resource.buildResource(Ec2Resource.java:49)
at io.opentelemetry.sdk.extension.aws.resource.Ec2Resource.&lt;clinit&gt;(Ec2Resource.java:31)
... 27 more

Additional info:
We deploy to an EC2 backed ECS cluster with the aws otel collector running as a sidecar to each service.

aws-opentelemetry-agent.jar with Springboot Admin console

Hello @mxiamxia and AWS team -
When using OTEL with a Spring Boot app, the Spring Boot admin console wraps all actual classes with io.opentelemetry.javaagent.instrumentation.spring.scheduling.SpringSchedulingRunnableWrapper. I would like to see the actual classes and not have the above wrapper obfuscate them.

A sample picture is attached. How can this be fixed?

Configurable retries on Transient Collector unavailability

Is your feature request related to a problem? Please describe.
We run the aws-otel-java-instrumentation agent to export telemetry data to the collector that runs as an ECS daemon service on an EC2 host that's part of an ECS cluster. When doing a deployment, the collector daemon becomes temporarily unavailable as it needs to stop the existing task in order to run the new version of the collector task. During this time, telemetry data might be lost when trying to export it to the collector.

Is there a way to configure the agent so that it retries until the collector becomes available? If not, is there a workaround? Also note that we're using the OTLP exporter.

Describe the solution you'd like
The agent should be configurable in a way that allows for retrying exporting telemetry data during transient unavailability of the collector.

Performance Threshold breached DURING Soak Tests execution for the (springboot, auto) Sample App

Description

During Soak Tests execution, a performance degradation was revealed for commit d1b010b of the refs/heads/main branch for the (springboot, auto) Sample App. Check out the Action Logs from the Soak Testing workflow run on GitHub to view the threshold violation.

Useful Links

Snapshots of the Soak Test run are available on the gh-pages branch. These are the snapshots for the violating commit:

CPU Load Soak Test SnapShot Image
Total Memory Soak Test SnapShot Image

Unable to see OpenTelemetry Span Events in AWS X-Ray

Describe the bug
When trying to add a span event to a span using OpenTelemetry's Span.current().addEvent() method, the event is not visible in AWS X-Ray. We can see our custom spans but none of the events.
Steps to reproduce
We are using a Java Spring Boot application with spring-boot-starter-parent version 2.4.5 and the opentelemetry-otlp-exporter-starter 1.4.1-alpha dependency.

We created a Docker image for our application that adds aws-opentelemetry-agent.jar (v1.4.1) at build time, and we run it as a javaagent when the image is deployed. We deploy our application to ECS Fargate and run the aws-otel-collector as a sidecar with the default configuration.

What did you expect to see?
We expected to see our span events either as annotations or metadata in our spans.

What did you see instead?
We just saw our custom spans without any Span Events.

Additional context
We are also experiencing the same behaviour when we use Elastic as the backend for observability. This is due to this issue

Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch

Describe the bug
Instrumenting the Java agent on an application interacting with S3 via aws-java-sdk-s3 causes a SignatureDoesNotMatch error.

Steps to reproduce
Instrument an application interacting with S3 using the aws-java-sdk-s3 library:

com.amazonaws:aws-java-sdk-s3:1.11.444

com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch;

Version of Java agent
0.17.0-aws.1
