Coder Social home page Coder Social logo

Comments (9)

rapphil avatar rapphil commented on August 16, 2024 3

Thank you everyone for reporting and investigating this issue, in special @asmadsen for providing a fix.

The fix will be available in the next minor release (v1.29.0) of the ADOT Java Agent.

from aws-otel-java-instrumentation.

rapphil avatar rapphil commented on August 16, 2024 1

Release v1.29.0 is available for usage.

from aws-otel-java-instrumentation.

alex-arana avatar alex-arana commented on August 16, 2024

We experienced the same issue in our environments with corrupted trace-ids after upgrading AWS OpenTelemetry Java Agent from v1.20.0 to v1.23.0 sometime last April.

After reverting to v1.20.0, everything started working normally again. We also tried v1.26.0 released mid-May with equally unsuccessful results.

It would seem some changes were introduced that affected propagation of trace-ids sometime after v1.20.0.

from aws-otel-java-instrumentation.

felixscheinost avatar felixscheinost commented on August 16, 2024

That's good to hear, because that makes it less likely that we are doing something stupid which causes this 😉

I just tested and the problem persists with v1.27.0.

from aws-otel-java-instrumentation.

AlexKleiber avatar AlexKleiber commented on August 16, 2024

I can confirm the problem with v1.26.0 and v1.27.0.
We also discovered that not every log entry contains the attribute "AWS-XRAY-TRACE-ID"

from aws-otel-java-instrumentation.

srprash avatar srprash commented on August 16, 2024

@felixscheinost To understand this further, I have a few questions

The reason for the failure is evident when viewing the log events in CloudWatch: the AWS-XRAY-TRACE-ID and the trace_id fields of the logged JSON objects are completely different (not just the usual 1- and inserted -).

Do you mean, that in your CW logs, you see the trace id as xxxxxxx instead of the expected xray formatted 1-xxx-xxxxx? Or are they entirely different?

On the other hand there is also one log stream where for the first 15 minutes the two IDs match (modulo the 1- and -) but suddenly they break and seem to stay broken for all further logs.
Do you mean that initially the trace ids differ only by the 1- and - and then afterwards they differ completely in their values?

Can you provide an example of the trace id that you see in your trace and the one logged in CW logs?
Are you seeing this in your root service (that generates a trace) or is it receiving the trace context from some upstream service?

from aws-otel-java-instrumentation.

alex-arana avatar alex-arana commented on August 16, 2024

@felixscheinost To understand this further, I have a few questions
Do you mean, that in your CW logs, you see the trace id as xxxxxxx instead of the expected xray formatted 1-xxx-xxxxx? Or are they entirely different?

We started testing v1.28.0 and have come to the following conclusion: AWS-XRAY-TRACE-ID still does not reliably match trace_id in the MDC context. We are using logback as per ADOT Logging Guide and configured PatternLayout to be '%-5p %X{AWS-XRAY-TRACE-ID} %X' so that we can see both MDC key-value pairs side-by-side:

2023-07-25T10:26:12.977+10:00 DEBUG 1-64bf1632-122b76764ff0867839928046@ea4d0bbd5a946345 AWS-XRAY-TRACE-ID=1-64bf1632-122b76764ff0867839928046@ea4d0bbd5a946345, trace_id=64bf16a47e128818703e67e9253173ac, trace_flags=01, span_id=ec9a94155c77f5fb 1 --- [     parallel-2] c.b.i.commons.logging.LoggingWebFilter   : Handling HTTP request: GET http://mytest-dev-ecs-nlb-5487598345723.elb.ap-southeast-2.amazonaws.com/api/v1/test/data

2023-07-25T10:26:12.982+10:00 INFO  1-64bf1632-122b76764ff0867839928046@ea4d0bbd5a946345 AWS-XRAY-TRACE-ID=1-64bf1632-122b76764ff0867839928046@ea4d0bbd5a946345, trace_id=64bf16a47e128818703e67e9253173ac, trace_flags=01, span_id=d73e4a75dc689b8b 1 --- [     parallel-2] c.b.i.test.controller.TestControllerV1   : Received a call to fetch Tenant data..

2023-07-25T10:26:12.998+10:00 DEBUG 1-64bf1632-122b76764ff0867839928046@e2639b97695436b0 AWS-XRAY-TRACE-ID=1-64bf1632-122b76764ff0867839928046@e2639b97695436b0, trace_id=64bf16a47e128818703e67e9253173ac, trace_flags=01, span_id=b4daf8f1ed65136d 1 --- [ctor-http-nio-2] c.b.i.commons.logging.WebClientLogger    : HTTP call: GET http://servicenow-adapter-service.internal.dev.acme.com:8080/api/v2/tenants Response: statusCode=200 OK contentLength=0 timeTaken=14ms

2023-07-25T10:26:13.226+10:00 DEBUG  trace_id=64bf16a47e128818703e67e9253173ac, trace_flags=01, span_id=ec9a94155c77f5fb 1 --- [ctor-http-nio-1] c.b.i.commons.logging.LoggingWebFilter   : Handled HTTP request: GET http://mytest-dev-ecs-nlb-5487598345723.elb.ap-southeast-2.amazonaws.com/api/v1/test/data statusCode=200 OK

Can you provide an example of the trace id that you see in your trace and the one logged in CW logs? Are you seeing this in your root service (that generates a trace) or is it receiving the trace context from some upstream service?

In the logging output above, we can see that the value for AWS-XRAY-TRACE-ID: 1-64bf1632-122b76764ff0867839928046 does not match that of trace_id: 64bf16a47e128818703e67e9253173ac and span_id: d73e4a75dc689b8b.

We can also see that the values for trace_id/span_id match that of the parent context upstream and the trace is rendered properly in the AWS XRay Traces UI.

from aws-otel-java-instrumentation.

felixscheinost avatar felixscheinost commented on August 16, 2024

Sorry, for the late reply. I wanted to prepare a basic example project to reproduce the issue but didn't find time to do that until now.

Seems like there is already progress on fixing the issue, thanks @asmadsen

I'll provide the link to the example project nevertheless: https://github.com/felixscheinost/test-aws-otel-java-agent

Here is the log output with v1.20.0: https://gist.github.com/felixscheinost/9b1f5d601da916cab1adba3743404552

And here with v1.28.0: https://gist.github.com/felixscheinost/e25e256a780ffbe1879dfff929d69885

In those logs there is easily visible what @asmadsen figured out:

First request, handled on thread exec-1 has trace ID 1-64c12479-9781b7613289b8b04750c6da@80025d3e95138300. The next request which is handled on the same thread has the same trace ID.

For the first time a request is handled on another thread the AWS-XRAY-TRACE-ID is empty, afterwards it stays the same for all subsequent requests on that thread.

from aws-otel-java-instrumentation.

rapphil avatar rapphil commented on August 16, 2024

Closing the issue

from aws-otel-java-instrumentation.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.