open-telemetry / opentelemetry-go Goto Github PK

View Code? Open in Web Editor NEW

4.8K 66.0 983.0 17.72 MB

OpenTelemetry Go API and SDK

Home Page: https://opentelemetry.io/

License: Apache License 2.0

Go 99.29% Makefile 0.37% Shell 0.10% Dockerfile 0.01% Jinja 0.24%

tracing metrics opentelemetry

opentelemetry-go's People

Contributors

Stargazers

Watchers

Forkers

fuweid singhabhilash jranson jmacd rghetia sjkaris heroku etsangsplk pjanotti iredelmeier mgdevstack hungaikev djosephsen udayreddy bgguna purple4reina ihuangyaoshi backwardn kinvolk agneum zchee cloud-land isgasho c24t ymotongpoo stgrzeszczak aolmo97 vineethreddy02 julianocristian kdef-nr shreyassrivatsan forkkit codenio campos neethu2001 17072001musafir sivanjana clsung freeformz matej-g zrobisho gtrevg hiepndd lamvanthai1 bogdandrutu erikauer brbussy tukejonny ferhatelmas javi-farach belugla evantorrie marwan-at-work davidmontoyago schigh vvydier ekkarat-w-gmail-com jazzdan loginakhil ptravers nvcexploder amber-lamp ryan-git takekazuomi aiderzcx sircelsius huikang clippit mralias lmineiro nicktrav johananl rebeccajae piotrkowalczuk therealpuneeth20 amrmahdi krnowak shabicheng aneurysm9 wingyplus andrewhsu smnalex neetle 16yuki0702 jdefrank rizkybiz nilebox mikegoldsmith sergeybataev alexandre-normand codeboten ilyakaznacheev liguozhong nmnellis oncilla volkerk owais aopat regiocom zlozano

opentelemetry-go's Issues

Eliminate tag.Map API

The Map API is unsafe, as explained by @cstockton in #23 (comment)

Make it a concrete type.

My expectation when I started looking at the code base was that I would find mostly concrete implementations of the basic data types (Traces, Spans, SpanContexts, Logs, Measurements, Metrics, etc) with interfaces for vendors to hook into for enriching that data and exporting it either in batch or streaming forms.

Looking through the code base I'm sort of astonished by the number of interfaces that exist over concrete data types.

I realize that vendors want some control over various aspects of the implementation, but starting with making everything an interface seems like a poor design choice to me.

Can anyone explain this or point me to the docs that explain the motivation behind this?

API: Add Tracestate in SpanContext

SpanContext spec

Loading the Go SDK

As discussed briefly in the 7/9/2019 OpenTelemetry specification SIG meeting, the current specification does not say exactly how to initialize the SDK, particularly, the standard "global" tracer, meter, and recorder interfaces. It's not clear whether this requires a note in the cross-language specification, but I think we can state requirements for Go here. This is a proposal we might want to adopt:

The API must provide getters and setters for global Tracer, Meter, and Recorder interfaces, which comprise the standard SDK, these are expected to be widely used by default
To satisfy "Zero-Touch telemetry requirements", the application is not required to call an initializer to install or start the default SDK.
Where the language provides dynamic code loading, the API is required to support loading a SDK dynamically based on a language-appropriate external configuration (e.g., environment variables).
The API is required to support manual installation of a specific SDK, via a direct dependency.

Given these (proposed) requirements, here is an implementation for Go that works. It requires moving the global setter/getter methods into dedicated packages to avoid an import cycle--it is those global packages that trigger auto-loading the SDK.

jmacd#1

Remove Span.ModifyAttributes()

Supporting this kind of method requires holding onto the span in memory and prevents streaming implementation. There is no such method in the Java API.

[experimental] observer pattern should not panic when Finish() called multiple times

while it's undesirable to call Finish() more than once, we should probably be more informative than panicking when it happens...

https://gist.github.com/lizthegrey/30ca5c7def588a1bfd3c33041db67d27

`go get -u` depending upon otel-go fails

$ go get -u
github.com/open-telemetry/[email protected]: parsing go.mod: unexpected module path "go.opentelemetry.io"

is this a breaking change that I missed at some point, or is it a more structural problem?

add unit tests to tracer api

per issue #51, creating an issue for tracer and considering it assigned to me

Separate API from SDK in Go repo

The current prototype that was merged into this repository was unfinished work, in part because it was created while the OpenTelemetry specification was not finished.

The prototype included several concepts at the API level that were dedicated to streaming events to an asynchronous or out-of-process observer as part of the PoC. These types, in particular "ScopeID" are not needed for an in-process span and metrics exporter. To complete this work, the API should have a pure interface that an SDK can provide, for example:

https://github.com/tigrannajaryan/otelapi-compat/

Unit tests for API: Core

per #51 , adding this issue and assigning to me along with a PR with some coverage (#61)

Reconsider use of singletons

Hi,

Many people in the Go community disagree with singletons: i.e. Package global state. A few people have spoken about them in the past, including this post by Dave Cheney and this post by Peter Bourgon, this talk at gophercon 2016, and this medium post about Aspects of a good Go library.

It is possible to implement the open-telemetry API without package global state. While singletons are clever ways to avoid explicit dependencies, they make the dependencies of an API less clear by hiding dependencies inside package private code rather than parameters. Clear is better than clever.

Singletons create API lock in by disallowing users from injecting their own solutions to dependent libraries if they decide to move away from open-telemetry in the future. More about this is talked about in the post What “accept interfaces, return structs” means in Go

Singletons are a viral API design choice. If a library I depend upon uses Singletons, then I'm forced into their singletons. On the other hand, if a library does not use Singletons and I want to make it Singletons I can. Viral APIs are generally discouraged as they limit the ability to refactor code in the long term.

Singletons make it difficult to piecemeal refactor code by pushing all dependent library code along a single code path. Incremental API change is important for large code bases and was the driving factor for Go's alias proposal.

It is possible to implement the open telemetry described API without singletons. I strongly hope this position can be reconsidered or discussed.

Thanks!

SDK: Add grpc plugin

Create grpc plugin for opentelemetry, similar to ocgrpc for opencensus

SDK: Add http plugin

Create http plugin for opentelemetry, similar to ochttp plugin for opencensus

API: Add Link interface for Span

Add Link spec

SpanProcessor API for building export pipelines

See the specification: open-telemetry/opentelemetry-specification#205

This can be refactored out of the existing code in the sdk/trace directory.

Add circle-ci build

Clarify how AWS CloudWatch integrates with opentelemetry-go

Hi,

I tried writing a cloudwatch exporter for open census and the API there is not rich enough to do CloudWatch metrics well. I document some of the reasons on the README.md at https://github.com/cep21/cwopencensusexporter

I see open telemetry is based upon open census. Will it have the same problems, or will it work well with CloudWatch?

I've looked over a few metrics APIs and it seems like segment.io's handler is expressive enough for cloudwatch metrics because it lets me handle measures higher up the API stack at https://github.com/segmentio/stats/blob/master/handler.go#L15. Will open telemetry allow the same?

Thanks!

opentelemetry-go need to use uniform package layout?

here: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/package-layout.md

Consider separating out git repository for tracing from metrics

Hi,

Tracing, logging, and metrics all help observability, but they are very different things. Companies that do one, may not do another. Systems that solve one well look very different from systems that solve another well. There is no clear reason to have them all in one repository, but very compelling reasons to separate them.

API differences

The API differences between metrics are tracing are very large. What little they share can be similarly solved with copying. A little copying is better than a little dependency.

Repository size and signal to customers

People in the past have complained about the size of this repository's public API. It is much bigger than similar repositories that do any of those individually. This size carries a signal of bloat that is antithesis to Go's best practices around smaller libraries that focus and do one thing well. It sends a bad signal to people wanting to evaluate a library's focus.

Semantic versioning

The standard way to do semantic versioning in Go is modules. Modules work best with git tags to represent semantic versions. Because of the differences between metrics and tracing, it is very likely that a change in the metrics API may not require a change in the tracing API. By putting both in the same repository you potentially force metrics users to do a major version change when tracing updates.

Potential for flexibility

Metrics and logging are very old observability patterns that have had time to mature into a somewhat consistent API. Tracing is much newer and the thoughts around what a good tracing API should look like are currently fragmented. It is possible that opentelemetry may land on a good solution for metric standardization while failing to make the best possible tracing API. Or the other way around, doing tracing very well while failing at some aspect of metrics. By separating these repositories people do not have to feel all in on both or neither.

Better API design

A well written metrics abstraction should be able to interact with a well written tracing abstraction and a well written logging abstraction. Being separate, interchangeable, and working together is proof the API is flexibly designed.

Lazy vs non-Lazy Event APIs

The API currently has two forms of event API, one supporting non-lazy usage (Event) and one supporting lazy usage (AddEvent). On the one hand, there is a desire to support lazy events. On the other hand, there is a desire not to force allocation of an object for all events.

This may rise to a specification level issue. We have open discussions around SpanData and Sampling and Laziness in general, and it is difficult to separate the event API from this larger discussion.

In OpenTracing, there was support for lazy events, via bulk delivery using an option to the span.Finish() call. I believe the thinking was that sometimes you want to be lazy and limit the number of events that are buffered (assuming they are buffered). See https://github.com/opentracing/opentracing-go/blob/135aa78c6f95b4a199daf2f0470d231136cbbd0c/span.go#L154

At the same time, some OpenTracing libraries added support for lazy values, which are not the same as lazy events. In this case, I believe the thinking was that a programmer is only concerned with avoiding an expensive serialization when the SDK is not recording a span. See https://github.com/opentracing/opentracing-go/blob/135aa78c6f95b4a199daf2f0470d231136cbbd0c/log/field.go#L149

I will propose that we:

Eliminate the lazy/interface form "AddEvent", keep the non-lazy/concrete form "Event". Either name will do, it's the behavior that counts.
Support bulk lazy events via FinishOptions as in OpenTracing
Support lazy values via a new ValueType for lazy serialization

Starting points for Go SDK

I've been following the developing specification and work on the Java SDK. I felt that there were several considerations that applied uniquely in Go, particularly because of its builtin context.Context, that deserved prototyping. I've just posted a new repository containing this prototype:

https://github.com/lightstep/opentelemetry-golang-prototype

This is not a complete implementation of the OpenTracing and OpenCensus API surface areas. I'm posting this here, now, to have as a point of reference for several of the issues in the specification repo. Some of the high-lights of the approach taken here:

Always associate current span context with stats and metrics events
Introduce a low-level "observer" exporter
Avoid excessive memory allocations
Avoid buffering state with API objects
Use context.Context to propagate context tags and active scope
Introduce a "reader" implementation to interpret "observer"-exported events and build state
Use a common KeyValue construct for span attributes, context tags, resource definitions, log fields, and metric fields
Support logging API w/o a current span context
Support for golang plugin package to load implementations
Example use of golang's net/http/httptrace w/ @iredelmeier's tracecontext.go package

The first bullet about associating current span context and stats/metrics events bridges the tracing data model with the metrics data model. The APIs here would make this association not an option, as the stats.Record API takes a context, which passes through to the observer, which could choose to use the span-context association. The prototype includes a stderr exporter that writes a debugging log of events to the console. One of the critical features here, enabled by the low-level observer exporter, is that Span start events can be logged in chronological order, not as the span finishes.

How to become a reviewer / get involved?

I've read the documentation at https://github.com/open-telemetry/community/blob/master/community-membership.md - I do not see the path to becoming a reviewer- only how to become an approver (by being a reviewer for at least 3 months) or how the current owners were selected. I would like to be involved in the initial design and implementation of this project in whatever ways I can, how do I do that?

Access to key value pair in tag.Map needs to be thread-safe

When calling Foreach() on the tag.Map, we gain access to every key and value in the map. We can also use Value() to get access to a value for a specific key. The keys and values returned should essentially be copies of whatever is contained in the map, and usually they are. There are some troublesome members though. Some of them have the pointer-like semantics, so it is possible to mutate them in the Foreach callback or just in the returned object from Value(), and thus mutate the contents of the map itself.

Currently that happens for the keys in the map that contain the BYTES Value. So the map implementation should make a copy of the bytes when giving the access to it.

I'm also wondering if the Type field in registry.Variable should also be copied, because it is an interface, so it also could have some pointer-like semantics when doing shallow copy during Foreach.

API: Add Sampler Interface

Sampler spec

Add Unit Test to APIs

This issue is opened to add unit-test to following packages..

Contributors : For each package open an issue and assign yourself before you work on it.

API: Add Tracestate to SpanContext

As per the spec add tracestate to SpanContext.

API: In-process context propagation

Prometheus metrics exporter

We'll port the OpenCensus metricsdata exporter code into this repository.

Add makefile to do gofmt, golint, and go vet.

SDK: Tracestate propagation

Implement tracestate propagation.

blocked on #74

http plug-in problem

I tried to run the http instance code，and use wireshark to capture network messages,why didn't i find traceID spanID in the http message?

SpanData exporter implementation: Zipkin

The task is to create a SpanData exporter for exporting Zipkin

Shim: OpenCensus Shim

Create an OpenCensus shim that bridges opencensus to opentelemetry.

OpenTracing shim needed

We need a shim to translate OpenTracing API calls into OpenTelemetry API calls.
Concretely, this is an OpenTracing implementation based on the OTel SDK.
This will require more issues to be filed as this task is broken down.

Add documentation for APIs

Documentation is missing/insufficient in many apis. This issue is opened to add the documentation.

Contributors : For each package open an issue and assign yourself before you work on it.

Circle CI Restoring/Saving cache doesn't seem to be working.

If it was I would expect that most, if not all of those packages, wouldn't be fetched.

SDK: Create separate package for exporter interface.

as per this comment.

Exporter interface is currently part of SDK. Ideally it would be nice to have a separate interface that can be reused by other SDKS.

Requirement

Define Export Data Types and corresponding exporter interface. E.g

type ExporterType int32

const (
    ExporterTypeSync = iota
    ExporterTypeAsyncBatched,
    // For streaming SDKs
    ExporterTypeSyncEvents,
)

Define Export Interface for each Data Types. E.g

type SyncExporter interface {
    // implementation cannot block.
    ExportSpan(ctx context.Context, span SpanData)
}
type AsyncBatchedExporter interface {
    ExportSpans(ctx context.Context, spans []SpanData)
}
type SyncEventsExporter interface {
    // implementation cannot block.
    ExportSpanEvent(ctx context.Context, event SpanEvent) // These are streaming SDK events.
}

Registry for Exporter

func Register(t ExporterType, e interface{}) {
}

API: Inject/Extract

Metrics Processor

Similar to Span Processor, add metric processor interface for processing metrics.
Provide two metric processor

Simple Processor
Batched Processor

Issues with current design

First will there be an official process to make appeals against the projects design decisions? I am worried that collectively open-telemetry is implementing the former projects designs under what I believe to be a false premise it can "always change things later". I feel such changes will be impossible once the library is in wide spread use.

I have a number of concerns with the actual implementation here but it will be a large effort for me to enumerate them. For now I would like to start with some higher level issues that would resolve many of my code related concerns.

Exporters should not exist in this project (fundamental issue with OpenTelemetry as a whole)

If I could choose a single issue to focus on this would be it, as I feel it creates most of the design issues I have to begin with. This is an issue with the greater OpenTelemetry project as a hole. As far as "openness" OpenTelemetry is a step backwards from OpenTracing. It's placing OpenTelemetry as a gatekeeper of implementations, in fact it defines no formal specifications at all which I reported an issue about open-telemetry/opentelemetry-specification#20 and discuss in this gist. Instead of regurgitating these points in detail here I would ask anyone interested to please read this conversation, to summarize:

OpenTracing - Each SDK provides an interface in opentracing- which trace providers may implement in their own repositories to ship traces, for example

Python API defined in https://github.com/opentracing/opentracing-python and each trace provider must implement this:
- https://github.com/lightstep/lightstep-tracer-python
- https://github.com/jaegertracing/jaeger-client-python
- ... and on it goes for anyone who wants to provide tracing
Go API defined in https://github.com/opentracing/opentracing-go and each trace provider must implement this:
- https://github.com/lightstep/lightstep-tracer-go
- https://github.com/jaegertracing/jaeger
- ... and on it goes for anyone who wants to provide tracing

For tracing to be useful you must have participation from many teams across your organization. It's not uncommon for a request to span many languages (java, C#, Go, Python, Nodejs, ...) at a medium sized company. Thus if you want to provide a new tracer and integrate well with an existing ecosystem you must commit the dev cycles to implement all of these languages. There really is no reason to continue on this path with the "next generation" convergence efforts of "OpenTelemetry" that will leave a footprint on the ecosystem for years.

OpenTelemetry - Each SDK provides no interface as a "standard" design principle (not that I've seen anything resembling a standard design principle as I've echo'd here, in gitter, etc). This project seems to provide an ABI only, which is a step backwards from the API of OpenTracing. In addition it seems there is a drive to have the default OpenTelemetry SDK's all talk to a central "OpenTelemetry" agent which will be bundled with exporters further diminishing "openness". Push for an agent after your SDK's reduces the per-language burden we see above with OpenTracing but inserts OpenTelemetry as the gatekeeper of tracing implementations. At the very worst the "maintainers / governing body / process" could be captured by commerical interests to impede competition. At best it slowly grows to be a more arduous and painful process as layers of "exporters" increase bloat, complexity and reduce maintainability until it's no longer worth the efforts.

MyProposal - Stop writing code. Collectively define a structural specification that will slowly produce the initial version for a wire format for carrying trace events. Separate trace backends from the clients at the protocol level. Now you can resume writing code.

Trace vendors implement 1 server for every language

Open telemetry can provide reference implementations of clients that ship to these trace servers, which is what this repository would become. But this leaves room for those who want to implement their own clients because the chosen balance of ergonomics, performance, was not right for them. As it stands now you have a monumental effort beyond writing a client for a well defined wire-format. If the maintainers of this project disagree with my stance on plugins, great! I don't care. I can write my own client. As it stands I won't have that option without also implementing the glue for the tracer which defeats the purpose of investing resources in these convergence efforts. I really would like to urge the project owners to pump the brakes a little and consider the amount of effort that could spared by taking a different approach. Or at least provide some strong technical reasons to march on with this paradigm that expand beyond "this is what we had".

Testing Frameworks

Unit tests are not just for testing results, they also lock in your public API. Using a large unit tests abstraction that accepts interfaces everywhere means that type changes are possible without breaking unit tests. It also imposes additional cognitive load for potential contributors who are unfamiliar with the chosen libraries. In short they only provide a tool to circumvent engineering rigor, relieving authors of both the responsibility to think about struct composition and the benefits of "testing" the said structures to begin with. I strongly believe the projects quality would suffer substantially by using them.

Plugins

I am not sure the design considerations behind the scenes that led to requiring a plugin for tracing via "stdout" but I very strongly object. Plugins should be entirely out of the question as an API requirement.

Plugins may not be unloaded
Plugins provide a potential source for API compatibility mismatches, as plugins:
- can't be locked / versioned by a go.mod file
- can't have large interfaces or impose compile time errors
- insert any argument against ABI as an API here
Plugins can negatively impact the API from evolving as suddenly ABI compatibility is a consideration
Plugins add additional deployment complexity and opportunity for runtime failures
Plugins do not provide nor enforce any code signing by default, adding potential attack vectors

I could continue on, but I feel this is a decent start for a technical debate. The most important argument against plugins is that they may still be provided for proprietary tracing solutions with careful design, you define the Tracer interface and write a Tracer which is backed by a plugin. Knowing this, I think it's easy to prove there is no real reason to circumvent an API to provide an ABI.

Dependencies

First, I feel this project should depend on only the standard library, maybe a couple packages from golang.org/x if well justified. I strongly believe this should be a non-negotiable requirement because it will force a healthier design than what the current tracing libraries of today have. As it stands now this library depends on a massive (257) amount of packages go.sum and this project is just getting started. I understand that part of this is because there is a fundamental problem with the design of OpenTelemetry as a whole providing implementations rather than specifications, but there is still plenty of opportunity to improve here.

Batched Span Processor

Add Batched Span Processor that collects all the spans and asynchronously process them.
Allow batch size and batch delay configuration.

Repo and Directory structure

How should the repo/directory should be?

Single Repo
Directory for each module.
Separate repo for API, SDK., etc