open-telemetry / opentelemetry-collector-contrib
Contrib repository for the OpenTelemetry Collector
Home Page: https://opentelemetry.io
License: Apache License 2.0
Relates to #175
Kubernetes exposes three container statuses: `Running`, `Waiting`, and `Terminated`. The `k8s_cluster` receiver currently collects this information as part of resource metadata, but it could also be captured as metric time series. This issue will be used to finalize the approach used to gather this information.
For more details see: census-instrumentation/opencensus-service#454
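If captured as time series, two possible shapes come to mind (metric names and labels below are hypothetical, not a finalized convention):

```text
# (a) one 0/1 gauge per status:
k8s.container.status.running     {container="bar", pod="foo"}  1
k8s.container.status.waiting     {container="bar", pod="foo"}  0
k8s.container.status.terminated  {container="bar", pod="foo"}  0

# (b) a single gauge with the status encoded as a label:
k8s.container.status  {container="bar", pod="foo", status="running"}  1
```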
We need to make it clear how to build a custom instance of the service that supports exactly the desired set of receivers, processors, exporters, and extensions.
This issue is just to make sure we do not ignore the InstrumentationLibrary info when converting from OTLP to OC. This needs to be fixed in `[trace|metrics]/otlp_to_oc.go`.
We should be able to build multiple binaries of the OpenTelemetry Contrib Collector from this repository. The binaries would bundle different sets of components, use different libraries or protocols, etc.
In the future, we could even have a form on the OpenTelemetry website that would allow users to check which components they want and generate a custom build for them. For now we just need the ability to generate more than one build, i.e. add another `cmd/<buildName>` directory and the tooling to support it, as sketched below.
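A possible layout (the directory names below are illustrative):

```text
cmd/
  otelcontribcol/     # existing default build bundling all contrib components
  <buildName>/        # an additional build
    main.go           # wires only the components this build should bundle
```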
Hi everyone,
We're currently trying to use the K8s processor in our agent pods, but seem to have hit a hurdle. When checking the pod logs we see the following:
pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:98: Failed to list *v1.Pod: Get "https://172.20.0.1:443/api/v1/pods?limit=500&resourceVersion=0": Forbidden
Adding the following ServiceAccount, ClusterRole, and ClusterRoleBinding to the pod does not seem to have resolved the issue:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opentracing-agent
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opentracing-agent
rules:
  - apiGroups: [""]
    resources:
      - configmaps
      - daemonsets
      - deployments
      - endpoints
      - events
      - namespaces
      - nodes
      - pods
      - replicasets
      - services
      - statefulsets
    verbs: ["get", "list", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: opentracing-agent
subjects:
  - kind: ServiceAccount
    name: opentracing-agent
    namespace: tracing
roleRef:
  kind: ClusterRole
  name: opentracing-agent
  apiGroup: "rbac.authorization.k8s.io"
Is there an example of the required RBAC configuration? Below is some additional information if it's of use. I've had a look around but couldn't find one.
Many thanks
kubectl auth can-i get pods --as=system:serviceaccount:tracing:opentracing-agent
yes
processors:
  k8s_tagger:
    filter:
      node_from_env_var: NODE_NAME
    passthrough: false
    extract:
      metadata:
        - podName
When the OTel Collector is configured to receive Zipkin spans and export them to both LightStep and Zipkin, it appears that the parentId is not propagated by the LightStep exporter.
In LightStep, the spans are then not linked, so all traces are unique to the service operation and not fully assembled as they are with the Zipkin exporter. The expected behavior is that the traces are fully assembled in LightStep as well.
How to reproduce it:
There is some redundancy between the Makefile that builds the contrib executable and the Makefile.Common that is used to build individual components. We should have a single source for the "common" targets to keep the convenience of quick builds and custom targets for components while avoiding divergence of targets between Makefile and Makefile.Common. /cc @owais
I like the idea of tail-based sampling. I would like to enhance it in two ways:
Consider a case where we are tail-sampling based on span->http.status_code=500.
==================
WARNING: DATA RACE
Write at 0x00000165f518 by goroutine 98:
github.com/honeycombio/libhoney-go.Init()
/home/circleci/go/pkg/mod/github.com/honeycombio/[email protected]/libhoney.go:222 +0x202
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.newHoneycombTraceExporter()
/home/circleci/project/exporter/honeycombexporter/honeycomb.go:94 +0x1c6
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.(*Factory).CreateTraceExporter()
/home/circleci/project/exporter/honeycombexporter/factory.go:55 +0x73
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.testTraceExporter()
/home/circleci/project/exporter/honeycombexporter/honeycomb_test.go:82 +0x3a5
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.TestEmptyNode()
/home/circleci/project/exporter/honeycombexporter/honeycomb_test.go:278 +0x3fd
testing.tRunner()
/usr/local/go/src/testing/testing.go:991 +0x1eb
Previous read at 0x00000165f518 by goroutine 52:
github.com/honeycombio/libhoney-go.TxResponses()
/home/circleci/go/pkg/mod/github.com/honeycombio/[email protected]/libhoney.go:511 +0x6b
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.(*honeycombExporter).RunErrorLogger()
/home/circleci/project/exporter/honeycombexporter/honeycomb.go:287 +0x5b
Goroutine 98 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:1042 +0x660
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1284 +0xa6
testing.tRunner()
/usr/local/go/src/testing/testing.go:991 +0x1eb
testing.runTests()
/usr/local/go/src/testing/testing.go:1282 +0x527
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1199 +0x2ff
main.main()
_testmain.go:116 +0x337
Goroutine 52 (finished) created at:
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.(*honeycombExporter).pushTraceData()
/home/circleci/project/exporter/honeycombexporter/honeycomb.go:121 +0xd5
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.(*honeycombExporter).pushTraceData-fm()
/home/circleci/project/exporter/honeycombexporter/honeycomb.go:114 +0xe4
go.opentelemetry.io/collector/exporter/exporterhelper.traceDataPusherOld.withObservability.func1()
/home/circleci/go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/tracehelper.go:96 +0x129
go.opentelemetry.io/collector/exporter/exporterhelper.(*traceExporterOld).ConsumeTraceData()
/home/circleci/go/pkg/mod/go.opentelemetry.io/[email protected]/exporter/exporterhelper/tracehelper.go:48 +0x14a
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.testTraceExporter()
/home/circleci/project/exporter/honeycombexporter/honeycomb_test.go:86 +0x4b2
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/honeycombexporter.TestExporter()
/home/circleci/project/exporter/honeycombexporter/honeycomb_test.go:182 +0x1ae9
testing.tRunner()
/usr/local/go/src/testing/testing.go:991 +0x1eb
==================
--- FAIL: TestEmptyNode (0.02s)
testing.go:906: race detected during execution of test
FAIL
Does the OTel Collector provide a possibility to use `duration` as a policy?
I am trying to achieve that with the following collector configuration:
...
processors:
  batch:
  queued_retry:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: sample-long-running-requests
        type: numeric_attribute
        numeric_attribute: {key: duration, min_value: 1000, max_value: 10000}
...
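For comparison, a native duration-based policy might look like the following if the processor supported one (hypothetical syntax; `latency` is not a shipped policy type at the time of this issue):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: sample-long-running-requests
        type: latency                    # hypothetical policy type
        latency: {threshold_ms: 1000}    # keep traces that take longer than 1s
```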
Splunk HTTP Event Collector can ingest traces and events as event data, and data points as metrics data.
Add support for sending to Splunk HEC, along with configuration options; a possible shape is sketched below.
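A sketch of what the exporter configuration could look like (option names are assumptions, to be finalized with the implementation):

```yaml
exporters:
  splunk_hec:
    # Assumed options: HEC token, ingestion endpoint, and event metadata.
    token: "00000000-0000-0000-0000-000000000000"
    endpoint: "https://splunk:8088/services/collector"
    source: "otel"
    sourcetype: "otel"
    index: "main"
```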
Kubernetes has a notion of "container spec name". This is different from the actual name of the running container: the latter is a construct of the container engine, whereas the former is specific to Kubernetes.
The container spec name is exposed via the Pod spec (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core) and is not the same as the name of the running container. Currently this information is captured in a label called `container_spec_name`.
Handle these differences appropriately in the `k8s_cluster` receiver once details are finalized.
Relates to: #175
This issue relates to #175. The `k8scluster` receiver is currently capable of syncing metrics representing the state of a Kubernetes cluster. Apart from these metrics, services like SignalFx are also interested in cluster state information in the form of metadata (properties and tags in SignalFx's case). Add the ability to sync this information.
This issue was automatically created by a failed stability test: ${{ CIRCLE_BUILD_URL }}.
CI failure for PR #88
--- FAIL: TestTrace10kSPS (22.32s)
--- FAIL: TestTrace10kSPS/OpenCensus (6.00s)
test_case.go:354: CPU consumption is 40.3%, max expected is 35%
--- PASS: TestTrace10kSPS/SAPM (16.32s)
FAIL
exit status 1
FAIL github.com/open-telemetry/opentelemetry-collector-contrib/testbed/tests 54.980s
# Test Results
Started: Fri, 03 Jan 2020 22:23:23 +0000
Test |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items|
----------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:|
Metric10kDPS/SignalFx |PASS | 15s| 20.0| 20.7| 36| 45| 150000| 150000|
Metric10kDPS/OpenCensus |PASS | 18s| 7.5| 8.0| 42| 52| 149900| 149900|
Trace10kSPS/OpenCensus |FAIL | 6s| 32.5| 40.3| 39| 59| 59660| 57400|CPU consumption is 40.3%, max expected is 35%
Trace10kSPS/SAPM |PASS | 16s| 43.7| 53.0| 69| 88| 149590| 149590|
After #65, tests are run for each module in its own sub-directory, and the raw coverage report for each module is separate. These reports can be merged with the following bash script.
#!/bin/bash
# Usage: merge_coverage.sh <output-file>
# Gather every per-module coverage.txt, skipping the top-level one.
files=`find . -name 'coverage.txt' | egrep -v '^\./coverage.txt' | tr '\n' ' '`
# Emit a single "mode:" header, then concatenate the per-module reports,
# dropping their own headers and de-duplicating coverage blocks.
echo "mode: set" > $1 && cat $files | grep -v mode: | sort -r | awk '{if($1 != last) {print $0;last=$1}}' >> $1
The merged report can then be converted to HTML format using:
go tool cover -html=coverage.txt -o coverage.html
The issue with merging is that it doesn't work for a module that is new. The following error occurs for the new module:
cover: no matching versions for query "latest"
Not sure what the root cause is, but it is probably related to the go tool attempting to retrieve the latest version of the new module from GitHub, which doesn't exist yet (because it is new - a chicken-and-egg problem).
The OpenTelemetry Collector repo name has changed to github.com/open-telemetry/opentelemetry-collector. Import paths in this repo need to be changed correspondingly.
Currently, there is a Dockerfile that can create an image from a binary built on the host machine. It would be nice to have a Dockerfile that can actually build the binary too, for example this one: https://github.com/anuraaga/opentelemetry-collector-contrib/blob/custom/Dockerfile
This way, it's easy to locally build a Docker image for master without worrying about tooling, using something like `docker build -t opentelemetry-collector:test https://github.com/open-telemetry/opentelemetry-collector-contrib.git`, or similarly for local changes. This works especially well with docker-compose, which will always start containers after building from source without any other steps.
If this seems like a reasonable idea, I can send a PR with the Dockerfile I linked.
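A minimal multi-stage sketch of the idea (the Go version, make target, and output path are assumptions; the Dockerfile linked above is the authoritative version):

```dockerfile
# Build stage: compile the collector inside the image.
FROM golang:1.14 AS build
WORKDIR /src
COPY . .
RUN make otelcontribcol   # assumed build target producing bin/otelcontribcol_linux_amd64

# Runtime stage: copy only the binary into a small base image.
FROM alpine:3.11
COPY --from=build /src/bin/otelcontribcol_linux_amd64 /otelcontribcol
ENTRYPOINT ["/otelcontribcol"]
```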
We intend to build a distribution of the contrib repo with all components present. In order to ensure the quality of this build we should define a minimum set of requirements (tests, documentation, etc.) for all components available in the repo.
Zipkin and Jaeger clients support sending trace data without populating `Node.ServiceInfo` on `TraceData` instances. This will currently cause the collector to crash if the Honeycomb exporter is configured as part of a pipeline exporting those traces. We should check that the field is not nil before trying to set it as an attribute.
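A minimal sketch of the guard, assuming the collector's `consumerdata.TraceData` type of that era (the exact field access in the exporter may differ):

```go
// serviceName safely extracts the service name, tolerating clients
// that omit Node or Node.ServiceInfo entirely.
func serviceName(td consumerdata.TraceData) string {
	if td.Node != nil && td.Node.ServiceInfo != nil {
		return td.Node.ServiceInfo.Name
	}
	return ""
}
```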
Currently, the string-attribute-filter in the OpenCensus collector allows spans to be exported only if they match specified strings, i.e. a whitelist. It is, however, currently not possible to create a blacklist.
One specific use-case for this is to exclude traces for `grpc.health.v1.HealthCheck`, which is a standard way of adding liveness and readiness endpoints to gRPC servers, but which adds lots of noise to trace lists. (The current SpanContext for OpenCensus does not include the span name, so exclusion at that point is not possible either.)
From a flexibility point of view, it would be ideal if there was some sort of simple syntax for including or excluding partial matches too, whether through regexes, globs, etc., though latency might be a consideration. At a minimum, though, the ability to blacklist certain strings would be great.
Per @songy23, this is blocked by open-telemetry/opentelemetry-collector#221.
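A sketch of what a deny-list could look like (entirely hypothetical syntax for the requested feature, including the attribute key):

```yaml
processors:
  string_attribute_filter:
    # exclude: drop spans whose attribute matches, inverting the current whitelist behavior.
    exclude:
      - key: grpc.method
        values: ["/grpc.health.v1.Health/Check"]
        match_type: strict   # "regexp" could cover the partial-match use case
```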
`crypto/sha1` is imported in processor/processorhelper/hasher.go and some tests, and we suppress the Gosec warning about it being a weak cryptographic primitive. We should document why SHA-1 is appropriate (e.g. it's part of an external specification), or switch to something else.
[/Users/lazy/github/opentelemetry-collector/processor/attributesprocessor/attribute_hasher.go:18] - G505 (CWE-327): Blacklisted import crypto/sha1: weak cryptographic primitive (Confidence: HIGH, Severity: MEDIUM)
> "crypto/sha1"
[/Users/lazy/github/opentelemetry-collector/processor/attributesprocessor/attribute_hasher.go:61] - G401 (CWE-326): Use of weak cryptographic primitive (Confidence: HIGH, Severity: MEDIUM)
> sha1.New()
Insecure connections for the Stackdriver exporter are currently broken because the underlying library explicitly creates a secure connection.
Attempting to build this repo with Go 1.13 currently fails. Here is the output of `make test`:
$ make test
go: directory receiver/zipkinscribereceiver is outside main module
go test -race -timeout 30s
build .: cannot find module for path .
make: *** [test] Error 1
It appears the behavior of `go list` in Go 1.13 is different: it prints the warning "outside main module", which then causes the problem when the Makefile attempts to parse that output.
Tested on macOS, go1.13 darwin/amd64
Likely caused by golang/go@b9edee3
A possible solution is to exclude nested modules from the ALL_SRC variable in the Makefile, as sketched below. However, this will also remove them from the `make test` target. To bring back testing of components in nested modules we can call `make test` in each component subdirectory.
This will probably also require merging the test coverage results into one report.
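A sketch of the proposed approach (variable and target names are illustrative, not the repo's actual Makefile):

```make
# Directories containing nested go.mod files, excluded from ALL_SRC.
NESTED_MODS := $(shell find . -mindepth 2 -name go.mod -exec dirname {} \;)

# Run the common test target inside every nested module.
.PHONY: test-nested
test-nested:
	@set -e; for dir in $(NESTED_MODS); do \
		(cd $$dir && $(MAKE) test); \
	done
```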
Currently, the processor supports the following tags:
It would be great to include more tags, such as:
Additionally, it would be good to have the ability to include all labels and annotations.
Please give service-approvers Write access and service-maintainers Admin access to this repository (this will mirror the permissions we already have for the opentelemetry-service repo).
This is the last task mentioned in open-telemetry/opentelemetry-collector#352, but right now we don't have the rights to do that (tracked in #2).
Hello,
I am aware of a k8s operator to deploy the collector, but I would like to propose providing a Helm chart as well. If so, we could have a repo such as open-telemetry/helm-charts.
The import statements are not consistent across the repo. This is especially important given that others may use the implementations here as a starting point for their own.
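One common convention we could adopt (a suggestion, not an established repo rule): standard library first, then third-party modules, then local packages, each group separated by a blank line.

```go
import (
	// standard library
	"context"
	"fmt"

	// third-party modules
	"go.uber.org/zap"

	// this repository (illustrative package path)
	"github.com/open-telemetry/opentelemetry-collector-contrib/internal/common"
)
```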
Add a README file to the k8sprocessor containing user documentation. In addition to documenting all the config options and what they do, we should also mention recommended deployment scenarios and configurations.
azuremonitorexporter does not populate the service name into the cloudRole and cloudRoleInstance properties of the envelope, so the application map shows an incorrect service name.
The exporter code is supposed to populate the service name, but the App Insights telemetry doesn't include the cloudRole field.
Application map
The test code (see below) uses the Jaeger exporter and configures the "trace-demo" service name. The expected service name at the bottom of the circle is "trace-demo", but it shows "dapr-dev-insights".
Dependency Telemetry
Field |Value
------|-----
timestamp [UTC] | 2020-04-24T13:38:22.411464Z
id | 71c8d87ea48d50b5
name | bar
success | True
resultCode | 0
duration | 0.004
performanceBucket | <250ms
itemType | request
customDimensions | {"float":"312.23","span.kind":"server","exporter":"jaeger"}
operation_Name | bar
operation_Id | 07a742e8cd5e7afa33d8bb34c2c59f9b
operation_ParentId | dfc8071f1b93436b
client_Type | PC
client_Model | Other
client_OS | Other
client_IP | 0.0.0.0
client_City | xxx
client_StateOrProvince | xx
client_CountryOrRegion | United States
client_Browser | Go-http-client 1.1
appId | b61b23aa-7a2e-4182-9431-8689af7bd8d5
appName | dapr-dev-insight
iKey | b723ef3d-a015-4e6e-84bf-e898d528f677
itemId | ddd5f422-8630-11ea-bc9c-936b910cbc1c
itemCount | 1
OpenTelemetry configuration
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  labels:
    app: opentelemetry
    component: otel-collector-conf
data:
  otel-collector-config: |
    receivers:
      jaeger:
        protocols:
          thrift_http:
            endpoint: "0.0.0.0:14268"
    processors:
      queued_retry:
      batch:
    extensions:
      health_check:
      pprof:
        endpoint: :1888
      zpages:
        endpoint: :55679
    exporters:
      azuremonitor:
      azuremonitor/2:
        endpoint: "https://dc.services.visualstudio.com/v2/track"
        instrumentation_key: "ikey"
        # maxbatchsize is the maximum number of items that can be queued before calling the configured endpoint
        maxbatchsize: 100
        # maxbatchinterval is the maximum time to wait before calling the configured endpoint.
        maxbatchinterval: 10s
    service:
      extensions: [pprof, zpages, health_check]
      pipelines:
        traces:
          receivers: [jaeger]
          exporters: [azuremonitor/2]
          processors: [batch, queued_retry]
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  labels:
    app: opencensus
    component: otel-collector
spec:
  ports:
    - name: otel # Jaeger thrift_http receiver endpoint.
      port: 14268
      protocol: TCP
      targetPort: 14268
  selector:
    component: otel-collector
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  replicas: 1 # scale out based on your usage
  selector:
    matchLabels:
      app: opentelemetry
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.3.0
          command:
            - "/otelcontribcol"
            - "--config=/conf/otel-collector-config.yaml"
          resources:
            limits:
              cpu: 1
              memory: 2Gi
            requests:
              cpu: 200m
              memory: 400Mi
          ports:
            - containerPort: 14268 # Jaeger thrift_http receiver endpoint.
          volumeMounts:
            - name: otel-collector-config-vol
              mountPath: /conf
            #- name: otel-collector-secrets
            #  mountPath: /secrets
          livenessProbe:
            httpGet:
              path: /
              port: 13133
          readinessProbe:
            httpGet:
              path: /
              port: 13133
      volumes:
        - configMap:
            name: otel-collector-conf
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
          name: otel-collector-config-vol
        # - secret:
        #     name: otel-collector-secrets
        #     items:
        #       - key: cert.pem
        #         path: cert.pem
        #       - key: key.pem
        #         path: key.pem
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel/api/core"
	"go.opentelemetry.io/otel/api/global"
	"go.opentelemetry.io/otel/api/key"
	"go.opentelemetry.io/otel/api/trace"
	"go.opentelemetry.io/otel/exporters/trace/jaeger"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracer creates a new trace provider instance and registers it as global trace provider.
func initTracer() func() {
	// Create and install Jaeger export pipeline
	_, flush, err := jaeger.NewExportPipeline(
		jaeger.WithCollectorEndpoint("http://localhost:14268/api/traces"),
		jaeger.WithProcess(jaeger.Process{
			ServiceName: "trace-demo",
			Tags: []core.KeyValue{
				key.String("exporter", "jaeger"),
				key.Float64("float", 312.23),
			},
		}),
		jaeger.RegisterAsGlobal(),
		jaeger.WithSDK(&sdktrace.Config{DefaultSampler: sdktrace.AlwaysSample()}),
	)
	if err != nil {
		log.Fatal(err)
	}

	return func() {
		flush()
	}
}

func main() {
	fn := initTracer()
	defer fn()

	ctx := context.Background()

	tr := global.Tracer("component-main")
	ctx, span := tr.Start(ctx, "foo", trace.WithSpanKind(trace.SpanKindClient))
	bar(ctx)
	span.End()
}

func bar(ctx context.Context) {
	tr := global.Tracer("component-bar")
	_, span := tr.Start(ctx, "bar", trace.WithSpanKind(trace.SpanKindServer))
	defer span.End()

	// Do bar...
}
Add a receiver to generate host metrics about the machine the collector is running on. Initially this will be for Windows only, but it should be extensible so that the same receiver can be used to collect Linux metrics when implemented.
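A sketch of what the configuration could eventually look like (scraper names are assumptions until the receiver is designed):

```yaml
receivers:
  hostmetrics:
    collection_interval: 1m
    scrapers:        # hypothetical per-category scrapers
      cpu:
      memory:
      disk:
      network:
```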
Among the changes in the internal data format is the removal of the SourceFormat attribute, which was used by the processor logic in certain cases.
The passthrough mode seems to no longer work when spans are sent using Jaeger gRPC (and possibly some of the other formats as well).
As discussed here: open-telemetry/opentelemetry-collector#500
Jaeger Kafka exporter support should be added to the collector through the contrib repo.
This issue was automatically created by a failed stability test: https://circleci.com/gh/owais/ci-tests/103.
In the tail sampling processor, the following lines of code delete the trace from memory without honouring the sampling decision from `OnDroppedSpans()` for the given policies.
We could either:
- Change the `OnDroppedSpans()` signature to never return a sampling decision (force the drop of an old trace). Currently the implementation nullifies the `always_sample` policy.
- Modify the `dropTrace()` / `ConsumeTraceData()` method to account for custom use cases where the user might want to stop trace ingestion if the queue is full, instead of dropping old traces which are already batched for processing.
Export metrics, traces, and logs data to Kafka.
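A sketch of a possible exporter configuration (option names are assumptions pending the implementation):

```yaml
exporters:
  kafka:
    brokers: ["kafka:9092"]   # Kafka bootstrap brokers
    topic: otlp_spans         # destination topic per signal
    encoding: otlp_proto      # wire format of the produced messages
```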
Investigate using integers instead of strings for Actions during attribute-logic application to a span. The investigation would compare string value comparison against integer value comparison to determine whether the current implementation is a bottleneck; a micro-benchmark sketch follows below.
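A self-contained micro-benchmark sketch of the comparison (package and constant names are hypothetical stand-ins for the real Action representations):

```go
package attraction

import "testing"

// Hypothetical integer and string encodings of the same Action.
const (
	actionInsertInt = iota
	actionUpdateInt
)

const actionInsertStr = "insert"

// BenchmarkIntCompare measures comparing Actions as integers.
func BenchmarkIntCompare(b *testing.B) {
	a, n := actionUpdateInt, 0
	for i := 0; i < b.N; i++ {
		if a == actionInsertInt {
			n++
		}
	}
	_ = n
}

// BenchmarkStringCompare measures comparing Actions as strings.
func BenchmarkStringCompare(b *testing.B) {
	a, n := "update", 0
	for i := 0; i < b.N; i++ {
		if a == actionInsertStr {
			n++
		}
	}
	_ = n
}
```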
Relates to #175
The `k8s_receiver` currently associates Pods with their workloads using `k8s.workload.name` and `k8s.workload.kind`. For example, if a Pod is spawned by a CronJob called `foo`, the created Pod will have labels like `k8s.workload.kind:cronjob` and `k8s.workload.name:foo`. This applies to workloads of other types as well.
The above mechanism of association should be revisited after we finalize the new conventions for naming Kubernetes metadata.
It's currently unclear how to appropriately handle start errors for receivers started by receiver_creator. This includes errors returned by `Start()` as well as errors reported through `ReportFatalError`. The existing methods are built around the assumption that start errors always occur at process start or shortly thereafter.
See discussion in #173 (comment)
Please keep the files in sync.