Comments (3)
Pinging code owners:
- exporter/loadbalancing: @jpkrohling
- exporter/prometheusremotewrite: @Aneurysm9 @rapphil
See Adding Labels via Comments if you do not have permissions to add labels yourself.
from opentelemetry-collector-contrib.
and due to the requirement of ordered-writes, we saw a significant percentage of metrics dropped by AMP
I'm afraid this is more about Prometheus than load balancing itself, but why are you getting those out-of-orders only when receiving data from multiple clusters? Are you having the same metric stream (metric name + attributes) coming from different clusters? Are you dropping the service.instance.id along the way?
I'm afraid this is more about Prometheus than load balancing itself, but why are you getting those out-of-orders only when receiving data from multiple clusters?
The data coming from each cluster has unique stream labels attached, so one cluster cannot really impact another cluster in terms of out-of-order samples. I believe the out-of-order issue has to do with timing. For a single given stream, imagine this scenario:
Given a single stream: my_metric{instance='1.1.1.1:9200', pod_uid='<unique uuid>'}
- Initial connection through LB hits metrics-ingester-0:4317 and starts accepting datapoints.
- Datapoints sent: T1, T2, T3, T4
- Connection is interrupted because metrics-ingester-0 is going to be replaced in K8S.
- New connection through LB hits metrics-ingester-1:4317 and starts accepting datapoints...
- Datapoints sent: T5, T6, T7, T8
- metrics-ingester-0 begins its shutdown... has a bunch of data it needs to flush.
- metrics-ingester-1 performs a flush of the T5-T8 samples.
- metrics-ingester-0 tries to flush the T1-T4 samples, but hits an out-of-order error and the samples are dropped.
This is a contrived example ... but the point I am trying to demonstrate is that the timing of when the samples are flushed out to Prometheus matters. If two different OTEL collector pods end up with samples for the same stream, but they flush out of order, then you create the out-of-order error situation.
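To make the flush-order problem concrete, here is a minimal Python sketch of the scenario above. ToyBackend is a hypothetical stand-in, not Prometheus or AMP code, and the rule it applies (reject any sample that is not newer than the newest one already stored for that stream) is an assumption used only to illustrate ordered writes. The timestamps 1 to 8 stand for T1 to T8.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Hypothetical store that requires in-order writes per stream."""

    last_ts: dict = field(default_factory=dict)   # stream -> newest accepted timestamp
    accepted: list = field(default_factory=list)  # (stream, timestamp) pairs stored
    dropped: list = field(default_factory=list)   # (stream, timestamp) pairs rejected

    def write(self, stream: str, timestamps: list) -> None:
        for ts in timestamps:
            # Assumed rule: a sample must be newer than the newest stored sample.
            if ts <= self.last_ts.get(stream, float("-inf")):
                self.dropped.append((stream, ts))
            else:
                self.last_ts[stream] = ts
                self.accepted.append((stream, ts))


stream = "my_metric{instance='1.1.1.1:9200'}"
backend = ToyBackend()

ingester_0_buffer = [1, 2, 3, 4]  # T1-T4, received before the pod was replaced
ingester_1_buffer = [5, 6, 7, 8]  # T5-T8, received on the new connection

# metrics-ingester-1 flushes first; metrics-ingester-0 flushes late, during shutdown.
backend.write(stream, ingester_1_buffer)
backend.write(stream, ingester_0_buffer)

print("accepted:", [ts for _, ts in backend.accepted])  # accepted: [5, 6, 7, 8]
print("dropped:", [ts for _, ts in backend.dropped])    # dropped: [1, 2, 3, 4]
```

Swapping the two write calls makes every sample land, which is the whole point: the data is identical, only the flush order differs.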
We tried using IP-based session stickiness ... but that didn't really work at all, and is problematic for a lot of reasons. Session stickiness based on some cookie would be useful, if the OTEL client supported it.
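As a rough illustration of what stickiness is meant to achieve, the sketch below pins a stream to one backend by hashing the stream identity instead of relying on client IP or a cookie. This is not how the loadbalancing exporter or any OTEL client is implemented; pick_backend and the backend list are hypothetical names used only for this example.

```python
import hashlib


def pick_backend(stream_key: str, backends: list) -> str:
    """Deterministically map a stream identity to one backend (hypothetical helper)."""
    digest = hashlib.sha256(stream_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["metrics-ingester-0:4317", "metrics-ingester-1:4317"]
stream = "my_metric{instance='1.1.1.1:9200'}"

first = pick_backend(stream, backends)
second = pick_backend(stream, backends)

# The same stream always resolves to the same backend while the list is unchanged.
print(first == second)
```

Note the limitation: plain modulo hashing remaps streams whenever the backend list changes, so on its own it does not remove the pod-replacement window described above.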
Are you having the same metric stream (metric name + attributes) coming from different clusters?
No - definitely unique streams from different clusters
Are you dropping the service.instance.id along the way?
No, we are not.
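The two questions above hinge on stream identity, described in the thread as metric name + attributes. A small Python sketch, with made-up attribute values, shows why dropping service.instance.id would matter: two streams that were distinct collapse into one, and their samples would then interleave.

```python
def stream_key(name: str, attrs: dict) -> tuple:
    """Identity of a stream: metric name plus its sorted attribute pairs."""
    return (name, tuple(sorted(attrs.items())))


def drop(attrs: dict, key: str) -> dict:
    """Return a copy of attrs without the given attribute."""
    return {k: v for k, v in attrs.items() if k != key}


# Hypothetical attribute sets for two pods in the same cluster.
pod_1 = {"cluster": "a", "service.instance.id": "pod-1"}
pod_2 = {"cluster": "a", "service.instance.id": "pod-2"}

distinct = stream_key("my_metric", pod_1) != stream_key("my_metric", pod_2)

collided = stream_key("my_metric", drop(pod_1, "service.instance.id")) == stream_key(
    "my_metric", drop(pod_2, "service.instance.id")
)

print(distinct, collided)  # True True
```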