Comments (10)
Currently working on this. I'm hoping to have a first draft ready for tomorrow, and I'm working on this branch: https://github.com/Jeffail/benthos/tree/feature/distributed-tracing
Progress so far:
- Using the opentracing API
- Message parts are able to have a context attached
- That context can be used as a reference to a span
- Spans are created at the input level, and children of those root spans can be constructed by any Benthos component
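To make the progress list above concrete, here is a minimal sketch of the idea using only the Go standard library. `Span`, `Part`, `NewRootPart`, `SpanFrom` and `ChildSpan` are hypothetical names invented for illustration; they are not the Benthos or opentracing APIs, and the real branch may be structured quite differently.

```go
package main

import (
	"context"
	"fmt"
)

// Span is a hypothetical, minimal stand-in for a tracing span.
type Span struct {
	Name   string
	Parent *Span
}

// spanKey is an unexported context key, so only this package can reach the span.
type spanKey struct{}

// Part is a hypothetical message part. The context is unexported on purpose,
// so it is not visible to callers of the public API.
type Part struct {
	Data []byte
	ctx  context.Context
}

// NewRootPart is roughly what an input might do: create a part and attach a
// context that references a fresh root span.
func NewRootPart(data []byte, inputName string) *Part {
	root := &Span{Name: inputName}
	return &Part{
		Data: data,
		ctx:  context.WithValue(context.Background(), spanKey{}, root),
	}
}

// SpanFrom returns the span referenced by the part's context, or nil if none.
func SpanFrom(p *Part) *Span {
	if p == nil || p.ctx == nil {
		return nil
	}
	s, _ := p.ctx.Value(spanKey{}).(*Span)
	return s
}

// ChildSpan lets any component create a child of the part's current span.
func ChildSpan(p *Part, name string) *Span {
	return &Span{Name: name, Parent: SpanFrom(p)}
}

func main() {
	part := NewRootPart([]byte("hello"), "input")
	child := ChildSpan(part, "processor")
	fmt.Println(child.Name, "is a child of", child.Parent.Name)
}
```

Keeping the context field unexported is one way to stop it leaking into the public API, which matches the concern about it turning into a cancellation mechanism.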
TODO:
- Verify that multiple calls to `Finish` on a span don't blow it up (makes my code much simpler)
- Create a new component `tracer` similar to `metrics`, with options for different aggregators
- Create a new processor `trace` similar to `metric`
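On the first TODO item: if a tracer turned out not to tolerate repeated `Finish` calls, one defensive option would be a small wrapper that forwards only the first call. This is a sketch of that general technique with `sync.Once`, not something taken from the branch; `finisher`, `onceSpan` and `countingSpan` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// finisher is the only method this sketch needs from a span.
type finisher interface {
	Finish()
}

// onceSpan wraps a span so that only the first Finish call reaches it.
type onceSpan struct {
	inner finisher
	once  sync.Once
}

// Finish is safe to call any number of times, from any goroutine.
func (s *onceSpan) Finish() {
	s.once.Do(s.inner.Finish)
}

// countingSpan is a test double that records how often Finish was invoked.
type countingSpan struct{ finished int }

func (c *countingSpan) Finish() { c.finished++ }

func main() {
	raw := &countingSpan{}
	span := &onceSpan{inner: raw}
	span.Finish()
	span.Finish()
	fmt.Println("underlying Finish calls:", raw.finished) // prints 1
}
```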
I'm implementing this in a way that doesn't change any existing framework APIs, so we won't need to wait for a V2 release to have this out. I'm also being very careful not to expose the context associated with message parts in the API as I do not want this to bleed over into some sort of cancellation mechanism by accident.
from benthos.
Branch is looking good so far. I've got a new `tracer` component similar to `metrics`, which currently only supports `jaeger` and `none`. I need to add traces to more processors but this is looking like it'll be ready to merge into master soon. The best part is that none of the changes are going to break the config spec or internal API so this won't require a major release bump.
from benthos.
Implemented: ce0609c
Released: https://github.com/Jeffail/benthos/releases/tag/v1.6.0
from benthos.
I might take a stab at it and we can discuss improvement once an initial PR is sent.
from benthos.
Hey @ledor473, thanks for the quick feedback. I'll add those fields to the config spec. I'm not too worried about having all fields exposed for now since it's easy to add more as/when they get requested.
I wouldn't normally use environment variables directly since they can be specified within a config: https://github.com/Jeffail/benthos/blob/master/docs/config_interpolation.md#environment-variables. However, the AWS components have already set a precedent of allowing direct env var configs, so I'm not opposed to adding it for Jaeger as well.
from benthos.
Hey @ledor473,
I'm not particularly well versed as to how distributed tracing is used so I'd need to do some reading up on it, but from my basic knowledge it does seem very fitting for a project like Benthos so I'd be happy to explore it as an option.
I'm going to mark this one as help needed as it would be good to share notes as to how much work would be involved, how we would test it, etc, as it sounds like a big task.
Thanks for the suggestion, I have a feeling this could be extremely valuable and I wouldn't have considered it otherwise.
from benthos.
I’ve done a bit of work with tracing. The biggest complicating factor here is that there isn’t a standard way to inject or extract a span. It’s specifically omitted from the opentracing spec.
There are zipkin http standards (x-b3-* headers) but that’s one tracer and one protocol.
Internal tracing could be nice for profiling. “zipkin-http” and “jaeger-http” extract/injectors could be provided, but anything else should probably be written by the end user.
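As a rough illustration of what a "zipkin-http" extractor/injector pair might look like, here is a sketch that copies the `x-b3-*` headers to and from a plain struct using only `net/http`. `B3Context`, `ExtractB3`, `InjectB3` and `RoundTripB3` are hypothetical names, and the sketch does no validation of the ID formats.

```go
package main

import (
	"fmt"
	"net/http"
)

// B3Context is a hypothetical holder for the Zipkin B3 identifiers.
type B3Context struct {
	TraceID      string
	SpanID       string
	ParentSpanID string
	Sampled      string
}

// ExtractB3 reads the x-b3-* headers. ok is false when the trace ID or the
// span ID is absent, since both are needed to continue a trace.
func ExtractB3(h http.Header) (c B3Context, ok bool) {
	c = B3Context{
		TraceID:      h.Get("X-B3-TraceId"),
		SpanID:       h.Get("X-B3-SpanId"),
		ParentSpanID: h.Get("X-B3-ParentSpanId"),
		Sampled:      h.Get("X-B3-Sampled"),
	}
	return c, c.TraceID != "" && c.SpanID != ""
}

// InjectB3 writes the context onto outgoing headers, skipping empty optional fields.
func InjectB3(c B3Context, h http.Header) {
	h.Set("X-B3-TraceId", c.TraceID)
	h.Set("X-B3-SpanId", c.SpanID)
	if c.ParentSpanID != "" {
		h.Set("X-B3-ParentSpanId", c.ParentSpanID)
	}
	if c.Sampled != "" {
		h.Set("X-B3-Sampled", c.Sampled)
	}
}

// RoundTripB3 injects into fresh headers and extracts again, which is what
// happens conceptually between an upstream output and a downstream input.
func RoundTripB3(c B3Context) (B3Context, bool) {
	h := http.Header{}
	InjectB3(c, h)
	return ExtractB3(h)
}

func main() {
	got, ok := RoundTripB3(B3Context{TraceID: "463ac35c9f6413ad", SpanID: "a2fb4a1d1a96d312", Sampled: "1"})
	fmt.Println(ok, got.TraceID, got.SpanID)
}
```

The same shape would need a different carrier for every non-HTTP protocol, which is the complication described above.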
from benthos.
Planning to release the first phase of this in v1.6.0 later today, which is:
- New `tracer` component in the root of the config spec, allowing you to choose a tracer target. This feature is considered stable in that I do not intend to change it without a major version release.
- Internal API for working with spans on messages. This is mostly hidden from the current stream APIs but exposed through helper functions. If I find a major flaw in these functions (even if they aren't strictly broken) then I might modify them without a major version release.
- The actual information exposed by Benthos components through opentracing is considered experimental. I've made a first pass at exposing useful information from all processors, but the information exposed as well as its formatting is subject to adjustment without a major version release. I've added a disclaimer to the documentation for `tracers` that explains this.
- Each message is given a root span at the input level of a Benthos pipeline, and that span is finished when the message is acknowledged at the output level. It is possible for an input component to extract a root span from a previous service. This is implemented already in the HTTP input types (using headers) and I intend to gradually add solutions for this to most input types where possible.
- All APIs for opentracing within Benthos assume a global tracer. I'm doing this to save having to propagate a tracer reference through all components. This would be a problem if we decided to do clever stuff like namespacing spans for pipelines running in streams mode, or outputting to multiple tracers. However, doing so would also require breaking changes to the API so this would need to come in at V2 anyway, which I'm open to if there's a good case for it.
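The global tracer trade-off in the last point can be sketched with the standard library alone. The function names below mirror opentracing-go's `SetGlobalTracer` and `GlobalTracer`, but the `Tracer` interface, `noopTracer`, `namedTracer` and `process` are hypothetical simplifications for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// Tracer is a hypothetical, minimal tracer interface for this sketch.
// StartSpan returns a label instead of a real span to keep things short.
type Tracer interface {
	StartSpan(operation string) string
}

// noopTracer is the default, comparable to a "none" tracer type.
type noopTracer struct{}

func (noopTracer) StartSpan(string) string { return "" }

// namedTracer tags every span with the tracer's name.
type namedTracer struct{ name string }

func (t namedTracer) StartSpan(op string) string { return t.name + ":" + op }

var (
	mu     sync.RWMutex
	global Tracer = noopTracer{}
)

// SetGlobalTracer swaps the process-wide tracer.
func SetGlobalTracer(t Tracer) {
	mu.Lock()
	global = t
	mu.Unlock()
}

// GlobalTracer returns the process-wide tracer.
func GlobalTracer() Tracer {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

// process stands in for any component. Note that it takes no tracer argument,
// which is the convenience, and cannot pick a per-stream tracer, which is the cost.
func process(op string) string {
	return GlobalTracer().StartSpan(op)
}

func main() {
	fmt.Printf("%q\n", process("decode"))
	SetGlobalTracer(namedTracer{name: "jaeger"})
	fmt.Printf("%q\n", process("decode"))
}
```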
If anyone has any concerns or feedback that might change these plans please let me know soon, I'm more than happy to delay merging if I've not gotten this quite right.
from benthos.
@Jeffail I've quickly looked at the changes in the branch and what stands out the most for me is the configuration. Jaeger has quite a few settings that are useful and while you got the required one, I think it would be nice to access all of them.
A way to do it would be to use `configuration.FromEnv()` like here: https://github.com/jaegertracing/jaeger-client-go/blob/master/config/example_test.go#L110
Which would let people use any of these environment variables: https://github.com/jaegertracing/jaeger-client-go#environment-variables
The change would likely be only this line: https://github.com/Jeffail/benthos/compare/feature/distributed-tracing#diff-eb55ef0e4904b260bcd6fd7ba4318fe4R80
That being said, I'm not sure if Benthos uses environment variables elsewhere... so if you would prefer exposing more settings in JaegerConfiguration, I think the following would be valuable:
- JAEGER_SAMPLER_TYPE: Especially useful to use the remote sampler
- JAEGER_SAMPLER_MANAGER_HOST_PORT: Needed when using the remote sampler
- JAEGER_TAGS: Lets you configure tracer-level tags
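For a sense of what reading those three variables involves, here is a standalone sketch that does not use jaeger-client-go at all. `Settings` and `FromLookup` are hypothetical names, and the sketch assumes `JAEGER_TAGS` is a comma-separated list of `name=value` pairs; it does not handle any richer value syntax the real client may support.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Settings is a hypothetical subset of tracer options, not a real Benthos or
// Jaeger type.
type Settings struct {
	SamplerType            string
	SamplerManagerHostPort string
	Tags                   map[string]string
}

// FromLookup fills Settings using a lookup function with the same shape as
// os.LookupEnv, which keeps the parsing testable without touching the real
// environment.
func FromLookup(lookup func(string) (string, bool)) Settings {
	s := Settings{Tags: map[string]string{}}
	if v, ok := lookup("JAEGER_SAMPLER_TYPE"); ok {
		s.SamplerType = v
	}
	if v, ok := lookup("JAEGER_SAMPLER_MANAGER_HOST_PORT"); ok {
		s.SamplerManagerHostPort = v
	}
	if v, ok := lookup("JAEGER_TAGS"); ok {
		// Assumed format: "name=value,name=value". Malformed pairs are skipped.
		for _, pair := range strings.Split(v, ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) != 2 {
				continue
			}
			s.Tags[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return s
}

func main() {
	s := FromLookup(os.LookupEnv)
	fmt.Printf("%+v\n", s)
}
```

Explicit config fields and environment variables can coexist under this shape: the config spec would supply defaults and the lookup would only override fields whose variable is set.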
from benthos.
Added those extra fields. I've left it so that doing direct environment based configuration is possible in the future if it becomes a hot request.
Leaving it as a PR for now. I'm going to walk away and clear my head for a couple of hours before reading through it again.
I think the only snag I've encountered so far is that when using the `batch` processor the root span can become finished before the children spans when the message doesn't trigger the batch flush. Jaeger seems to cope fine with that but as it's undefined territory (and looks odd) I need a solution eventually. I haven't got one yet that I would consider "clean", so leaving it as it is for now.
from benthos.