jaegertracing / jaeger-clickhouse

Jaeger ClickHouse storage plugin implementation

License: Apache License 2.0

Makefile 2.13% Go 97.70% Dockerfile 0.17%
grpc clickhouse clickhouse-database jaegertracing

jaeger-clickhouse's Introduction

Jaeger ClickHouse

This is a Jaeger gRPC storage plugin implementation for storing traces in ClickHouse.

Project status

This is a community-driven project, and we would love to hear your issues and feature requests. Pull requests are also greatly appreciated.

Why use ClickHouse for Jaeger?

ClickHouse is an analytical column-oriented database management system. It is designed to analyze streams of events, which closely resemble spans. It is open source, optimized for performance, and actively developed.

How it works

Jaeger spans are stored in two tables. The first contains the whole span, encoded in either JSON or Protobuf. The second stores key information about spans for searching; it is indexed by span duration and tags. Information about operations is stored in a materialized view. There are no indexes for archived spans. Storing data in replicated local tables with distributed global tables is natively supported. Spans are buffered: span buffers are flushed to the database either by a timer or after reaching the maximum batch size. The timer interval and batch size can be set in the config file.
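
The buffering described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the spanBuffer type, its method names, and the use of plain strings as spans are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// spanBuffer is a hypothetical batcher; names are illustrative only and are
// not taken from the plugin's source.
type spanBuffer struct {
	mu      sync.Mutex
	batch   []string
	maxSize int
	flush   func(batch []string)
}

func newSpanBuffer(maxSize int, flush func([]string)) *spanBuffer {
	return &spanBuffer{maxSize: maxSize, flush: flush}
}

// Add appends a span and flushes immediately once the batch is full.
func (b *spanBuffer) Add(span string) {
	b.mu.Lock()
	b.batch = append(b.batch, span)
	var full []string
	if len(b.batch) >= b.maxSize {
		full, b.batch = b.batch, nil
	}
	b.mu.Unlock()
	if full != nil {
		b.flush(full)
	}
}

// FlushPending writes whatever has accumulated so far, if anything.
func (b *spanBuffer) FlushPending() {
	b.mu.Lock()
	pending := b.batch
	b.batch = nil
	b.mu.Unlock()
	if len(pending) > 0 {
		b.flush(pending)
	}
}

// Run flushes on every timer tick until stop is closed, then flushes once more.
func (b *spanBuffer) Run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.FlushPending()
		case <-stop:
			b.FlushPending()
			return
		}
	}
}

func main() {
	buf := newSpanBuffer(3, func(batch []string) {
		fmt.Printf("flushing %d spans\n", len(batch))
	})
	stop := make(chan struct{})
	done := make(chan struct{})
	go func() { buf.Run(50*time.Millisecond, stop); close(done) }()
	for i := 0; i < 4; i++ {
		buf.Add(fmt.Sprintf("span-%d", i))
	}
	close(stop)
	<-done
}
```

In this sketch the flush callback may be invoked from both the caller of Add and the timer goroutine, so a real callback would need to be safe for concurrent use.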

Database schema generated by JetBrains DataGrip: [picture of tables]

How to start using Jaeger over ClickHouse

Documentation

Refer to the config.yaml for all supported configuration options.

Build & Run

Docker database example

docker run --rm -it -p9000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server:22
GOOS=linux make build run
make run-hotrod

Open localhost:16686 and localhost:8080.

Custom database

You need to specify connection options in config.yaml, then you can run

make build
SPAN_STORAGE_TYPE=grpc-plugin {Jaeger binary address} --query.ui-config=jaeger-ui.json --grpc-storage-plugin.binary=./{name of built binary} --grpc-storage-plugin.configuration-file=config.yaml --grpc-storage-plugin.log-level=debug

Credits

This project is originally based on this clickhouse plugin implementation.

See also jaegertracing/jaeger/issues/1438 for historical discussion regarding the implementation of a ClickHouse plugin.

jaeger-clickhouse's People

Contributors

albertlockett, arajkumar, bocharovf, chhetripradeep, clannadxr, einkrebs, faceair, nickbp, pavolloffay, vuuihc

jaeger-clickhouse's Issues

Capacity planning

@chhetripradeep from https://github.com/pavolloffay/jaeger-clickhouse/pull/34#issuecomment-886882402

We run with 3 replicas, and as we need to expand the cluster we add more shards. One thing to note is that ClickHouse doesn't have any built-in data balancing feature, i.e. once data is written to a node, it will stay there for its whole lifetime unless the operator moves it manually, so it's good to do capacity planning at the beginning of cluster provisioning.

Would you like to create a doc/guide for capacity planning?

Fix time zone issues

There are two issues when working with a ClickHouse database in a non-UTC time zone:

  • The plugin cannot connect to the database due to the following error:
    could not connect to database: "could not load time location: unknown time zone {any non-UTC time zone}"
  • Searching for traces works incorrectly: the search range "shifts" by the difference between UTC and the ClickHouse server time zone

Model alternative for jaeger_index table

In the jaeger_index tables, the tags are encoded as a nested array of keys and values.
This works well when Jaeger Query is the only consumer, but in our company we also use Jaeger for analytics purposes.
Since ClickHouse 21.3, the Map type (https://clickhouse.com/docs/en/sql-reference/data-types/map/) is available. I think it could be a good alternative to Nested.

Have you already run any performance (time and storage) tests with Map?
Would it be an acceptable contribution (behind a flag, so it is not activated by default)?

Search by error tag does not work

Describe the bug
Filter "error=true" does not show traces with errors.

To Reproduce
Steps to reproduce the behavior:

  1. Use clickhouse storage plugin
  2. Run HotROD and produce some traces
  3. In Jaeger UI select "Redis" service and find traces. Check that there are traces with errors in Redis service.
  4. Add Tags filter "error=true"
  5. See "No trace results. Try another query."

Expected behavior
Traces with Redis error found

Screenshots
there are errors in traces

errors are in Redis

Redis errors not found

Version (please complete the following information):

  • OS: windows
  • Jaeger version: 1.27
  • Deployment: docker-compose
  • Clickhouse plugin version: 0.8
  • Clickhouse version: yandex/clickhouse-server:21

What troubleshooting steps did you try?
Tag filter works as expected with other tags (e.g. "param.driverID").
Error tag filter works with Elasticsearch plugin.

I use that docker-compose to compare ELK and Clickhouse storages.

[Bug]: Resolve High CVEs

What happened?

We currently use the jaeger-clickhouse image and our security team has flagged it as being impacted by two HIGH CVEs

To resolve these CVEs the following packages need to be updated to a minimum version of:

  • golang.org/x/net - 0.1.1-0.20221104162952-702349b0e862
  • golang.org/x/text - 0.3.8

We prefer to have the packages fixed upstream to ensure that everyone can benefit from the updates.

Steps to reproduce

Using a vulnerability scanner (e.g. aqua/trivy), scan the jaeger-clickhouse image

trivy image jaeger-clickhouse:0.13.0

Expected behavior

No vulnerabilities listed.

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

No response

SDK

No response

Pipeline

No response

Storage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

[Bug]: Why doesn't Jaeger connect to ClickHouse databases?

What happened?

I did it in the simplest way, following
https://github.com/jaegertracing/jaeger-clickhouse/blob/main/guide-kubernetes.md

kubectl get cm jaeger-clickhouse -o yaml
apiVersion: v1
data:
  config.yaml: |
    address: clickhouse-jaeger:9000
    username: clickhouse_operator
    password: clickhouse_operator_password
    spans_table:
    spans_index_table:
    operations_table:
kind: ConfigMap

But it reports that it cannot connect to ClickHouse.

The error log is as follows:

[ERROR] jaeger-clickhouse: Failed to create a storage: @module=jaeger-clickhouse EXTRA_VALUE_AT_END="could not connect to database: \"dial tcp: missing address\"" timestamp=2023-02-09T09:02:29.083Z

kubectl logs jaeger-clickhouse-854dfc4c5d-fkcq9

Defaulted container "jaeger" out of: jaeger, install-plugin (init)
2023/02/09 09:02:29 maxprocs: Leaving GOMAXPROCS=36: CPU quota undefined
{"level":"info","ts":1675933349.0723028,"caller":"flags/service.go:119","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1675933349.0723774,"caller":"flags/service.go:125","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1675933349.07262,"caller":"flags/admin.go:129","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1675933349.0726702,"caller":"flags/admin.go:143","msg":"Starting admin HTTP server","http-addr":":14269"}
{"level":"info","ts":1675933349.0726848,"caller":"flags/admin.go:121","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
2023-02-09T09:02:29.075Z [WARN]  plugin configured with a nil SecureConfig
2023-02-09T09:02:29.075Z [DEBUG] starting plugin: path=/plugin/jaeger-clickhouse args=["/plugin/jaeger-clickhouse", "--config", "/plugin-config/config.yaml"]
2023-02-09T09:02:29.076Z [DEBUG] plugin started: path=/plugin/jaeger-clickhouse pid=23
2023-02-09T09:02:29.076Z [DEBUG] waiting for RPC address: path=/plugin/jaeger-clickhouse
2023-02-09T09:02:29.083Z [ERROR] jaeger-clickhouse: Failed to create a storage: @module=jaeger-clickhouse EXTRA_VALUE_AT_END="could not connect to database: \"dial tcp: missing address\"" timestamp=2023-02-09T09:02:29.083Z
{"level":"fatal","ts":1675933349.0847218,"caller":"./main.go:109","msg":"Failed to init storage factory","error":"grpc-plugin builder failed to create a store: error attempting to connect to plugin rpc client: Unrecognized remote plugin message: \n\nThis usually means that the plugin is either invalid or simply\nneeds to be recompiled to support the latest protocol.","stacktrace":"main.main.func1\n\t./main.go:109\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:968\nmain.main\n\t./main.go:240\nruntime.main\n\truntime/proc.go:250"}
2023-02-09T09:02:29.084Z [ERROR] plugin process exited: path=/plugin/jaeger-clickhouse pid=23 error="exit status 1"

ClickHouse and the database are working normally:

# kubectl get pod
NAME                                             READY   STATUS             RESTARTS       AGE
busybox                                          1/1     Running            25 (30m ago)   25h
chi-jaeger-cluster1-0-0-0                        1/1     Running            0              18m
jaeger-clickhouse-854dfc4c5d-fkcq9               0/1     CrashLoopBackOff   6 (5m6s ago)   10m


# kubectl exec -it statefulset.apps/chi-jaeger-cluster1-0-0 -- clickhouse-client  -h clickhouse-jaeger --user clickhouse_operator --password clickhouse_operator_password
ClickHouse client version 22.1.3.7 (official build).
Connecting to clickhouse-jaeger:9000 as user clickhouse_operator.
Connected to ClickHouse server version 22.1.3 revision 54455.

:) use jaeger

USE jaeger

Query id: c0bb766d-3fbe-4e4f-9fe8-68ac5a0b2345

Ok.

0 rows in set. Elapsed: 0.001 sec. 

 :) show databases;

SHOW DATABASES

Query id: 19942759-efd9-4d13-adf7-26acd678425b

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ jaeger             │
│ system             │
└────────────────────┘

5 rows in set. Elapsed: 0.002 sec. 

SELECT
    query_id,
    client_hostname,
    initial_address
FROM system.processes

Query id: 13ca8ec2-1449-45ca-a44a-89e37fa070b7

┌─query_id─────────────────────────────┬─client_hostname────────────────────┬─initial_address────┐
│ 13ca8ec2-1449-45ca-a44a-89e37fa070b7 │ clickhouse-client-5574484945-b7zx9 │ ::ffff:10.100.3.42 │
└──────────────────────────────────────┴────────────────────────────────────┴────────────────────┘

1 rows in set. Elapsed: 0.002 sec. 

Steps to reproduce

I did it in the simplest way, following
https://github.com/jaegertracing/jaeger-clickhouse/blob/main/guide-kubernetes.md

Is this document missing some necessary steps? I also tried to create the table manually.

My jaeger-operator version is 1.41.0

Expected behavior

Is it an image version problem?

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

No response

SDK

No response

Pipeline

No response

Storage backend

No response

Operating system

No response

Deployment model

No response

Deployment configs

No response

Decide on sharding function for distributed table

The distributed table could be created with multiple sharding functions: rand(), cityHash64(traceID) - see https://clickhouse.tech/docs/en/sql-reference/functions/hash-functions/.

The hash functions take an argument; we should consider using traceID to keep data from a single trace in the same location.

CREATE TABLE IF NOT EXISTS jaeger_spans AS jaeger_spans_local ENGINE = Distributed('{cluster}', default, jaeger_spans_local, cityHash64(traceID));
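
The effect of a traceID-based sharding key can be illustrated outside ClickHouse. In the sketch below, FNV-1a from the Go standard library stands in for cityHash64 (which is not in the standard library), and shardFor is a hypothetical helper; the point is only that a deterministic hash of traceID sends every span of a trace to the same shard, whereas rand() would scatter them.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a trace ID to a shard index. FNV-1a is a stand-in for
// cityHash64 here; the shard is the hash value modulo the shard count.
func shardFor(traceID string, shards uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(traceID))
	return h.Sum64() % shards
}

func main() {
	// The same trace ID always lands on the same shard.
	for _, id := range []string{"trace-a", "trace-a", "trace-b"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, 3))
	}
}
```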

Add integration test for replicated database.

Requirement - what kind of business use case are you trying to solve?

Test replicated database in integration tests as well.

Problem - what in Jaeger-ClickHouse blocks you from solving the requirement?

No such config in workflows.

[Feature]: Support Native JSON columns in Clickhouse

Requirement

As a Clickhouse analytics user, I want the clickhouse-jaeger schema to allow using Clickhouse native JSON columns so that we can query data in clickhouse more efficiently (both in terms of performance and query simplicity)

Problem

Currently, Clickhouse-Jaeger stores JSON span data in a string column, which makes it quite verbose to query fields within the column using ClickHouse's JSON functions, especially once you get past two levels of nesting.

This is very evident when you want to query the ingested data to generate your own analytics/insights. It would be nice if jaeger-clickhouse added support for ClickHouse native JSON columns.

Proposal

A solution may be to start providing support for the native JSON datatype (It's still "experimental", but the spec has been quite stable for a while)

Open questions

The major open question is how this would affect the split between protobuf and json encoded data (currently, string supports both) and whether it'll add more complexities to the project. Need to observe more to see the impact of this, but wanted to raise this with the community/maintainers to get an idea of their thoughts.

Document and add support for deleting data/TTL

We should document how old data can be removed (alter table jaeger_spans drop partition 20201) and add support for TTL https://clickhouse.tech/docs/en/sql-reference/statements/alter/ttl/ (the user could specify the number of days in the config).

E.g.

CREATE TABLE IF NOT EXISTS jaeger_index_local (
     timestamp DateTime CODEC(Delta, ZSTD(1)),
     traceID String CODEC(ZSTD(1)),
     service LowCardinality(String) CODEC(ZSTD(1)),
     operation LowCardinality(String) CODEC(ZSTD(1)),
     durationUs UInt64 CODEC(ZSTD(1)),
     tags Array(String) CODEC(ZSTD(1)),
     INDEX idx_tags tags TYPE bloom_filter(0.01) GRANULARITY 64,
     INDEX idx_duration durationUs TYPE minmax GRANULARITY 1
) ENGINE MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (service, -toUnixTimestamp(timestamp))
TTL timestamp + INTERVAL 90 DAY
SETTINGS index_granularity=1024
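
Deriving the TTL line from a configured number of days could look roughly like this. The ttlClause helper and its parameters are hypothetical; the sketch assumes that a value of zero or less means "no TTL".

```go
package main

import "fmt"

// ttlClause is a hypothetical helper that builds the TTL line of a
// CREATE TABLE statement. A non-positive day count yields no clause.
func ttlClause(column string, days int) string {
	if days <= 0 {
		return ""
	}
	return fmt.Sprintf("TTL %s + INTERVAL %d DAY", column, days)
}

func main() {
	fmt.Println(ttlClause("timestamp", 90))
}
```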

cc @chhetripradeep could you please chime in and document how you delete old data?

serialized, err = proto.Marshal(span) insert error

Describe the bug
serialized, err = proto.Marshal(span) insert error

Screenshots
image

Version (please complete the following information):

  • OS: [e.g. Linux]
  • Jaeger version: latest
  • clickhouse version :21.8.3.44

What troubleshooting steps did you try?
Try to follow https://www.jaegertracing.io/docs/latest/troubleshooting/ and describe how far you were able to progress and/or which steps did not work.

Additional context
Add any other context about the problem here.

Got plugin error "transport: error while dialing: dial unix /tmp/plugin"

Describe the bug
Got an error in Jaeger UI with the ClickHouse gRPC plugin when searching for traces:
HTTP Error: plugin error: rpc error: code = Unavailable
desc = connection error:
desc = "transport: error while dialing: dial unix /tmp/plugin2381205323:
connect: connection refused

It seems to happen

  • either after several hours of Jaeger Query inactivity
  • or after jaeger_index_local exceeds ~70 million rows

ClickHouse is up and running.
Restarting Jaeger Query fixes the problem temporarily (until the next long search).

To Reproduce
Steps to reproduce the behavior:

  1. Ingest ~70 million rows into jaeger_index_local
  2. Search for traces

Expected behavior
Traces are found

Screenshots
image

Version (please complete the following information):

  • OS: Linux
  • Jaeger Query version: 1.25
  • Clickhouse plugin version: 0.8
  • Clickhouse version: 21.8.10.19
  • Deployment: Kubernetes

What troubleshooting steps did you try?

Additional context
jaeger-query-clickhouse-5ff64c9dbc-h7jr4.log

Expose metrics

For fine-tuning parameters like

# Batch size. Default 10_000.
batch_write_size:
# Batch flush interval. Default 5s.
batch_flush_interval:

It would be great to expose metrics like:

  • batch flush size
  • number of flushes due to timer expiration (batch_flush_interval)

Although, I am not sure if the batch size influences performance or not.
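
As a rough idea of what such metrics might track, the sketch below counts flushes by cause and records the last batch size. The flushMetrics type and its fields are hypothetical, and exporting the values (for example to the metrics endpoint) is left out. It relies on the atomic.Int64 type available since Go 1.19.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// flushReason says why a batch was flushed.
type flushReason int

const (
	flushByTimer flushReason = iota // batch_flush_interval expired
	flushBySize                     // batch_write_size reached
)

// flushMetrics is a hypothetical set of counters, safe for concurrent use.
type flushMetrics struct {
	timerFlushes  atomic.Int64
	sizeFlushes   atomic.Int64
	lastBatchSize atomic.Int64
}

// Observe records one flush and the size of the flushed batch.
func (m *flushMetrics) Observe(reason flushReason, batchSize int) {
	switch reason {
	case flushByTimer:
		m.timerFlushes.Add(1)
	case flushBySize:
		m.sizeFlushes.Add(1)
	}
	m.lastBatchSize.Store(int64(batchSize))
}

func main() {
	var m flushMetrics
	m.Observe(flushBySize, 10000)
	m.Observe(flushByTimer, 120)
	fmt.Println(m.timerFlushes.Load(), m.sizeFlushes.Load(), m.lastBatchSize.Load())
}
```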

Explanation of '{cluster}'

Our replication and sharding guide (https://github.com/pavolloffay/jaeger-clickhouse/blob/main/guide-sharding-and-replication.md#replication) uses the '{cluster}' substitution when creating a distributed table, e.g.

CREATE TABLE IF NOT EXISTS jaeger_spans ON CLUSTER '{cluster}' AS jaeger_spans_local ENGINE = Distributed('{cluster}', default, jaeger_spans_local, cityHash64(traceID));

I am not sure I understand exactly what it does. Could somebody explain it? @EinKrebs @chhetripradeep

Let's say my CH deployment defines two clusters

<remote_servers>
    <example_cluster1>
       ...
    </example_cluster1>
    <example_cluster2>
       ...
    </example_cluster2>
</remote_servers>

So if the create command is executed, would it create tables on all clusters?

Add option to limit number of fetched spans per trace

Requirement - what kind of business use case are you trying to solve?

There are long-lived processes in our ecosystem involving many services. Sometimes this leads to anomalously huge traces (100k+ spans). Fetching all spans from such traces slows down trace search.

Problem - what in Jaeger blocks you from solving the requirement?

There is no option (similar to es.max-num-spans for Elasticsearch storage) to limit the number of spans fetched for each trace.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Introduce a new setting in the config file to limit the number of spans fetched for each trace.
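
One possible shape for this, sketched under assumptions: the limit is applied in the query itself, so that extra spans are never fetched. The spansQuery helper, the model column name, and the convention that zero means "no limit" are all hypothetical and not taken from the plugin.

```go
package main

import "fmt"

// spansQuery is a hypothetical query builder. The "model" column name is an
// assumption; maxSpans <= 0 is treated as "no limit".
func spansQuery(table string, maxSpans int) string {
	q := fmt.Sprintf("SELECT model FROM %s WHERE traceID = ?", table)
	if maxSpans > 0 {
		q += fmt.Sprintf(" LIMIT %d", maxSpans)
	}
	return q
}

func main() {
	fmt.Println(spansQuery("jaeger_spans_local", 1000))
}
```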

Durable database writes

Hi! Thanks for the project, I believe it's of a great value to the community.

Currently, this plugin accumulates data and writes it to the database. I think it's important to do several things to ensure more durable writes:

  1. Retry network and database failures. Use exponential backoff when the database cannot serve a write immediately.
  2. Buffer data not written to DB. Ensure that the buffer does not overflow. Sacrifice data intentionally if it cannot be stored in DB.
  3. Reload connection string when requested: a user can add new shards to CH installation

What do you think?

Running with hotrod results in Too many simultaneous queries. Maximum: 100

2021.07.14 17:06:49.783711 [ 219 ] {11925d3b-7684-4919-827b-319af811c400} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2021.07.14 17:06:49.783769 [ 1010 ] {d4de6e5e-6305-4802-842e-13c660886ef2} <Error> TCPHandler: Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100, Stack trace:

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8d31b5a in /usr/bin/clickhouse
1. DB::ProcessList::insert(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::IAST const*, std::__1::shared_ptr<DB::Context const>) @ 0xfcd6802 in /usr/bin/clickhouse
2. ? @ 0xfe21ab3 in /usr/bin/clickhouse
3. DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, bool) @ 0xfe208e3 in /usr/bin/clickhouse
4. DB::TCPHandler::runImpl() @ 0x1069f6c2 in /usr/bin/clickhouse
5. DB::TCPHandler::run() @ 0x106b25d9 in /usr/bin/clickhouse
6. Poco::Net::TCPServerConnection::start() @ 0x1338b30f in /usr/bin/clickhouse
7. Poco::Net::TCPServerDispatcher::run() @ 0x1338cd9a in /usr/bin/clickhouse
8. Poco::PooledThread::run() @ 0x134bfc19 in /usr/bin/clickhouse
9. Poco::ThreadImpl::runnableEntry(void*) @ 0x134bbeaa in /usr/bin/clickhouse
10. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
11. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so

The DB is started as docker run --rm -it -p9000:9000 --name some-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server:21

Add Operation.SpanKind support

Requirement - what kind of business use case are you trying to solve?

I ran jaeger grpc-plugin integration tests with this plugin and it failed.

Problem - what in Jaeger blocks you from solving the requirement?

Integration test failed because this plugin doesn't support jaeger/spanstore.Operation.SpanKind.

Support writeSpan via grpc stream

Requirement - what kind of business use case are you trying to solve?

When I was using jaeger-clickhouse, I found that WriteSpan via grpc stream will reduce the CPU utilization of jaeger and increase the throughput. So I made a pull request to jaeger, which supports grpc stream. But this still requires plugin support.

ref: jaegertracing/jaeger#3636

Integration GetOperations test fails

Describe the bug
Integration tests fail due to incorrect behaviour of GetOperations.

Expected behavior
Test won't fail

Screenshots
If applicable, add screenshots to help explain your problem.

Version (please complete the following information):

  • OS: Linux
  • Jaeger-ClickHouse version: 0.7.0
  • Deployment: bare metal & docker ClickHouse server

Too many jaeger-query WriteSpan traces written.

Describe the bug
The plugin writes a very large number of traces (service='jaeger-query', operation='/jaeger.storage.v1.SpanWriterPlugin/WriteSpan'), even when there are no other spans to write. With default settings, at least 20 such traces are written per tick. After spans that are not from jaeger-query have been written, several thousand spans per tick can keep being written for a very long time, even when no spans are being sent to Jaeger.

To Reproduce
Steps to reproduce the behavior:

  1. Start docker image of clickhouse-server
  2. Start Jaeger with jaeger-clickhouse on default settings
  3. Generate a few spans (e.g. with tracegen or HotROD)
  4. See a huge number of spans being written on every timer tick.
  5. Checking these spans shows that they are almost exclusively jaeger-query/WriteSpan.
  6. Query run long after tracegen finished its work:
SELECT count()
FROM jaeger_index_local 
WHERE (service = 'jaeger-query') AND (operation = '/jaeger.storage.v1.SpanWriterPlugin/WriteSpan') AND (timestamp >= (now() - toIntervalMinute(1)))

┌─count()─┐
│   15329 │
└─────────┘

Expected behavior
No such spans, or very few of them.

Version (please complete the following information):

  • OS: Linux
  • Jaeger version: Jaeger v1.24, jaeger-clickhouse v0.7.0
  • Deployment: bare metal

What troubleshooting steps did you try?
Didn't find any information about this problem.

[Feature]: Add ttl_only_drop_parts to table settings or make it configurable

Requirement

If we change the TTL (in days), ClickHouse by default merges a lot of data row by row; see https://clickhouse.com/docs/en/operations/settings/settings/#ttl_only_drop_parts

Problem

It would be great not to waste resources on row-level merges of data expired by TTL. Right now this setting cannot be set at table creation time. If you change the TTL later, it consumes a lot of CPU resources.

Proposal

Set ttl_only_drop_parts to 1 for tables by default (dropping whole parts, which correspond to days), or make it configurable before table creation.

Open questions

No response

Dockerizing proposal

I have a number of proposals that I can make to this project:

  1. Two-stage build in Docker. This way, we will have a build in a reproducible environment.
  2. Optimize linking as much as possible. The image is required for production use. At high loads, even minor optimizations save resources.
  3. Build the plugin along with the Jaeger source code. This way, we can influence how the Jaeger build is optimized. We can use the cache or saved Docker layers to speed up building.
  4. Use Debian releases instead of the Alpine distribution. One of the optimizations is linking with system libraries. Alpine has limited multithreading functionality because it uses musl instead of glibc. But there is no problem supporting both distributions.
  5. Image versioning that includes the Jaeger version, the plugin version, and the label that this container contains the plugin. The same approach is used by snyk. For example, the image will have the following tags:
    • ghcr.io/jaegertracing/jaeger-collector:1.29.0-clickhouse-0.8.0-stretch
    • ghcr.io/jaegertracing/jaeger-collector:1.29.0-clickhouse-0.8.0
    • ghcr.io/jaegertracing/jaeger-collector:1.29.0-clickhouse
    • ghcr.io/jaegertracing/jaeger-collector:clickhouse-0.8.0-stretch
    • ghcr.io/jaegertracing/jaeger-collector:clickhouse-0.8.0
    • ghcr.io/jaegertracing/jaeger-collector:clickhouse
  6. Have a complete set of our own images: all-in-one, jaeger-agent, jaeger-collector, jaeger-ingester, jaeger-query.
  7. Run E2E tests using docker-compose (example).

An implementation of part of the above can be found in this project: https://github.com/levonet/docker-jaeger.
I'm ready to move this infrastructure over, and my team will support it for as long as we use Jaeger.

[Bug]: System Architecture reported an error when I used ClickHouse as the storage backend

What happened?

System Architecture reported an error when I used ClickHouse as the storage backend. The error looks like this:

HTTP Error: plugin error:rpc error: code = Unknown desc = not implemented

Steps to reproduce

  1. Use ClickHouse as the storage; the command is: SPAN_STROAGE_TYPE=grpc-plugin jaeger-all-in-one --grpc.stroage.plugin.binary=/path/to/jaeger-clickhouse-linux-amd64 --grpc.stroage.configuration=/path/to/config.yml
  2. Start Jaeger
  3. Click "System Architecture"; it reports an error like this:
    HTTP Error: plugin error:rpc error: code = Unknown desc = not implemented
    

Expected behavior

I expect System Architecture to display a DAG of the calling relationship of each system

Relevant log output

No response

Screenshot

No response

Additional context

No response

Jaeger backend version

1.47.0

SDK

OpenTelemetry javaagent 1.28.0

Pipeline

javaagent->otelcol->jaeger

Storage backend

clickhouse

Operating system

linux

Deployment model

CLI

Deployment configs

No response

Configure release

Upload released binary with example config to Github Release page.

[Feature]: Use native ClickHouse interface instead of database/sql

Requirement

As an active jaeger-clickhouse user, I'd like to suggest using the native ClickHouse communication protocol instead of the database/sql-compatible one. This change might significantly increase overall performance and speed up the span writer.

Problem

jaeger-clickhouse uses the clickhouse-go 1.5.4 client. It provides the standard database/sql interface for communication with ClickHouse.

There is a benchmark section in the readme of the repository. It claims that migration to v2 might significantly speed up write and read. This speed up is possible due to usage of the native TCP ClickHouse client-server protocol. Furthermore, new versions (>= 2.3.0) use ch-go for compression.

Proposal

Switch to the newer version of the clickhouse-go client and enjoy the better performance.

Maybe it's even possible to switch to ch-go, but I think the library may not support all the used high-level features of clickhouse-go at the moment.

Open questions

I have two questions in mind:

  • Is it possible to switch to the native ClickHouse TCP protocol without breaking compatibility?
  • Is it worth it? We need to create a benchmark that compares the two protocols on queries specific to jaeger-clickhouse.

Do you, folks, see any pitfalls?

Hardcoded jaeger path

The Makefile assumes the path to Jaeger all-in-one is ${HOME}/projects/jaegertracing/jaeger/cmd/all-in-one/all-in-one-linux-amd64; it may be better to make this an environment variable.

[Feature]: Allow changing TTL configuration on existing tables

Requirement

As a Jaeger Operator
I want to be able to modify the TTL configuration of my tables/databases
So that I can change these settings after the initial database creation

Problem

Currently, TTL is set ONLY on database creation.
A change to the TTL config values after database creation will not be propagated to the database or its tables.

Proposal

We can add SQL scripts to perform the TTL adjustment independently of database creation.

For the spans table, this new script will look similar to

ALTER TABLE {{.SpansTable}}
    MODIFY {{.TTLTimestamp}}

and we should make sure we run this script AFTER the one that creates the table, so it won't fail on new installs.
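
Rendering such a script with Go's text/template package might look like the sketch below. The ttlArgs type, the renderAlterTTL helper, and the example table name and TTL expression are assumptions for illustration; only the two placeholders come from the proposal above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// alterTTL mirrors the script proposed above.
const alterTTL = `ALTER TABLE {{.SpansTable}}
    MODIFY {{.TTLTimestamp}}
`

// ttlArgs holds the template values; the field names match the placeholders.
type ttlArgs struct {
	SpansTable   string
	TTLTimestamp string
}

// renderAlterTTL is a hypothetical helper that fills in the template.
func renderAlterTTL(args ttlArgs) (string, error) {
	tmpl, err := template.New("alter-ttl").Parse(alterTTL)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, args); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	sql, err := renderAlterTTL(ttlArgs{
		SpansTable:   "jaeger_spans_local",               // assumed table name
		TTLTimestamp: "TTL timestamp + INTERVAL 30 DAY", // assumed expansion
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(sql)
}
```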

Open questions

No response

Make maxSpanCount of WriteWorkerPool a parameter

Problem

The number of spans that can be recorded at a time is a constant, so it is not flexible at all.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Make that amount a parameter in config.

Add unit tests

The repository does not have any unit tests. It would be great to add tests to increase coverage. For example, Jaeger has 95% test coverage.
