Hello @Tokoy
We don't ingest line by line. We use timeout-based microbatching; I believe the timeout is 0.1 s.
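To illustrate what timeout-based microbatching means, here is a minimal Python sketch. It is not qryn's actual code. The class name `MicroBatcher`, the `flush_fn` callback and the injectable `clock` are all hypothetical, and the 0.1 s default just mirrors the figure above. Rows are buffered and sent as a single batch once the oldest buffered row has waited at least the timeout. In a real writer, a background timer would call `maybe_flush()` so that a quiet stream still gets flushed.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffer rows and flush them as one batch once the oldest buffered row
    has waited at least `timeout_s` seconds (hypothetical sketch)."""

    def __init__(self, flush_fn: Callable[[List[dict]], None],
                 timeout_s: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush_fn = flush_fn      # e.g. one bulk INSERT per batch
        self._timeout_s = timeout_s
        self._clock = clock            # injectable for deterministic tests
        self._buffer: List[dict] = []
        self._first_ts: Optional[float] = None

    def add(self, row: dict) -> None:
        # The timer starts when the first row of a new batch arrives.
        if not self._buffer:
            self._first_ts = self._clock()
        self._buffer.append(row)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        # Flush only once the current batch has been open for timeout_s.
        if self._buffer and self._clock() - self._first_ts >= self._timeout_s:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_ts = None
            self._flush_fn(batch)


# Demo with a fake clock: two rows arriving 50 ms apart end up in one batch.
now = [0.0]
batches: List[List[dict]] = []
b = MicroBatcher(batches.append, timeout_s=0.1, clock=lambda: now[0])
b.add({"v": 1})
now[0] = 0.05
b.add({"v": 2})
now[0] = 0.12
b.maybe_flush()
print(batches)  # prints [[{'v': 1}, {'v': 2}]]
```

The trade-off is latency versus insert size. A longer timeout gives fewer, larger INSERTs, which ClickHouse generally handles better than many tiny ones, at the cost of rows waiting longer before they become visible.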
We use a distributed table so that data is distributed across shards by fingerprint, which is important for clustering.
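As a rough illustration of fingerprint-based distribution, here is a Python sketch of the shard-selection rule documented for ClickHouse's Distributed engine. The remainder of the sharding key divided by the total shard weight picks the shard, and each shard owns a consecutive range of remainders sized by its weight. The assumption that the sharding key is the fingerprint itself is ours. The function name `pick_shard` is hypothetical.

```python
from typing import Sequence


def pick_shard(fingerprint: int, weights: Sequence[int]) -> int:
    """Return the index of the shard a row with this fingerprint is sent to.

    Follows the Distributed-engine rule: sharding_key % total_weight, with
    shards owning consecutive half-open remainder ranges sized by weight.
    Assumes (hypothetically) that the sharding key is the fingerprint.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive ints")
    remainder = fingerprint % sum(weights)
    upper = 0
    for idx, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return idx
    raise AssertionError("unreachable: remainder < total weight")


print(pick_shard(10, [1, 1, 1]))  # prints 1  (10 % 3 == 1)
print(pick_shard(5, [9, 10]))     # prints 0  (5 falls in [0, 9))
print(pick_shard(12, [9, 10]))    # prints 1  (12 falls in [9, 19))
```

The property that matters for clustering is determinism. Every sample with a given fingerprint lands on the same shard, so a series' samples and its `time_series` row stay together.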
Feel free to try our binary writer: https://github.com/metrico/otel-collector
It is better suited to high-load writes.
from qryn.
@akvlad In my observations, writing to ClickHouse through 20-30 qryn instances is much more efficient than through the otel collector.
For some reason, scaling out the collectors doesn't give a comparable boost in my case.
I distribute fingerprints evenly by putting SLB or Nginx in front of the writers. The current rate is 500k records per second written to each local table. The distributed table itself is recommended for reads only: frequent writes through it can easily put the related table into a read-only state, which leads to write failures.
I have also noticed that the main query performs a full scan of the time_series_dist table every time. If the time_series table is very large, retrieval takes a long time and consumes a significant amount of memory. Here is my query:
WITH sel_a AS (
    SELECT samples.string AS string, samples.fingerprint AS fingerprint, samples.timestamp_ns AS timestamp_ns
    FROM cloki.samples_v3_dist AS samples
    WHERE ((samples.timestamp_ns >= 1704440225031000000) AND (samples.timestamp_ns <= 1704440525031000000))
      AND (samples.fingerprint IN (
        SELECT sel_1.fingerprint
        FROM (SELECT fingerprint FROM cloki.time_series_gin WHERE (key = 'app') AND (val = 'gate-v1')) AS sel_1
        ANY INNER JOIN (SELECT fingerprint FROM cloki.time_series_gin WHERE (key = 'cluster') AND (val = 'eu-prod')) AS sel_2
        ON sel_1.fingerprint = sel_2.fingerprint
      ))
    ORDER BY timestamp_ns DESC
    LIMIT 100
)
SELECT JSONExtractKeysAndValues(time_series.labels, 'String') AS labels, sel_a.*
FROM sel_a
ANY LEFT JOIN cloki.time_series_dist AS time_series ON sel_a.fingerprint = time_series.fingerprint
ORDER BY labels DESC, timestamp_ns DESC
@Tokoy I'm not sure I understand how Nginx can distribute fingerprints evenly.
A single request may contain fingerprints that belong to several different servers.
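The point about mixed requests can be made concrete with a small Python sketch. It groups one request's rows by the shard their fingerprint maps to. It assumes equal-weight shards and `fingerprint % num_shards` as the routing rule, and the `(fingerprint, line)` row shape and `split_by_shard` name are hypothetical. A load balancer that forwards whole requests cannot send such a request to a single correct shard. The request would have to be split per shard first, which is the job a distributed table does on insert.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical (fingerprint, log line) pair


def split_by_shard(rows: Iterable[Row], num_shards: int) -> Dict[int, List[Row]]:
    """Group one incoming request's rows by target shard.

    Assumes equal-weight shards and fingerprint % num_shards routing;
    both are illustrative assumptions, not qryn's confirmed behaviour.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    per_shard: Dict[int, List[Row]] = defaultdict(list)
    for fingerprint, line in rows:
        per_shard[fingerprint % num_shards].append((fingerprint, line))
    return dict(per_shard)


request = [(101, "a"), (202, "b"), (303, "c"), (404, "d")]
print(sorted(split_by_shard(request, 3)))  # prints [0, 1, 2]: one request, three shards
```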
The recommended number of fingerprints is 1M per shard; above that, you may start experiencing OOM errors.
Of course, the exact limit depends on your ClickHouse server's RAM.
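As a back-of-the-envelope check of the 1M-fingerprints-per-shard guideline, the Python sketch below computes the smallest shard count that keeps every shard at or under the limit. It assumes fingerprints spread evenly across shards. The helper name `min_shards` is hypothetical, and the 1M default is only the figure quoted here, which in practice depends on server RAM.

```python
RECOMMENDED_FINGERPRINTS_PER_SHARD = 1_000_000  # figure quoted above; RAM-dependent


def min_shards(total_fingerprints: int,
               per_shard: int = RECOMMENDED_FINGERPRINTS_PER_SHARD) -> int:
    """Smallest number of shards keeping each at or below `per_shard`
    fingerprints, assuming an even spread (hypothetical helper)."""
    if total_fingerprints < 0 or per_shard <= 0:
        raise ValueError("need total_fingerprints >= 0 and per_shard > 0")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (total_fingerprints + per_shard - 1) // per_shard)


print(min_shards(3_500_000))  # prints 4
print(min_shards(1_000_001))  # prints 2
```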
Closing as stale. Please reopen anytime if needed or useful.