Comments (3)
Can you please share the following information:
- Which setup are you running Relay and Sentry in? Is this running in our official onpremise setup, or a custom one?
- Which mode is Relay running in?
- Can you share the full `config.yml` file?
The specific request goes to the Sentry web API, which is configured with the `relay.upstream` field. "Server disconnected" means that the web workers did not respond and closed the connection. This happens in one of two cases:
- The worker crashes and has to restart
- The connection between Relay and web workers got into an invalid state
In both cases, Relay reconnects and retries these requests without dropping data. As long as these errors do not occur persistently or in high volume, it is safe to ignore them; the error messages are just verbose.
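For reference, the upstream is set in Relay's `config.yml`; a minimal sketch (the URL here is a placeholder, substitute your own Sentry web address):

```yaml
relay:
  # Where Relay forwards envelopes and API requests.
  # "Server disconnected" errors refer to this connection.
  upstream: "http://sentry.example.com:9000/"
```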
from relay.
- It's a custom one: I de-containerized all of the structural services (NoSQL stores/databases) for easier administration (to be honest, I'm pretty happy about it).
- Not sure about the mode; I'm just running it as `/usr/local/relay/bin/relay run -c /usr/local/etc/relay` inside an environment that points to its config (below).
- sure:
```yaml
---
relay:
  upstream: "http://10.100.33.6:9000/"
  host: 10.100.33.6
  port: 3000
logging:
  level: INFO
  format: simplified
  log_failed_payloads: true
  enabled: true
processing:
  enabled: true
  kafka_config:
    - {name: "bootstrap.servers", value: "10.100.33.7:9092"}
  redis: redis://10.100.33.5:6379
```
It occurred to me indeed that this could be related to the Sentry Celery workers that crashed. So I took a look at the Sentry logs around the time of the event and found this:
```
Sep 17 03:24:29 jsentry sentry[43595]: Traceback (most recent call last):
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/worker/consumer/consumer.py", line 316, in start
Sep 17 03:24:29 jsentry sentry[43595]:     blueprint.start(self)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/bootsteps.py", line 119, in start
Sep 17 03:24:29 jsentry sentry[43595]:     step.start(parent)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/worker/consumer/consumer.py", line 592, in start
Sep 17 03:24:29 jsentry sentry[43595]:     c.loop(*c.loop_args())
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/worker/loops.py", line 91, in asynloop
Sep 17 03:24:29 jsentry sentry[43595]:     next(loop)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/asynchronous/hub.py", line 276, in create_loop
Sep 17 03:24:29 jsentry sentry[43595]:     tick_callback()
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 1040, in on_poll_start
Sep 17 03:24:29 jsentry sentry[43595]:     cycle_poll_start()
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 322, in on_poll_start
Sep 17 03:24:29 jsentry sentry[43595]:     self._register_BRPOP(channel)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 308, in _register_BRPOP
Sep 17 03:24:29 jsentry sentry[43595]:     channel._brpop_start()
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 714, in _brpop_start
Sep 17 03:24:29 jsentry sentry[43595]:     self.client.connection.send_command('BRPOP', *keys)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/redis-2.10.6-py2.7.egg/redis/connection.py", line 610, in send_command
Sep 17 03:24:29 jsentry sentry[43595]:     self.send_packed_command(self.pack_command(*args))
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/redis-2.10.6-py2.7.egg/redis/connection.py", line 603, in send_packed_command
Sep 17 03:24:29 jsentry sentry[43595]:     (errno, errmsg))
Sep 17 03:24:29 jsentry sentry[43595]: ConnectionError: Error 32 while writing to socket. Broken pipe.
Sep 17 03:24:29 jsentry sentry[43595]: 00:24:29 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Sep 17 03:24:29 jsentry sentry[43595]: Restoring 3 unacknowledged message(s)
Sep 17 03:24:32 jsentry sentry[43595]: %4|1600302272.030|CONFWARN|rdkafka#producer-1| [thrd:app]: Configuration property produce.offset.report is deprecated: No longer used.
Sep 17 03:24:32 jsentry sentry[43595]: %4|1600302272.031|CONFWARN|rdkafka#producer-2| [thrd:app]: Configuration property produce.offset.report is deprecated: No longer used.
```
Neither the Relay nor the Celery complaint repeats: one occurrence of each, then event streaming stops. From what I can see, yes, there is a problem with the Celery workers, which stop periodically, and it is somehow related to Redis. The most discouraging thing, however, is that Redis itself works perfectly: no restarts, no errors or complaints in its log. Furthermore, this stuck-events issue happens only once per 3-4 days, while the Celery workers die more frequently. It looks like, during one of these Celery worker restarts, Relay sometimes clashes with it and the whole chain just dies.
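The kombu/Celery stack is supposed to recover from exactly this kind of broken pipe, and the "Trying to re-establish the connection" warning in the log shows it tries. As a generic illustration of the pattern (a hypothetical helper, not kombu's actual recovery API), a retry wrapper with exponential backoff over Python's built-in connection exceptions:

```python
import time

def call_with_retry(fn, retries=3, base_delay=0.1):
    """Call fn(), retrying on connection-level failures with exponential
    backoff. Illustrative only; kombu's real recovery logic is more involved."""
    for attempt in range(retries):
        try:
            return fn()
        except (ConnectionError, BrokenPipeError):
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))

# Example: a flaky operation that fails once, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise BrokenPipeError("simulated EPIPE")
    return "ok"

print(call_with_retry(flaky))  # prints "ok" after one retry
```

The open question in a setup like this is what happens when every retry fails, e.g. if the broker connection stays broken across worker restarts; that is the scenario where unacknowledged messages pile up.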
Currently I have no idea what to do about this, because the host environment doesn't show any errors or exceeded thresholds.
If it matters, this Sentry installation handles a pretty intensive (for a commercial plan) load, about 175K events per ordinary day; that's why we switched to an on-premise one.
It looks like one of the other services stops working and is not restarted properly. The connection error above looks promising, although it is hard to judge if this is actually the culprit and whether one of the Sentry workers or one of the services is at fault here.
This is a shot in the dark, but a quick search revealed this issue comment and similar issues. This could be a celery bug, or timeouts in Redis when too many tasks are scheduled.
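For what it's worth, the "Error 32" in the traceback is EPIPE (broken pipe): Redis had already closed its side of the socket before the worker wrote the BRPOP command, which fits the idle-timeout theory. A small sketch; the redis-py calls in the comments assume the Redis host from the config above and are untested against this setup:

```python
import errno

# "ConnectionError: Error 32 while writing to socket. Broken pipe." -> EPIPE,
# i.e. the peer (Redis) closed the connection before the client wrote to it.
assert errno.EPIPE == 32

# With redis-py one could check whether the server closes idle clients:
#   import redis
#   r = redis.Redis(host="10.100.33.5")
#   r.config_get("timeout")  # {"timeout": "0"} means idle clients are never closed
```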
Since this issue seems to originate outside of Relay, I will have to close it here. I recommend using the official onpremise setup, as it will be easier for us to help with issues like the one above. For now, you could consider one of these options:
- If it turns out that one of the Sentry services is causing this problem, open an issue in the sentry issue tracker.
- If another service is misbehaving, have a look at our onpremise repo for reference settings and configuration.
Finally, I would invite you to reach out to us. Your numbers indicate that you would need around 5M errors per month, which we would love to make an adequate offer for.