
Comments (3)

jan-auer commented on June 18, 2024

Can you please share the following information:

  • Which setup are you running Relay and Sentry in? Is this running in our official onpremise setup, or a custom one?
  • Which mode is Relay running in?
  • Can you share the full config.yml file?

The specific request goes to the Sentry web API, which is configured with the relay.upstream field. "Server disconnected" means that the web workers did not respond and closed the connection. This happens in one of two cases:

  1. The worker crashes and has to restart
  2. The connection between Relay and web workers got into an invalid state

In both cases, Relay reconnects and retries these requests without dropping data. As long as these errors do not occur persistently or in high volume, it is absolutely safe to ignore them; the error messages are just verbose.
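
For reference, both of these settings live at the top of Relay's config.yml. The snippet below is only an illustrative sketch (the hostname is a placeholder and the mode shown is the default, not necessarily what your instance uses):

relay:
  upstream: "https://sentry.example.com/"  # the Sentry web API that Relay forwards to; "Server disconnected" refers to this connection
  mode: managed                            # one of: managed, static, proxy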


drook commented on June 18, 2024
  1. It's a custom one, because I decontainerized all of the structural services, like the NoSQL stores and databases, for easier administration (to be honest, I'm pretty happy about it).
  2. Not sure about the mode; I'm just running it as /usr/local/relay/bin/relay run -c /usr/local/etc/relay inside an environment that points to its config (below).
  3. sure:
---
relay:
  upstream: "http://10.100.33.6:9000/"
  host: 10.100.33.6
  port: 3000
logging:
  level: INFO
  format: simplified
  log_failed_payloads: true
  enabled: true
processing:
  enabled: true
  kafka_config:
    - {name: "bootstrap.servers", value: "10.100.33.7:9092"}
  redis: redis://10.100.33.5:6379

It did occur to me that this could be related to the Sentry Celery workers that crashed, so I took a look at the Sentry logs around the time of the event and found this:

Sep 17 03:24:29 jsentry sentry[43595]: Traceback (most recent call last):
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/worker/consumer/consumer.py", line 316, in start
Sep 17 03:24:29 jsentry sentry[43595]:     blueprint.start(self)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/bootsteps.py", line 119, in start
Sep 17 03:24:29 jsentry sentry[43595]:     step.start(parent)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/worker/consumer/consumer.py", line 592, in start
Sep 17 03:24:29 jsentry sentry[43595]:     c.loop(*c.loop_args())
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/celery-4.1.1-py2.7.egg/celery/worker/loops.py", line 91, in asynloop
Sep 17 03:24:29 jsentry sentry[43595]:     next(loop)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/asynchronous/hub.py", line 276, in create_loop
Sep 17 03:24:29 jsentry sentry[43595]:     tick_callback()
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 1040, in on_poll_start
Sep 17 03:24:29 jsentry sentry[43595]:     cycle_poll_start()
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 322, in on_poll_start
Sep 17 03:24:29 jsentry sentry[43595]:     self._register_BRPOP(channel)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 308, in _register_BRPOP
Sep 17 03:24:29 jsentry sentry[43595]:     channel._brpop_start()
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/kombu-4.2.2.post1-py2.7.egg/kombu/transport/redis.py", line 714, in _brpop_start
Sep 17 03:24:29 jsentry sentry[43595]:     self.client.connection.send_command('BRPOP', *keys)
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/redis-2.10.6-py2.7.egg/redis/connection.py", line 610, in send_command
Sep 17 03:24:29 jsentry sentry[43595]:     self.send_packed_command(self.pack_command(*args))
Sep 17 03:24:29 jsentry sentry[43595]:   File "/usr/local/lib/python2.7/site-packages/redis-2.10.6-py2.7.egg/redis/connection.py", line 603, in send_packed_command
Sep 17 03:24:29 jsentry sentry[43595]:     (errno, errmsg))
Sep 17 03:24:29 jsentry sentry[43595]: ConnectionError: Error 32 while writing to socket. Broken pipe.
Sep 17 03:24:29 jsentry sentry[43595]: 00:24:29 [WARNING] celery.worker.consumer.consumer: consumer: Connection to broker lost. Trying to re-establish the connection...
Sep 17 03:24:29 jsentry sentry[43595]: Restoring 3 unacknowledged message(s)
Sep 17 03:24:32 jsentry sentry[43595]: %4|1600302272.030|CONFWARN|rdkafka#producer-1| [thrd:app]: Configuration property produce.offset.report is deprecated: No longer used.
Sep 17 03:24:32 jsentry sentry[43595]: %4|1600302272.031|CONFWARN|rdkafka#producer-2| [thrd:app]: Configuration property produce.offset.report is deprecated: No longer used.

Neither the Relay nor the Celery complaint repeats: there is one occurrence of each, and then event streaming stops. From what I can see, there is indeed a problem with the Celery workers, which stop periodically, and it is somehow related to Redis. The most discouraging part is that Redis itself works perfectly: no restarts, no errors or complaints in its log. Furthermore, this stuck-events issue happens only once every 3-4 days, while the Celery workers die more frequently. It looks like, during one of these Celery worker restarts, Relay sometimes clashes with it and the whole chain just dies.
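
To rule out server-side timeouts on the Redis side, one thing I can still check is the relevant Redis settings (a quick diagnostic sketch against the instance from the config above):

redis-cli -h 10.100.33.5 -p 6379 config get timeout        # 0 means the server never disconnects idle clients
redis-cli -h 10.100.33.5 -p 6379 config get tcp-keepalive  # server-side TCP keepalive interval, in seconds
redis-cli -h 10.100.33.5 -p 6379 info clients              # connected/blocked client counts at the time of the check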

Currently I have no idea what to do about this, because the host environment doesn't show any errors or exceeded thresholds.
If it matters, this Sentry installation handles a pretty intensive load (for a commercial plan), about 175K events on an ordinary day, which is why we switched to an on-premise one.


jan-auer commented on June 18, 2024

It looks like one of the other services stops working and is not restarted properly. The connection error above looks promising, although it is hard to judge if this is actually the culprit and whether one of the Sentry workers or one of the services is at fault here.

This is a shot in the dark, but a quick search revealed this issue comment and similar issues. This could be a Celery bug, or timeouts in Redis when too many tasks are scheduled.
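
If the Redis timeout theory holds, one option that might be worth experimenting with (only a sketch, not a confirmed fix) is enabling TCP keepalive and explicit socket timeouts on the Celery broker connection in sentry.conf.py, using the standard kombu Redis transport options:

# Sketch only: the values below are illustrative, not a tested configuration.
BROKER_TRANSPORT_OPTIONS = {
    "socket_keepalive": True,      # keep the idle BRPOP connection alive
    "socket_timeout": 30,          # fail fast on a dead socket instead of hanging
    "socket_connect_timeout": 10,  # bound the time spent re-establishing the connection
}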

Since this issue seems to originate outside of Relay, I will have to close this here. I recommend using the official onpremise setup, as it will be easier for us to help with issues like the one above. For now, you could consider one of these options:

  1. If it turns out that one of the Sentry services is causing this problem, open an issue in the Sentry issue tracker.
  2. If another service is misbehaving, have a look at our onpremise repo for reference settings and configuration.

Finally, I would invite you to reach out to us. Your numbers indicate that you would need around 5M errors per month, which we would love to make you an adequate offer for.

