Comments (19)
Thank you, this in fact points out the problem: transfer-encoding: chunked. The Sentry web container is not able to handle this, and we should not be creating such a request. We'll submit a bugfix for this, thank you for uncovering it. (Edit: #890.)
The configuration option you're setting, _client: reqwest, is unstable and still being developed. I'm curious what @untitaker wanted to test with this. In the meantime, I would suggest switching back to the regular client.
As this is likely an issue with the way the containers are being run or networking is set up, I would propose to move this issue to getsentry/onpremise. Any objections, @BYK?
from relay.
Can you try adding

    http:
      _client: reqwest

to your config?
Thanks for your response. After that change, things took a turn for the worse:
2020-12-07T22:15:14Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-07T22:15:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2020-12-07T22:15:31Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-07T22:15:31Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
Cool, that's worse. Do you have the capability to inspect the response body that Sentry sends to Relay? I wonder why the request is invalid.
Context: we intend to switch to reqwest as the HTTP client (the option to switch back and forth is temporary), so I think it would be wasted time to investigate bugs in the old client.
Do you have the capability to inspect the response body that Sentry sends to Relay?
Probably, but right now I have no clue. I would have to look at the code to see if I could trap the request/response, or put DEBUG=True in the webserver. Would it help?
Changing Relay's logging level to DEBUG didn't help much, though.
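One low-tech way to trap the request/response without enabling DEBUG in the web container would be a small logging proxy sitting between Relay and the upstream. A minimal Python sketch; the upstream host and port are assumptions taken from the log lines above, and the proxy port is arbitrary:

```python
# Tiny logging reverse proxy (a debugging sketch, not production code).
# It prints each POST body Relay sends and the exact body Sentry returns.
import http.client
import http.server

UPSTREAM_HOST = "web.sentry.svc"  # assumption: taken from the logs above
UPSTREAM_PORT = 9190

class LoggingProxy(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Dump the incoming request body. Note: this only reads
        # Content-Length-delimited bodies; a chunked request would
        # show up empty here.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(">>>", self.command, self.path)
        print(body.decode("utf-8", "replace"))

        # Forward the request upstream and dump the response body.
        conn = http.client.HTTPConnection(UPSTREAM_HOST, UPSTREAM_PORT)
        conn.request("POST", self.path, body, dict(self.headers))
        resp = conn.getresponse()
        data = resp.read()
        print("<<<", resp.status, resp.reason)
        print(data.decode("utf-8", "replace"))

        # Relay the response back to the caller.
        self.send_response(resp.status)
        for name, value in resp.getheaders():
            if name.lower() not in ("transfer-encoding", "connection"):
                self.send_header(name, value)
        self.end_headers()
        self.wfile.write(data)

# To use it: http.server.HTTPServer(("", 9191), LoggingProxy).serve_forever()
# then point Relay's upstream URL at http://<this host>:9191/.
```

With Relay's upstream option pointed at the proxy, every registration attempt and the body behind the "No valid json body" 400 would be printed to the console.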
I'm looking at the code of the upstream actor, and I noticed I've never spotted the info log at:
log::info!("relay successfully registered with upstream");
slf.auth_state = AuthState::Registered;
slf.auth_backoff.reset();
But I've received events from clients, so... what's happening here? Am I missing something?
Well, I'm not getting events from clients right now. I'm triggering events in some of the applications and I don't see them appear in Sentry :*(.
Still the logs just show:
2020-12-08T00:17:19Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-08T00:17:19Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: IO error: Connection reset by peer (os error 104)
caused by: IO error: Connection reset by peer (os error 104)
caused by: IO error: Connection reset by peer (os error 104)
caused by: Connection reset by peer (os error 104)
2020-12-08T00:17:43Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2020-12-08T00:17:43Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
Hum... I ran docker pull for all the images: snuba, relay, sentry, and symbolicator. Now I'm getting these logs:
2020-12-08T01:33:42Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-08T01:33:42Z [relay_server::actors::upstream] INFO: relay successfully registered with upstream
2020-12-08T01:33:42Z [relay_server::actors::project_upstream] ERROR: error fetching project states: could not send request to upstream
caused by: IO error: Connection reset by peer (os error 104)
caused by: IO error: Connection reset by peer (os error 104)
caused by: IO error: Connection reset by peer (os error 104)
caused by: Connection reset by peer (os error 104)
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
caused by: failed to fetch project state from upstream
I tried once more to use reqwest, but it failed again with 400 Bad Request. I wasn't able to capture the request to see why Sentry is complaining about it. And now I'm back to the cycle of ERROR: authentication encountered error...
I'll report back tomorrow if I find anything.
Best regards.
@mvaled I have opened #878 for the 400 bug, let me know if that fixes the issue
As commented there, the issue still happens. As soon as I switch to reqwest, I get the 400 error. This is the full log for the first seconds after restarting Relay with reqwest:
2020-12-17T12:31:43Z [relay::setup] INFO: launching relay from config folder /work/.relay
2020-12-17T12:31:43Z [relay::setup] INFO: relay mode: managed
2020-12-17T12:31:43Z [relay::setup] INFO: relay id: <redacted>
2020-12-17T12:31:43Z [relay::setup] INFO: public key: <redacted>
2020-12-17T12:31:43Z [relay::setup] INFO: log level: INFO
2020-12-17T12:31:43Z [relay_server::actors::outcome::processing] INFO: OutcomeProducer started.
2020-12-17T12:31:43Z [relay_server::actors::events] INFO: starting 8 event processing workers
2020-12-17T12:31:43Z [relay_server::actors::upstream] INFO: upstream relay started
2020-12-17T12:31:43Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-17T12:31:43Z [relay_server::actors::store] INFO: store forwarder started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::service] INFO: spawning http server
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::service] INFO: listening on: http://0.0.0.0:3000/
2020-12-17T12:31:43Z [actix_net::server::server] INFO: Starting 8 workers
2020-12-17T12:31:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2020-12-17T12:31:43Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-17T12:31:43Z [actix_net::server::server] INFO: Starting server on 0.0.0.0:3000
2020-12-17T12:31:43Z [relay_server::actors::controller] INFO: relay server starting
2020-12-17T12:31:43Z [relay_server::actors::connector] INFO: metered connector started
2020-12-17T12:31:43Z [relay_server::actors::events] INFO: event manager started
2020-12-17T12:31:43Z [relay_server::actors::project_local] INFO: project local cache started
2020-12-17T12:31:43Z [relay_server::actors::project_upstream] INFO: project upstream cache started
2020-12-17T12:31:43Z [relay_server::actors::project_cache] INFO: project cache started
2020-12-17T12:31:43Z [relay_server::actors::relays] INFO: key cache started
2020-12-17T12:31:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
2020-12-17T12:31:44Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-17T12:31:44Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
caused by: No valid json body
Fair enough; we merged this because it fixed the issue for another customer. I'm not really sure what to try next.
Probably, but right now I have no clue. I would have to look at the code to see if I could trap the request/response, or put DEBUG=True in the webserver. Would it help?
Assuming you're sure you're running the correct commit, let's try request capturing. I was thinking more of using Wireshark, assuming you have unencrypted traffic.
Hi @untitaker,
I'm rebuilding the image from master:
$ git log -1 --oneline
33f0dfbb feat: Enable use of multiple value types per field (#882)
$ docker build . -t getsentry/relay:nightly
So, I'll be running relay with the right commit.
As for using Wireshark, I'm not really sure if I'm allowed to. Let's find out.
Unfortunately, using Wireshark is not possible. I don't have physical access to the server, and X11 forwarding via SSH is not allowed :(
I'm gonna use tcpdump (or something like that) on the server and import the capture into Wireshark locally.
Using tcpdump, I could sniff the traffic going to the upstream, once with actix and once with reqwest. I think I have spotted a difference:
On the left is the actix trace; on the right, the reqwest one. The JSON payload using reqwest is an array (seemingly empty) instead of an object.
Looking again at the traces, I noticed that what I described in my previous comment was a response coming from the server, not the POST sent by the relay. So I looked again.
In the good trace, I observe the following:
Relay sends a POST to the upstream with a JSON payload {relay_id: ..., public_key: ..., version: ...}. There are two custom HTTP headers: x-sentry-relay-id: bcfd9d5a-..., which matches the relay_id in the payload, and x-sentry-relay-signature, which I assume is a signature of the HTTP request with headers included. The other headers are:
content-type: application/json
accept-encoding: gzip, deflate
user-agent: actix-web/0.7.19
content-length: 130
date: Fri, 18 Dec 2020 16:11:52 GMT
The upstream returns 200, with a JSON payload {relay_id: ..., token: ...}.
In the "bad" trace, a similar POST is made. The JSON payload is the same as before, but some HTTP headers are different:
content-type: application/json
content-encoding: identity
accept: */*
accept-encoding: gzip
transfer-encoding: chunked
I hope this may provide some clues about what's happening.
Closing since both self-hosted and Relay have changed substantially. If a similar bug is still occurring, please re-submit to the self-hosted repository. Thanks!