Comments (31)
This likely happens when a connection is kept alive and re-used - I think it would result in the connection closing (potentially during usage). It is hard to know what might cause this, though - are there any clues in the requests arriving before these errors?
from hypercorn.
It would be helpful to know the value of self.connection.states when the error happens.
from hypercorn.
I don't think I'm losing any data but the log has a significant number of these exceptions. Is there any existing log I can enable that might help?
from hypercorn.
I've made a temporary (local) patch to the underlying h11 module and the exception's happened twice in the last few minutes. The client/server states (at the time of failure) are: -
self.states[CLIENT]=SEND_BODY
self.states[SERVER]=DONE
The CLIENT state of SEND_BODY is the one upsetting h11.
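For intuition, the reuse rule h11 enforces here can be modelled in a few lines. This is a simplified sketch of the rule only, not h11's actual implementation (the real check lives in h11's internal connection state machine):

```python
# Simplified model of h11's connection-reuse rule (illustrative only).
CLIENT, SERVER = "client", "server"
IDLE, SEND_BODY, DONE = "IDLE", "SEND_BODY", "DONE"

class LocalProtocolError(Exception):
    pass

def start_next_cycle(states):
    # Reuse is only legal once both peers have finished their turn.
    if states[CLIENT] != DONE or states[SERVER] != DONE:
        raise LocalProtocolError("not in a reusable state")
    # Both sides reset, ready for the next request/response pair.
    states[CLIENT] = states[SERVER] = IDLE

# The failing combination reported above:
states = {CLIENT: SEND_BODY, SERVER: DONE}
```

With `states[CLIENT] == SEND_BODY` (the client is still sending its request body), the call raises exactly the "not in a reusable state" error shown later in this thread.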
from hypercorn.
Ah, the issue here is that Hypercorn will happily send the response if instructed to do so before the request body has been fully received. This is allowed by RFC 7230 (section 6.5) if the response is an error code, but is rarely implemented.
I'll take a look at other servers; I'm uncertain whether the server or the application should do the waiting for the body. It may be something that should be specified in ASGI.
from hypercorn.
Definitely do look at other servers to see what the state of the art is. But I think the way this is supposed to work is:
- The server is free to send a response at any time; it doesn't have to read the request first
- But it does have to read the request body before it can reuse the connection for another request (keepalive mode). So if the ASGI app sends an early response and then terminates, the ASGI server has to either read and discard the request body, or else go ahead and close the connection. Since you probably don't want to read potentially unbounded amounts of data, you probably want to close the connection.
- Some clients don't start reading the response until after they've finished sending the request. If you have a client like this, and the request and the response bodies are both large, and you send an early response, you could potentially deadlock. There's not really anything an ASGI server can do to prevent this in general; the app has to be aware of the possibility and cope (e.g. by setting timeouts).
- On an early response, the client can potentially lose the response if they're still sending when you close the connection.
- Things that can mitigate these issues:
  - Having good support for Expect: 100-continue. Does ASGI support this?
  - If the server is ambitious, it can improve the chances of the client seeing an early response by doing a soft close: instead of immediately closing the connection, it does shutdown(SHUT_WR), and then reads and discards data up to some limit (N bytes or M seconds?), in the hopes that the client will notice the response and the closed connection, and stop sending gracefully. Of course if it doesn't then you have to close the socket and cut it off.
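The soft close described above might be sketched like this (a hypothetical helper; the limits and polling interval are arbitrary choices, not from any server's code):

```python
import socket
import time

def soft_close(sock, max_bytes=65536, max_seconds=5.0):
    """Half-close the connection, then drain whatever the peer is
    still sending so it has a chance to read our early response."""
    sock.shutdown(socket.SHUT_WR)      # we are done writing
    sock.settimeout(0.5)               # poll in short slices
    discarded = 0
    deadline = time.monotonic() + max_seconds
    while discarded < max_bytes and time.monotonic() < deadline:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            continue
        if not chunk:                  # peer closed its side too
            break
        discarded += len(chunk)
    sock.close()                       # hard close once limits are hit
    return discarded
```

The byte and time caps are the "N bytes or M seconds" from the comment above: without them a hostile or oblivious client could keep the half-closed socket draining forever.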
from hypercorn.
Maybe useful for intuition: the classic case of an early close is something like:
PUT /blah/blah HTTP/1.1
Content-Length: 999999999999
HTTP/1.1 403 Quota exceeded
You gotta cut this person off before they send you a terabyte of data. No way do you want to read all that in before telling them that it's too big. (Or like, if they need to give their password first, you don't want to transfer a terabyte and then tell them they need to log in and then upload it again.)
This is also the case that motivated the invention of 100 Continue - it lets a well-behaved client ask whether it should even try to upload the request body before it starts sending stuff. But not all clients support 100 Continue, and a server still needs to be able to cope with this situation.
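On the wire, the well-behaved flow looks roughly like this (an illustrative trace; the paths and values are made up):

```
Client:  PUT /upload HTTP/1.1
         Content-Length: 999999999999
         Expect: 100-continue
                                  <- client pauses, body not yet sent
Server:  HTTP/1.1 403 Quota exceeded
                                  (or "HTTP/1.1 100 Continue" if the
                                   server does want the body, after
                                   which the client starts uploading)
```

The point is that the pause after the headers gives the server a chance to reject the transfer before a single body byte moves.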
from hypercorn.
In my particular case the client sends nothing to the server once connected, it just sits on a recv() call collecting messages sent to it.
The server, on the other hand, passes small messages (on the order of hundreds of bytes) to the client every 10 minutes. It's bursty, with up to 1,000 small messages being sent to each connected client at 10-minute intervals.
from hypercorn.
In my particular case the client sends nothing to the server once connected,
Hmm. If there is no request body and the server doesn't read it, then h11 may have all the request body events buffered up internally, and just be complaining that you're trying to start the next cycle without having consumed them.
hypercorn might want to call next_event until exhausted, before calling start_next_cycle, just to flush out anything like this. (Unless it does already.)
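The drain-before-recycle idea might look like this, sketched against a stand-in for the h11 interface (StubConnection is hypothetical, just to show the loop shape - it is not h11's API):

```python
# Sketch: pull buffered events until nothing is left, before recycling.
NEED_DATA, PAUSED = "NEED_DATA", "PAUSED"

class StubConnection:
    """Hypothetical stand-in for an h11-style connection that may have
    events buffered internally."""
    def __init__(self, buffered_events):
        self._events = list(buffered_events)

    def next_event(self):
        # Returns a buffered event, or a sentinel once exhausted.
        return self._events.pop(0) if self._events else PAUSED

def drain_events(conn):
    """Call next_event until the connection reports nothing buffered."""
    drained = []
    while True:
        event = conn.next_event()
        if event in (NEED_DATA, PAUSED):
            return drained
        drained.append(event)

conn = StubConnection(["Data", "EndOfMessage"])
leftover = drain_events(conn)   # -> ["Data", "EndOfMessage"]
```

If the leftover list ends with an end-of-message event, the server knows the request cycle genuinely completed and recycling is safe.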
from hypercorn.
@njsmith ASGI doesn't have a concept of 100-continue and Hypercorn calls next_event until exhausted whenever data is received.
@alanbchristie What framework are you using - any chance it is over-counting the content-length?
I've pushed 4901eed which will early close, as a workaround.
from hypercorn.
The server (a mock of a more complex Java app I use) can be found at https://gitlab.com/matilda.peak/chronicler-stub.git. It's one file (chronicler.py) and exposes the web-socket via the URI ws://<blah>/channel/<N>. It's a Quart app. Once a connection is made it asserts that the connecting header contains a special key and then runs a loop that tries to read from a queue, either sending a message from that queue or sleeping for a short period (both using await). It essentially transmits data to the connected client, and does not expect anything in return.
The Python client (currently a private repo) uses websockets 5.0 and essentially connects to the URI with: -
async with websockets.connect(self._ws_uri, extra_headers=connect_hdr) as web_socket:
And then enters a loop receiving from the websocket: -
message = await web_socket.recv()
The client/server appear to work as expected. The original data is written to chronicler.py through REST calls to its /inbound route API, placed into a channel-specific queue and passed to the client via the call to send().
- The messages are generally small (hundreds of bytes, maybe 1 or 2K at most).
- All the apps run as container images in Kubernetes on an embedded ARM device using an arm32v7/python:3.7.3-alpine3.9 base image.
from hypercorn.
ASGI doesn't have a concept of 100-continue
Interesting! When you get a request with Expect: 100-continue, do you (1) automatically send 100 Continue immediately, (2) wait until the user calls await receive(), (3) ignore it entirely, (4) something else?
In principle (2) is probably optimal, but it has some user-visible effect. In particular, it means that if you send an early response, you might not be able to read the request body at all. But I guess that's not something that makes sense to allow anyway (the client may well disappear after your early response anyway). So probably ASGI should formally forbid calling receive after calling send.
ASGI should have a way for the app to send arbitrary 1xx responses, but IIRC it doesn't. Adding that would also complicate 100 Continue support, at least in that you'd have to specify how they interact...
and Hypercorn calls next_event until exhausted whenever data is received.
Interesting! That's not how I was imagining it would work, but if you're not relying on checking the h11 state attributes then I guess it works fine.
from hypercorn.
@alanbchristie these logs would be coming from REST calls where your app doesn't consume all the data before sending a response. Maybe requests with failed authentication or something?
from hypercorn.
The exceptions are in the chronicler-stub container that's receiving REST data on its /inbound handler and where data's shunted to the connected web-sockets, through connection-specific queues, to the /channel/<N> handler. Each web-socket, when established, sits in the while True loop getting an event from its queue then sending or sleeping: -
if event_message:
await websocket.send(event_message.SerializeToString())
else:
await asyncio.sleep(4)
Once connected, the process on the other end of the web-socket is expected to be permanent, i.e. the other end of the socket is not expected to close the connection or stop calling recv().
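For what it's worth, a send-or-sleep loop like the one above can also be written by waiting on the queue with a timeout instead of polling with a fixed sleep (a sketch; the pump name and None-sentinel shutdown convention are illustrative, not from the app):

```python
import asyncio

async def pump(queue, send, idle_timeout=4.0):
    """Forward queued messages to `send`, waking at most every
    `idle_timeout` seconds when the queue is empty."""
    while True:
        try:
            message = await asyncio.wait_for(queue.get(), idle_timeout)
        except asyncio.TimeoutError:
            continue                 # nothing to send; loop again
        if message is None:          # sentinel: shut the pump down
            return
        await send(message)
```

The advantage over sleep(4) is that a burst of 1,000 messages is forwarded as fast as the websocket can take them, rather than one per wake-up.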
Here's a fragment of the chronicler-stub log. There's a burst of inbound messages at 07:50 (I don't log all the messages, just every 100th). You can see the exception occurs while 100 messages are being transmitted between 07:50:38 and 07:50:47: -
It's important to understand that there are 2 web-sockets that are active - two recipients have connected - one to /channel/1 and one to /channel/2. So there are two independent queues and the 100 messages (100 in each queue) are sent to their respective web-sockets during this time.
2019-04-16T07:50:38.781Z chronicler INFO # ---- inbound
2019-04-16T07:50:38.782Z chronicler INFO # websockets: 2
2019-04-16T07:50:38.784Z chronicler INFO # 1100 in, 2200 out, 0 dropped
2019-04-16T07:50:38.786Z chronicler INFO # Current message: {'axis': '178', 'name': 'Warwick Square, Pimlico', 'bikes': '2', 'spaces': '17', 'docks': '19', 'working docks': '19', 'broken docks': '0', 'time': '2019-04-16T07:46:44Z'}
2019-04-16T07:50:46.831Z asyncio ERROR # Exception in callback H11Server.recycle_or_close(<Task finishe...> result=None>)
handle: <Handle H11Server.recycle_or_close(<Task finishe...> result=None>)>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.7/site-packages/hypercorn/asyncio/h11.py", line 103, in recycle_or_close
self.connection.start_next_cycle()
File "/usr/local/lib/python3.7/site-packages/h11/_connection.py", line 204, in start_next_cycle
self._cstate.start_next_cycle()
File "/usr/local/lib/python3.7/site-packages/h11/_state.py", line 298, in start_next_cycle
raise LocalProtocolError("not in a reusable state")
h11._util.LocalProtocolError: not in a reusable state
2019-04-16T07:50:47.083Z chronicler INFO # ---- inbound
2019-04-16T07:50:47.084Z chronicler INFO # websockets: 2
2019-04-16T07:50:47.086Z chronicler INFO # 1200 in, 2400 out, 0 dropped
2019-04-16T07:50:47.087Z chronicler INFO # Current message: {'axis': '352', 'name': 'Vauxhall Street, Vauxhall', 'bikes': '15', 'spaces': '4', 'docks'
from hypercorn.
@alanbchristie could you take a look at this with the latest master? I think it is more robust now (I've finished a fairly large refactor).
from hypercorn.
Will do. I should have some time this Fri/Sat.
from hypercorn.
I've embedded the latest master into my app, which has been running for about 90 minutes. It's processed 4,200 messages, passing 1,987 through to the connected websocket. That's all as expected, so everything's fine from that perspective.
And...
There are no examples of the original fault so far. :-)
But ... the following has occurred twice: -
2019-06-29T12:00:47.142Z asyncio ERROR # Task exception was never retrieved
future: <Task finished coro=<worker_serve.<locals>._server_callback() done, defined at /usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/asyncio/run.py:90> exception=AttributeError("'NoneType' object has no attribute 'handle'")>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/asyncio/run.py", line 91, in _server_callback
await Server(app, loop, config, reader, writer)
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/asyncio/server.py", line 98, in run
await self._read_data()
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/asyncio/server.py", line 118, in _read_data
await self.protocol.handle(RawData(data))
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/protocol/__init__.py", line 60, in handle
return await self.protocol.handle(event)
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/protocol/h11.py", line 106, in handle
await self._handle_events()
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0-py3.7.egg/hypercorn/protocol/h11.py", line 154, in _handle_events
await self.stream.handle(Body(stream_id=STREAM_ID, data=event.data))
AttributeError: 'NoneType' object has no attribute 'handle'
Annoyingly I discovered that I didn't update the __version__ = "0.6.0" value in your __about__.py, so it appears as though I've been using the official 0.6.0 (which I haven't). Is it worth getting into the habit of changing the version number on your release branch (master) immediately after each official release? i.e. update master to 0.7.0-alpha or some-such now to avoid this confusion.
from hypercorn.
Thank you! I think I've fixed this new error now, with f6d6595. Could you check with that (I've bumped to 0.6.0+dev). It also raises this issue, which I'll look into fixing next.
from hypercorn.
Rebuilt with Hypercorn==0.6.0+dev but still seeing...
2019-06-30T16:40:34.096Z asyncio ERROR # Task exception was never retrieved
future: <Task finished coro=<worker_serve.<locals>._server_callback() done, defined at /usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/asyncio/run.py:91> exception=AttributeError("'NoneType' object has no attribute 'handle'")>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/asyncio/run.py", line 92, in _server_callback
await Server(app, loop, config, reader, writer)
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/asyncio/server.py", line 97, in run
await self._read_data()
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/asyncio/server.py", line 117, in _read_data
await self.protocol.handle(RawData(data))
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/protocol/__init__.py", line 60, in handle
return await self.protocol.handle(event)
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/protocol/h11.py", line 106, in handle
await self._handle_events()
File "/usr/local/lib/python3.7/site-packages/Hypercorn-0.6.0+dev-py3.7.egg/hypercorn/protocol/h11.py", line 156, in _handle_events
await self.stream.handle(Body(stream_id=STREAM_ID, data=event.data))
AttributeError: 'NoneType' object has no attribute 'handle'
Just started the app, seen one so far.
from hypercorn.
After a 5-hour run with 15,420 messages passed to 2 web-sockets I've seen three of the AttributeError exceptions. That's with the app running in a python:3.7.3-alpine3.10 container.
from hypercorn.
Thanks again for testing, it is very helpful. That said, I'm not sure what is causing the AttributeErrors; I'll have to think again.
from hypercorn.
No problem. I don't think I'm losing any messages, but if there's more debug you need to add I can re-run it at some point - though I'm not an asyncio expert (yet).
from hypercorn.
As the error occurs when trying to send a Body event rather than a Data event, I think this is related to HTTP requests. Does your setup also handle HTTP requests (aside from the WebSocket ones)? If so I think it might be related to recycling the connection, and I've just fixed a bug which may solve this problem.
from hypercorn.
Actually the above would require the client to be using pipelining, which seems unlikely. Must be something else.
from hypercorn.
I think I've got this in 13bbb7f, it is a question of Hypercorn responding before the client has finished (which isn't a problem as long as Hypercorn doesn't error as the client finishes). Could you try with the latest master?
from hypercorn.
I will try again tonight...
from hypercorn.
Sorry - I have not been able to do this tonight. All sorts of trouble accessing the GitLab registry. This will have to wait until the weekend when I have time to understand what's going on with my registry tokens.
Are the changes in 0.7.0?
from hypercorn.
They are. Thanks again for trying.
from hypercorn.
OK - so the new build's been running from the latest master for about 90 minutes now with 2,700 messages processed and no exceptions. I think we can declare this one fixed?
I'll rebuild mine with the official release over the weekend.
Neat.
from hypercorn.
...and I have been running with 0.7.0 for most of the day, 28,776 messages and no exceptions. Yep - we can close this defect now. Thanks.
from hypercorn.
Excellent, thank you for the continual testing.
from hypercorn.