python-hyper / hyper

HTTP/2 for Python.
Home Page: http://hyper.rtfd.org/en/latest/
License: MIT License
We should send GOAWAY frames when we close a connection.
There are a few places in the code where I'm using lists to store things (e.g. HPACK header tables) that I don't care about indexing in O(n), but I do care about prepending to in O(1). These should be replaced with deques.
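The deque swap described above could look like this (a sketch with a toy header table, not hyper's actual HPACK code): new entries are prepended in O(1), and old entries fall off the far end, whereas `list.insert(0, ...)` is O(n).

```python
from collections import deque

# Toy HPACK-style header table (illustrative only): newest entries at the
# front, oldest evicted from the back. maxlen=4 keeps the example tiny.
header_table = deque(maxlen=4)

for entry in [(":method", "GET"), (":path", "/"), ("host", "example.com"),
              ("accept", "*/*"), ("user-agent", "hyper")]:
    header_table.appendleft(entry)  # O(1) prepend; with maxlen set, the
                                    # oldest entry is dropped automatically

# The first entry ((":method", "GET")) has been evicted by now.
print(list(header_table))
```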
When we get a BLOCKED frame we should probably treat that as a hint to increase the window size, and also log at the warning level.
The hpack-test-case repository doesn't have many tests yet, and we need more: right now the future branch is skipping a whole ton of tests. See http2jp/hpack-test-case#12.
Docstrings should have the form: description, then parameters, then return value. Unfortunately, at least one has the return value too early. We need to fix that up.
In its current form, hyper is really stupid about flow control: whenever it receives a data packet it will issue a WINDOWUPDATE frame that extends the flow control window back to its original size. This leads to a huge amount of excess computation and network overhead.

Long-term, I want hyper to be able to make smarter choices about flow control. In the short term, we can improve things by resizing the flow-control window in larger 'chunks' of about half the size of the window. In future it ought to be possible for hyper to examine the Content-Length header and determine whether to bother with stream-level flow control at all: i.e. if the Content-Length is less than the size of the stream window, don't bother sending any WINDOWUPDATE frames.
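The chunked replenishment could be sketched like this (hypothetical names, not hyper's actual window-manager API): track consumed bytes and only return a WINDOWUPDATE increment once at least half the window has been used, instead of one update per DATA frame.

```python
# Sketch only: batch flow-control window updates into half-window chunks.
class ChunkedWindowManager:
    def __init__(self, window_size=65535):
        self.window_size = window_size
        self.consumed = 0

    def data_received(self, length):
        """Return the WINDOWUPDATE increment to send now, or 0 for none."""
        self.consumed += length
        if self.consumed >= self.window_size // 2:
            # Replenish everything consumed so far in a single update.
            increment, self.consumed = self.consumed, 0
            return increment
        return 0

mgr = ChunkedWindowManager(window_size=1000)
print(mgr.data_received(300))  # 0: under half the window, no update yet
print(mgr.data_received(300))  # 600: half reached, one batched update
```

The naive behaviour the issue complains about would return `length` on every call; here two DATA frames cost one WINDOWUPDATE instead of two.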
HTTP/2 currently has a provision for opportunistic encryption on port 80. It'd be nice if we could do that too.
We're doing pretty well on test coverage, but we can do better.
============================= test session starts ==============================
platform darwin -- Python 3.3.4 -- pytest-2.5.1
plugins: cov, xdist
gw0 [526] / gw1 [526] / gw2 [526] / gw3 [526]
scheduling tests via LoadScheduling
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
--------------- coverage: platform darwin, python 3.3.4-final-0 ----------------
Name Stmts Miss Cover
----------------------------------------------------
hyper/__init__ 10 1 90%
hyper/contrib 48 3 94%
hyper/http20/__init__ 1 0 100%
hyper/http20/connection 142 10 93%
hyper/http20/frame 177 10 94%
hyper/http20/hpack 257 11 96%
hyper/http20/huffman 59 1 98%
hyper/http20/huffman_constants 5 0 100%
hyper/http20/response 63 5 92%
hyper/http20/stream 107 2 98%
hyper/http20/tls 30 8 73%
hyper/httplib_compat 43 43 0%
----------------------------------------------------
TOTAL 942 94 90%
========================= 526 passed in 29.15 seconds ==========================
It would be nice to be able to have a cffi interface to nghttp2's HPACK implementation. That will always be faster and more memory-efficient than mine. A separate package is probably the way to go here.
The intended use-case of hyper is the current use-case of Requests. This means we need to work in the standard single-threaded case (easy), but also in the multithreaded case. The multithreaded case is where HTTP/2.0 can provide some of the biggest benefits thanks to the use of a single connection instead of many.

Unfortunately, this means that there will be a single Connection object and multiple Stream objects. Each Stream should be local to a single thread (representing as they do a single request or a single response), but the Connection object (and the socket it uses) may be shared amongst threads. We need a solution for this workflow to not be really uncomfortable.

My suspicion right now is that we'll want to lock the socket object so that it can only be accessed from a single thread at a time. This needs to be done in a way that means that even if a massive response/request grabs the socket object, the others don't block longer than they need to on it. I'm not yet sure how this will work. Ideas are welcome.
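One way the locking idea could look (hypothetical names, not hyper's API): hold the lock for exactly one frame's write, so a huge request only delays other threads by a single frame, never for its whole body.

```python
import threading

# Sketch: serialise whole-frame writes so threads can't interleave bytes
# mid-frame, while keeping lock hold times to one frame at a time.
class LockedSocket:
    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def send_frame(self, frame_bytes):
        with self._lock:
            self._sock.sendall(frame_bytes)

class _RecordingSock:
    """Stand-in for a real socket in this sketch."""
    def __init__(self):
        self.sent = []
    def sendall(self, data):
        self.sent.append(data)

sock = LockedSocket(_RecordingSock())
threads = [threading.Thread(target=sock.send_frame, args=(b'frame%d' % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(sock._sock.sent))  # all four frames arrive intact
```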
We've got common source for 2.7 and 3.3, so we should have universal wheels.
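For the record, the standard setuptools mechanism for this is a two-line setup.cfg stanza, which makes `bdist_wheel` produce a single `py2.py3-none-any` wheel:

```ini
[bdist_wheel]
universal = 1
```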
hyper is not optimised for performance right now. While the HTTP/2 spec is changing I want to focus on correctness and the ability to easily change behaviour.

However, when it does get nailed down we'll want to make some steps to improve performance. Roberto Peon has provided an awesome list of things to work on, which I've reproduced here. People should take things off this list and break them out into new issues as they go.

- Don't do one socket.read() per frame: we should read into a buffer and then parse the buffer. Avoids repeated kernel-userspace transitions.
- Use nghttp2's C HPACK implementation. (#60)
- Use memoryview objects in hyper's pure-Python HPACK implementation. (#61)
- Other intelligent performance optimisations should be added here.
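The buffered-read item could be sketched like this. For illustration it assumes a toy framing of a 4-byte big-endian length prefix followed by a payload (the real HTTP/2 frame header differs); the point is that one large recv() can feed many frame parses, avoiding a kernel crossing per frame.

```python
import struct

# Sketch: accumulate raw bytes, then parse as many complete frames as the
# buffer holds. Toy framing: 4-byte big-endian length, then payload.
class FrameBuffer:
    def __init__(self):
        self._buf = b''

    def add_data(self, data):
        self._buf += data

    def frames(self):
        """Yield every complete frame payload currently buffered."""
        while len(self._buf) >= 4:
            (length,) = struct.unpack('>I', self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # incomplete frame; wait for more data
            yield self._buf[4:4 + length]
            self._buf = self._buf[4 + length:]

buf = FrameBuffer()
buf.add_data(struct.pack('>I', 5) + b'hello' + struct.pack('>I', 2))
print(list(buf.frames()))  # [b'hello']; the second frame is incomplete
buf.add_data(b'hi')
print(list(buf.frames()))  # [b'hi']
```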
We should make it easier to provide your own certificates for verification.
Let's start this discussion right now. I'd like to have hyper eventually be integrated into urllib3, giving all the happy HTTP/2.0 love to users of urllib3 (and indirectly, users of Requests). What do we think hyper needs to make this happen?

I don't expect this to happen quickly, and certainly not until hyper is more mature, so let's have a really honest discussion about what we need.

Potentially interested parties: @shazow, @sigmavirus24, @t-8ch, @kennethreitz.
Conceptually hyper can be split into two parts: a HTTP/2.0 client library and a 1.1-2.0 abstraction layer (in the future I may enforce that split if I extend hyper to provide server code too).

The abstraction layer is a point of interest, and I haven't figured out how the design should work. If people want to suggest implementation ideas, by all means go ahead.
Changelog here.
Python 2.7 and 3.3 support are still blocked behind PyOpenSSL, which is blocked behind cryptography. Until those projects can get moving I'd like to keep releasing hyper. I therefore propose one of two paths:
I've borrowed a pair of chunks of code (SocketServerThread in the tests and DeflateDecoder in the response) from @shazow's urllib3 project. Technically the MIT license requires that I distribute it along with that code, which I should really do. =)
As a shorter-term goal than getting into urllib3 (tracked by #18), let's try to get hyper some exposure by making it plug into Requests. This should lead to some really interesting experiments, e.g. running Twython over HTTP/2.0.
Connections currently never remove streams from their stream dict. That's pretty stupid.
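The fix could be as small as this sketch (hypothetical names, not hyper's actual Connection class): pop a stream out of the dict once it is fully closed, so finished request/response pairs don't accumulate for the connection's lifetime.

```python
# Sketch: a connection that forgets streams when they close.
class Connection:
    def __init__(self):
        self.streams = {}

    def _close_stream(self, stream_id):
        # pop with a default makes closing an unknown stream a no-op.
        self.streams.pop(stream_id, None)

conn = Connection()
conn.streams[1] = object()
conn._close_stream(1)
print(len(conn.streams))  # 0: the closed stream is gone
```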
There are some TODOs and best-guesses in the ssl_compat module that I'd like to iron out at some stage.
No specific actions here, just a general 'stay aware' note.
Draft is here. Side-by-side diff is here.
Things to change:
GOAWAY frames carry a last_stream_id field that tells us which was the last stream that was definitely processed. I'd like to be able to automatically attempt to reconnect and resend streams that weren't processed.
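The selection logic could look like this (a hypothetical helper, not hyper's API). Client-initiated streams always have odd IDs, so any odd stream ID above last_stream_id was never processed by the server and is safe to resend on a new connection.

```python
# Sketch: pick out the client-initiated streams a GOAWAY tells us the
# server never processed, so they can be retried after reconnecting.
def streams_to_resend(stream_ids, last_stream_id):
    return [s for s in stream_ids
            if s % 2 == 1           # client-initiated streams are odd
            and s > last_stream_id] # above this ID: never processed

print(streams_to_resend([1, 3, 5, 7], last_stream_id=3))  # [5, 7]
```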
It would be nice if we could initiate HTTP/1.1 and then upgrade to HTTP/2 if the response allows us to.
This basically requires writing a HTTP/1.1 stack: I doubt aiohttp gives me the needed control.
Spec is here. Side-by-side diff is here.
Changes to make:
Window Managers can take advantage of knowing how large the document is going to be to affect the way they send WINDOWUPDATE frames. However, right now that parameter is never set.
As seen in http2jp/hpack-test-case#13, there's a very specific edge-case behaviour where the HPACK encoder incorrectly handles repeated headers. Specifically, if a header set contains two identical headers (both key and value) whose entry is already in the reference set, we won't emit any headers, causing the output to contain only a single instance of that repeated header.
The nicest fix here is actually likely to be to fix up #36: repeated identical headers will therefore be concatenated together and will look different to the HPACK internals.
Turns out PyOpenSSL doesn't have an implementation of recv_into. This somewhat defeats the point of the BufferedSocket, which attempts to minimise the number of copies of data! Right now, the h2-10 branch build is failing because of this absence.
It would be awesome if we could have an asyncio-based backend that exposes an optional synchronous API. I have no idea if we can do this or not, but I'd love it if we could; it would clean the code up tremendously.
Why the hell not?
Should be both inbound and outbound: that is, we understand server-signalled prioritisation and we can correctly signal and handle prioritisation chosen by our users.
The integration tests are great, but they get really long and rely on a ton of implementation details, which is a bit lame. We should be able to abstract this away into a set of requests and responses and assertions about what they'll contain.
HTTP/2.0 mandates that all user-agents be able to handle encoded request and response bodies. We should steal the transparent handling from @shazow, like the dirty thieves we are.
HPACK should use memoryview objects when decoding if at all possible to avoid copying memory all over the place.
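To illustrate the point (Python 3.3+): slicing a bytes object allocates and copies, while slicing a memoryview only adjusts offsets into the same buffer, so a decoder can walk a large header block without copying chunks of it at every step.

```python
# Slicing bytes copies; slicing a memoryview does not.
data = bytes(range(256)) * 64      # stand-in for a received header block
view = memoryview(data)

chunk = view[10:20]                # zero-copy slice into the same buffer
assert chunk.obj is data           # still backed by the original bytes

copied = data[10:20]               # by contrast, this allocates new bytes
print(bytes(chunk) == copied)      # True: same contents, different cost
```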
Rather than port this library to asyncio, why not have a new library that re-uses some common code from here but uses asyncio? That allows easy support of asyncio without limiting the access to HTTP/2.
MORE PROJECTS.
Related: #10.
Alt-Svc is another way of discovering HTTP/2 support. Again, we'll need a HTTP/1.1 stack for this.
It's been suggested that the compatibility layer might be a bad idea as a primary API, and instead worth moving to a separate location from which it can be imported (e.g. from hyper import httplib_compat as httplib). This would allow me to have a sensible API as the primary layer.
Right now we increment the remote window size when frames are read off the wire, not when they're actually consumed. This can lead to hyper growing massive in memory when the user isn't actually reading those frames.

We should increment the window when the user is read()ing from the stream, not when hyper does.
It's HTTP/2 now. This primarily affects the docs, we'll worry about interface changes later.
I'd like to do ALPN to negotiate HTTP/2 like basically everyone else does, but I can't because the standard library doesn't support it.
PyOpenSSL might be my best bet here.
We added pluggable window managers in 0.0.4. With that done, we should add a new default window manager that is much smarter about dealing with the flow control window.
Right now if hyper hits an error it just throws exceptions. This leaves the connection in an undefined state, and it is fundamentally unusable from then on. We should start handling this better. 'Better' means two things:

- Stream-level errors should be handled with RST_STREAM frames. Exceptions should still be thrown, but the connection should be OK.
- Connection-level errors should send GOAWAY with an appropriate error code, and then tear the connection down.

We need to implement logging. HTTP/2.0 is complicated, and it would be very useful to be able to find out, for example, what frames we've sent in what order.
This involves the following stages:
The HPACK specification pre-defines a Huffman coding table for use with the HPACK layer. This includes Huffman codes for sending requests (which we can use to encode) and Huffman codes for sending responses (which we can use to decode).

Encoding data using the provided Huffman coding table isn't too challenging; we just use it as a dictionary. Decoding data is a substantially different problem. Specifically, we need a specialised pure-Python prefix trie.

Data will come off the wire in the form of a lengthy bytestring, like b'\x80@\x86\xf6h1\xcd\xb8\x7f\x84\xf7wx\xff@\x85\xf6A9\x8e\x83\x83\xce1w@\x88\xf6#X\xe6\xd7\xaa>\x9f\x88\xf4Fmi\x12\xd2qw@\x84\xf69\x11\xcf\x81\x0f'. Some parts of this bytestring will be consumed by the HPACK decoding process, but eventually we'll hit a Huffman-coded string literal. What we'll have in hand is some fragment of that bytestring, suppose: b'\xdb\x6d\x88\x3e\x68\xd1\xcb\x12\x25\xba\x7f'.

What I need is a portion of a Huffman implementation that I can pass that entire bytestring to, and have it return the decoded string (in this case b'www.example.com'). This will involve a prefix trie that works bitwise on non-octet-aligned data (bleh). So far I've not found a suitable prefix trie, though this one by @ElricL could be used as a jumping-off point.
In tls.py we use SSLContext.set_default_verify_paths() to get the system certificate bundle. This works fine on my OS X development box, but leads to cert verification errors on Windows.

Options are either to use a Windows-specific version as well that we know will work, or to start bundling requests' CAcerts.
There's a relative lack of testing on the Connection/Response layer as I sprinted on it over the weekend. Address this by stealing @shazow's socket-level testing idea.
When pyca/pyopenssl#86 is merged and a release is shipped with it, we'll want to update our dependency on PyOpenSSL to get NPN support in our 2.7 and 3.3 versions.