python-hyper / hyper

HTTP/2 for Python.
Home Page: http://hyper.rtfd.org/en/latest/
License: MIT License
We should send GOAWAY frames when we close a connection.
There are a few places in the code where I'm using lists to store things (e.g. HPACK header tables) that I don't care about indexing in O(n), but I do care about prepending to in O(1). These should be replaced with deques.
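The deque swap described above could look like this (a sketch with a toy header table, not hyper's actual HPACK code): new entries are prepended in O(1), and old entries fall off the far end, whereas `list.insert(0, ...)` is O(n).

```python
from collections import deque

# Toy HPACK-style header table (illustrative only): newest entries at the
# front, oldest evicted from the back. maxlen=4 keeps the example tiny.
header_table = deque(maxlen=4)

for entry in [(":method", "GET"), (":path", "/"), ("host", "example.com"),
              ("accept", "*/*"), ("user-agent", "hyper")]:
    header_table.appendleft(entry)  # O(1) prepend; with maxlen set, the
                                    # oldest entry is dropped automatically

# The first entry ((":method", "GET")) has been evicted by now.
print(list(header_table))
```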
When we get a BLOCKED frame we should probably treat that as a hint to increase the window size, and also log at the warning level.
The hpack-test-case repository doesn't have many tests yet, and we need more: right now the future branch is skipping a whole ton of tests. See http2jp/hpack-test-case#12.
Docstrings should have the form: description, then parameters, then return value. Unfortunately, at least one has the return value too early. We need to fix that up.
In its current form, hyper is really stupid about flow control: whenever it receives a data packet it will issue a WINDOWUPDATE frame that extends the flow control window back to its original size. This leads to a huge amount of excess computation and network overhead.

Long-term, I want hyper to be able to make smarter choices about flow control. In the short term, we can improve things by resizing the flow-control window in larger 'chunks' of about half the size of the window. In future it ought to be possible for hyper to examine the Content-Length header and determine whether to bother with stream-level flow control at all: i.e. if the Content-Length is less than the size of the stream window, don't bother sending any WINDOWUPDATE frames.
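The chunked replenishment could be sketched like this (hypothetical names, not hyper's actual window-manager API): track consumed bytes and only return a WINDOWUPDATE increment once at least half the window has been used, instead of one update per DATA frame.

```python
# Sketch only: batch flow-control window updates into half-window chunks.
class ChunkedWindowManager:
    def __init__(self, window_size=65535):
        self.window_size = window_size
        self.consumed = 0

    def data_received(self, length):
        """Return the WINDOWUPDATE increment to send now, or 0 for none."""
        self.consumed += length
        if self.consumed >= self.window_size // 2:
            # Replenish everything consumed so far in a single update.
            increment, self.consumed = self.consumed, 0
            return increment
        return 0

mgr = ChunkedWindowManager(window_size=1000)
print(mgr.data_received(300))  # 0: under half the window, no update yet
print(mgr.data_received(300))  # 600: half reached, one batched update
```

The naive behaviour the issue complains about would return `length` on every call; here two DATA frames cost one WINDOWUPDATE instead of two.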
HTTP/2 currently has a provision for opportunistic encryption on port 80. It'd be nice if we could do that too.
We're doing pretty well on test coverage, but we can do better.
============================= test session starts ==============================
platform darwin -- Python 3.3.4 -- pytest-2.5.1
plugins: cov, xdist
gw0 [526] / gw1 [526] / gw2 [526] / gw3 [526]
scheduling tests via LoadScheduling
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
--------------- coverage: platform darwin, python 3.3.4-final-0 ----------------
Name Stmts Miss Cover
----------------------------------------------------
hyper/__init__ 10 1 90%
hyper/contrib 48 3 94%
hyper/http20/__init__ 1 0 100%
hyper/http20/connection 142 10 93%
hyper/http20/frame 177 10 94%
hyper/http20/hpack 257 11 96%
hyper/http20/huffman 59 1 98%
hyper/http20/huffman_constants 5 0 100%
hyper/http20/response 63 5 92%
hyper/http20/stream 107 2 98%
hyper/http20/tls 30 8 73%
hyper/httplib_compat 43 43 0%
----------------------------------------------------
TOTAL 942 94 90%
========================= 526 passed in 29.15 seconds ==========================
It would be nice to be able to have a cffi interface to nghttp2's HPACK implementation. That will always be faster and more memory-efficient than mine. A separate package is probably the way to go here.
The intended use-case of hyper is the current use-case of Requests. This means we need to work in the standard single-threaded case (easy), but also in the multithreaded case. The multithreaded case is where HTTP/2.0 can provide some of the biggest benefits thanks to the use of a single connection instead of many.

Unfortunately, this means that there will be a single Connection object and multiple Stream objects. Each Stream should be local to a single thread (representing as they do a single request or a single response), but the Connection object (and the socket it uses) may be shared amongst threads. We need a solution for this workflow to not be really uncomfortable.

My suspicion right now is that we'll want to lock the socket object so that it can only be accessed from a single thread at a time. This needs to be done in a way that means that even if a massive response/request grabs the socket object, the others don't block longer than they need to on it. I'm not yet sure how this will work. Ideas are welcome.
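One way the locking idea could look (hypothetical names, not hyper's API): hold the lock for exactly one frame's write, so a huge request only delays other threads by a single frame, never for its whole body.

```python
import threading

# Sketch: serialise whole-frame writes so threads can't interleave bytes
# mid-frame, while keeping lock hold times to one frame at a time.
class LockedSocket:
    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def send_frame(self, frame_bytes):
        with self._lock:
            self._sock.sendall(frame_bytes)

class _RecordingSock:
    """Stand-in for a real socket in this sketch."""
    def __init__(self):
        self.sent = []
    def sendall(self, data):
        self.sent.append(data)

sock = LockedSocket(_RecordingSock())
threads = [threading.Thread(target=sock.send_frame, args=(b'frame%d' % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(sock._sock.sent))  # all four frames arrive intact
```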
We've got common source for 2.7 and 3.3, so we should have universal wheels.
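For the record, the standard setuptools mechanism for this is a two-line setup.cfg stanza, which makes `bdist_wheel` produce a single `py2.py3-none-any` wheel:

```ini
[bdist_wheel]
universal = 1
```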
hyper is not optimised for performance right now. While the HTTP/2 spec is changing I want to focus on correctness and the ability to easily change behaviour.

However, when it does get nailed down we'll want to make some steps to improve performance. Roberto Peon has provided an awesome list of things to work on, which I've reproduced here. People should take things off this list and break them out into new issues as they go.

- Don't do one socket.read() per frame: we should read into a buffer and then parse the buffer. Avoids repeated kernel-userspace transitions.
- Use nghttp2's C HPACK implementation. (#60)
- Use memoryview objects in hyper's pure-Python HPACK implementation. (#61)
- Other intelligent performance optimisations should be added here.
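The buffered-read item could be sketched like this. For illustration it assumes a toy framing of a 4-byte big-endian length prefix followed by a payload (the real HTTP/2 frame header differs); the point is that one large recv() can feed many frame parses, avoiding a kernel crossing per frame.

```python
import struct

# Sketch: accumulate raw bytes, then parse as many complete frames as the
# buffer holds. Toy framing: 4-byte big-endian length, then payload.
class FrameBuffer:
    def __init__(self):
        self._buf = b''

    def add_data(self, data):
        self._buf += data

    def frames(self):
        """Yield every complete frame payload currently buffered."""
        while len(self._buf) >= 4:
            (length,) = struct.unpack('>I', self._buf[:4])
            if len(self._buf) < 4 + length:
                break  # incomplete frame; wait for more data
            yield self._buf[4:4 + length]
            self._buf = self._buf[4 + length:]

buf = FrameBuffer()
buf.add_data(struct.pack('>I', 5) + b'hello' + struct.pack('>I', 2))
print(list(buf.frames()))  # [b'hello']; the second frame is incomplete
buf.add_data(b'hi')
print(list(buf.frames()))  # [b'hi']
```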
We should make it easier to provide your own certificates for verification.
Let's start this discussion right now. I'd like to have hyper eventually be integrated into urllib3, giving all the happy HTTP/2.0 love to users of urllib3 (and indirectly, users of Requests). What do we think hyper needs to make this happen?

I don't expect this to happen quickly, and certainly not until hyper is more mature, so let's have a really honest discussion about what we need.

Potentially interested parties: @shazow, @sigmavirus24, @t-8ch, @kennethreitz.
Conceptually hyper can be split into two parts: a HTTP/2.0 client library and a 1.1-2.0 abstraction layer (in the future I may enforce that split if I extend hyper to provide server code too).

The abstraction layer is a point of interest, and I haven't figured out how the design should work. If people want to suggest implementation ideas, by all means go ahead.
Changelog here.
Python 2.7 and 3.3 support are still blocked behind PyOpenSSL, which is blocked behind cryptography. Until those projects can get moving I'd like to keep releasing hyper. I therefore propose one of two paths:
I've borrowed a pair of chunks of code (SocketServerThread in the tests and DeflateDecoder in the response) from @shazow's urllib3 project. Technically the MIT license requires that I distribute it along with that code, which I should really do. =)
As a shorter-term goal than getting into urllib3 (tracked by #18), let's try to get hyper some exposure by making it plug into Requests. This should lead to some really interesting experiments, e.g. running Twython over HTTP/2.0.
Connections currently never remove streams from their stream dict. That's pretty stupid.
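The fix could be as small as this sketch (hypothetical names, not hyper's actual Connection class): pop a stream out of the dict once it is fully closed, so finished request/response pairs don't accumulate for the connection's lifetime.

```python
# Sketch: a connection that forgets streams when they close.
class Connection:
    def __init__(self):
        self.streams = {}

    def _close_stream(self, stream_id):
        # pop with a default makes closing an unknown stream a no-op.
        self.streams.pop(stream_id, None)

conn = Connection()
conn.streams[1] = object()
conn._close_stream(1)
print(len(conn.streams))  # 0: the closed stream is gone
```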
There are some TODOs and best-guesses in the ssl_compat module that I'd like to iron out at some stage.
No specific actions here, just a general 'stay aware' note.
Draft is here. Side-by-side diff is here.
Things to change:
GOAWAY frames carry a last_stream_id field that tells us which was the last stream that was definitely processed. I'd like to be able to automatically attempt to reconnect and resend streams that weren't processed.
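The selection logic could look like this (a hypothetical helper, not hyper's API). Client-initiated streams always have odd IDs, so any odd stream ID above last_stream_id was never processed by the server and is safe to resend on a new connection.

```python
# Sketch: pick out the client-initiated streams a GOAWAY tells us the
# server never processed, so they can be retried after reconnecting.
def streams_to_resend(stream_ids, last_stream_id):
    return [s for s in stream_ids
            if s % 2 == 1           # client-initiated streams are odd
            and s > last_stream_id] # above this ID: never processed

print(streams_to_resend([1, 3, 5, 7], last_stream_id=3))  # [5, 7]
```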
It would be nice if we could initiate HTTP/1.1 and then upgrade to HTTP/2 if the response allows us to.
This basically requires writing a HTTP/1.1 stack: I doubt aiohttp gives me the needed control.
Spec is here. Side-by-side diff is here.
Changes to make:
Window Managers can take advantage of knowing how large the document is going to be to affect the way they send WINDOWUPDATE frames. However, right now that parameter is never set.
As seen in http2jp/hpack-test-case#13, there's a very specific edge-case behaviour where the HPACK encoder incorrectly handles repeated headers. Specifically, if a header set contains two identical headers (both key and value) whose entry is already in the reference set, we won't emit any headers, causing the output to contain only a single instance of that repeated header.
The nicest fix here is actually likely to be to fix up #36: repeated identical headers will therefore be concatenated together and will look different to the HPACK internals.
Turns out PyOpenSSL doesn't have an implementation of recv_into. This somewhat defeats the point of the BufferedSocket, which attempts to minimise the number of copies of data! Right now, the h2-10 branch build is failing because of this absence.
It would be awesome if we could have an asyncio-based backend that exposes an optional synchronous API. I have no idea if we can do this or not, but I'd love it if we could; it would clean the code up tremendously.
Why the hell not?
Should be both inbound and outbound: that is, we understand server-signalled prioritisation and we can correctly signal and handle prioritisation chosen by our users.
The integration tests are great, but they get really long and rely on a ton of implementation details, which is a bit lame. We should be able to abstract this away into a set of requests and responses and assertions about what they'll contain.
HTTP/2.0 mandates that all user-agents be able to handle encoded request and response bodies. We should steal the transparent handling from @shazow, like the dirty thieves we are.
HPACK should use memoryview objects when decoding if at all possible to avoid copying memory all over the place.
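To illustrate the point (Python 3.3+): slicing a bytes object allocates and copies, while slicing a memoryview only adjusts offsets into the same buffer, so a decoder can walk a large header block without copying chunks of it at every step.

```python
# Slicing bytes copies; slicing a memoryview does not.
data = bytes(range(256)) * 64      # stand-in for a received header block
view = memoryview(data)

chunk = view[10:20]                # zero-copy slice into the same buffer
assert chunk.obj is data           # still backed by the original bytes

copied = data[10:20]               # by contrast, this allocates new bytes
print(bytes(chunk) == copied)      # True: same contents, different cost
```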
Rather than port this library to asyncio, why not have a new library that re-uses some common code from here but uses asyncio? That allows easy support of asyncio without limiting the access to HTTP/2.
MORE PROJECTS.
Related: #10.
Alt-Svc is another way of discovering HTTP/2 support. Again, we'll need a HTTP/1.1 stack for this.
It's been suggested that the compatibility layer might be a bad idea as a primary API, and instead worth moving to a separate location from which it can be imported (e.g. from hyper import httplib_compat as httplib). This would allow me to have a sensible API as the primary layer.
Right now we increment the remote window size when frames are read off the wire, not when they're actually consumed. This can lead to hyper growing massive in memory when the user isn't actually reading those frames.

We should increment the window when the user is read()ing from the stream, not when hyper does.
It's HTTP/2 now. This primarily affects the docs, we'll worry about interface changes later.
I'd like to do ALPN to negotiate HTTP/2 like basically everyone else does, but I can't because the standard library doesn't support it.
PyOpenSSL might be my best bet here.
We added pluggable window managers in 0.0.4. With that done, we should add a new default window manager that is much smarter about dealing with the flow control window.
Right now if hyper hits an error it just throws exceptions. This leaves the connection in an undefined state, and it is fundamentally unusable from then on. We should start handling this better. 'Better' means two things:

- Stream-level errors should be handled with RST_STREAM frames. Exceptions should still be thrown, but the connection should be OK.
- Connection-level errors should send GOAWAY with an appropriate error code, and then tear the connection down.

We need to implement logging. HTTP/2.0 is complicated, and it would be very useful to be able to find out, for example, what frames we've sent in what order.
This involves the following stages:
The HPACK specification pre-defines a Huffman coding table for use with the HPACK layer. This includes Huffman codes for sending requests (which we can use to encode) and Huffman codes for sending responses (which we can use to decode).

Encoding data using the provided Huffman coding table isn't too challenging; we just use it as a dictionary. Decoding data is a substantially different problem. Specifically, we need a specialised pure-Python prefix trie.

Data will come off the wire in the form of a lengthy bytestring, like b'\x80@\x86\xf6h1\xcd\xb8\x7f\x84\xf7wx\xff@\x85\xf6A9\x8e\x83\x83\xce1w@\x88\xf6#X\xe6\xd7\xaa>\x9f\x88\xf4Fmi\x12\xd2qw@\x84\xf69\x11\xcf\x81\x0f'. Some parts of this bytestring will be consumed by the HPACK decoding process, but eventually we'll hit a Huffman-coded string literal. What we'll have in hand is some fragment of that bytestring, suppose: b'\xdb\x6d\x88\x3e\x68\xd1\xcb\x12\x25\xba\x7f'.

What I need is a portion of a Huffman implementation that I can pass that entire bytestring to, and have it return the decoded string (in this case b'www.example.com'). This will involve a prefix trie that works bitwise on non-octet-aligned data (bleh). So far I've not found a suitable prefix trie, though this one by @ElricL could be used as a jumping-off point.
In tls.py we use SSLContext.set_default_verify_paths() to get the system certificate bundle. This works fine on my OS X development box, but leads to cert verification errors on Windows.

Options are either to use a Windows-specific version as well that we know will work, or to start bundling requests' CAcerts.
There's a relative lack of testing on the Connection/Response layer as I sprinted on it over the weekend. Address this by stealing @shazow's socket-level testing idea.
When pyca/pyopenssl#86 is merged and a release is shipped with it, we'll want to update our dependency on PyOpenSSL to get NPN support in our 2.7 and 3.3 versions.