hyper's People

Contributors

alekstorm, benlast, fredthomsen, hkwi, hyxbiao, irvind, jasongowthorpe, jdecuyper, johejo, kolanich, kostyaesmukov, kracekumar, kriechi, laike9m, lukasa, markjenkins, masaori335, matangover, matjazp, mylh, nateprewitt, pkrolikowski, plucury, primozgodec, sigmavirus24, sobolevn, sriram137, t2y, vfaronov, viranch

hyper's Issues

Replace lists with deques.

There are a few places in the code where I'm using lists to store things (e.g. HPACK header tables). For these I don't mind if indexing is O(n), but I do want prepending to be O(1). These should be replaced with deques.
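A minimal sketch of the swap, using collections.deque from the standard library. The header_table name and its contents are hypothetical stand-ins, not hyper's actual data structure.

```python
from collections import deque

# Hypothetical stand-in for a header table: newest entries go at the front.
header_table = deque()

# appendleft() is O(1), whereas list.insert(0, ...) has to shift every element.
header_table.appendleft((b":method", b"GET"))
header_table.appendleft((b":path", b"/"))

# Evicting the oldest entry from the far end is O(1) as well.
oldest = header_table.pop()
```

Indexing into the middle of a deque is O(n), which is the trade-off described above.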

Regularize docstrings

Docstrings should have the form: description, then parameters, then return value. Unfortunately, at least one has the return value too early. We need to fix that up.
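For illustration, a docstring in the intended order might look like the sketch below. The function itself is hypothetical, and the Sphinx-style field markers are an assumption about the convention in use.

```python
def frame_length(payload, header_length=9):
    """
    Compute the total on-the-wire length of a frame.

    :param payload: The frame payload, as a bytestring.
    :param header_length: The size of the frame header in octets.
    :returns: The combined length of the header and the payload.
    """
    return header_length + len(payload)
```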

Improve flow-control management.

In its current form, hyper is really stupid about flow control: whenever it receives a DATA frame it issues a WINDOW_UPDATE frame that extends the flow control window back to its original size. This leads to a huge amount of excess computation and network overhead.

Long-term, I want hyper to be able to make smarter choices about flow control. In the short term, we can improve things by resizing the flow-control window in larger 'chunks' of about half the size of the window. In future it ought to be possible for hyper to examine the Content-Length header and determine whether to bother with stream-level flow control at all: i.e. if the Content-Length is less than the size of the stream window, don't bother sending any WINDOW_UPDATE frames.
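The short-term 'chunked' approach could look something like this. It is a sketch only: ChunkedWindowManager and its method names are hypothetical, and the 65535 default is the initial window size from the final HTTP/2 spec rather than anything hyper guarantees.

```python
class ChunkedWindowManager:
    """Hypothetical helper that only asks for a WINDOW_UPDATE once at
    least half of the flow-control window has been used up."""

    def __init__(self, initial_window_size=65535):
        self.max_window_size = initial_window_size
        self.window_size = initial_window_size

    def data_received(self, size):
        """Record `size` bytes of DATA. Return the increment to send in a
        WINDOW_UPDATE frame, or 0 if no frame is needed yet."""
        self.window_size -= size
        if self.window_size > self.max_window_size // 2:
            # More than half the window is still open: stay quiet.
            return 0
        increment = self.max_window_size - self.window_size
        self.window_size = self.max_window_size
        return increment
```

With a window of 100, two 30-byte frames produce a single update of 60 rather than two updates of 30.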

Reach 100% test coverage

We're doing pretty well on test coverage, but we can do better.

============================= test session starts ==============================
platform darwin -- Python 3.3.4 -- pytest-2.5.1
plugins: cov, xdist
gw0 [526] / gw1 [526] / gw2 [526] / gw3 [526]
scheduling tests via LoadScheduling
..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
--------------- coverage: platform darwin, python 3.3.4-final-0 ----------------
Name                             Stmts   Miss  Cover
----------------------------------------------------
hyper/__init__                      10      1    90%
hyper/contrib                       48      3    94%
hyper/http20/__init__                1      0   100%
hyper/http20/connection            142     10    93%
hyper/http20/frame                 177     10    94%
hyper/http20/hpack                 257     11    96%
hyper/http20/huffman                59      1    98%
hyper/http20/huffman_constants       5      0   100%
hyper/http20/response               63      5    92%
hyper/http20/stream                107      2    98%
hyper/http20/tls                    30      8    73%
hyper/httplib_compat                43     43     0%
----------------------------------------------------
TOTAL                              942     94    90%
========================= 526 passed in 29.15 seconds ==========================

Thread Safety

The intended use-case of hyper is the current use-case of Requests. This means we need to work in the standard single-threaded case (easy), but also in the multithreaded case. The multithreaded case is where HTTP/2.0 can provide some of the biggest benefits thanks to the use of a single connection instead of many.

Unfortunately, this means that there will be a single Connection object and multiple Stream objects. Each Stream should be local to a single thread (representing as they do a single request or a single response), but the Connection object (and the socket it uses) may be shared amongst threads. We need a solution for this workflow to not be really uncomfortable.

My suspicion right now is that we'll want to lock the socket object so that it can only be accessed from a single thread at a time. This needs to be done in a way that means that even if a massive response/request grabs the socket object, the others don't block longer than they need to on it. I'm not yet sure how this will work. Ideas are welcome.
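One possible shape for this, purely as a sketch: hold the lock for one frame at a time, so a large body gives other threads a chance to send between its frames. FrameSender is a hypothetical name, and note that threading.Lock makes no fairness guarantee, so this reduces blocking but does not bound it.

```python
import threading


class FrameSender:
    """Hypothetical wrapper that serialises writes to a shared socket."""

    def __init__(self, sock):
        self._sock = sock
        self._lock = threading.Lock()

    def send_frame(self, frame_bytes):
        # A frame must reach the wire in one piece, so the lock covers the
        # whole frame and nothing more.
        with self._lock:
            self._sock.sendall(frame_bytes)

    def send_body(self, frames):
        # The lock is released between frames, so other threads can
        # interleave their own frames with a large request body.
        for frame_bytes in frames:
            self.send_frame(frame_bytes)
```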

Performance Tuning

hyper is not optimised for performance right now. While the HTTP/2 spec is changing I want to focus on correctness and the ability to easily change behaviour.

However, when it does get nailed down we'll want to take some steps to improve performance. Roberto Peon has provided an awesome list of things to work on, which I've reproduced here. People should take things off this list and break them out into new issues as they go.

  • Rather than two calls to socket.read() per frame, we should read into a buffer and then parse the buffer. Avoids repeated kernel-userspace transitions.
  • Avoid copying the data around when framing. This is somewhat challenging.
  • Add optional support for nghttp2's C HPACK implementation. (#60)
  • Use memoryview objects in hyper's pure-Python HPACK implementation. (#61)
  • We can definitely improve the Huffman implementation. Consider using this and this for ideas. Alternatively, consider arranging the Huffman table by length and then using one bit at a time, considering only the characters of the appropriate length.

Other intelligent performance optimisations should be added here.
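As an illustration of the first item in the list, here is a sketch that parses every complete frame out of a single buffer. It assumes the 9-octet frame header of the final HTTP/2 spec (RFC 7540), which differs from the drafts discussed in these issues; parse_frames is a hypothetical name.

```python
import struct

FRAME_HEADER_LEN = 9  # assumption: RFC 7540 layout, not the draft layout


def parse_frames(buffer):
    """Return (frames, remainder). Each frame is a tuple of
    (type, flags, stream_id, payload); payloads are memoryview slices of
    the buffer, so no payload bytes are copied."""
    view = memoryview(buffer)
    frames = []
    offset = 0
    while len(view) - offset >= FRAME_HEADER_LEN:
        # 24-bit length (split as 8 + 16 bits), type, flags, stream ID.
        length_hi, length_lo, frame_type, flags, stream_id = struct.unpack_from(
            "!BHBBL", view, offset
        )
        length = (length_hi << 16) | length_lo
        end = offset + FRAME_HEADER_LEN + length
        if end > len(view):
            break  # incomplete frame: wait for more data
        payload = view[offset + FRAME_HEADER_LEN:end]
        # Mask off the reserved high bit of the stream identifier.
        frames.append((frame_type, flags, stream_id & 0x7FFFFFFF, payload))
        offset = end
    return frames, bytes(view[offset:])
```

The remainder is handed back so the caller can prepend it to the next chunk read from the socket.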

Better TLS

We should make it easier to provide your own certificates for verification.
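With the standard library alone (Python 3.4+), this could be as small as the sketch below. make_context and its cafile parameter are hypothetical and not part of hyper's API.

```python
import ssl


def make_context(cafile=None):
    """Build a verifying client context.

    If `cafile` is given, only the certificates in that file are trusted;
    otherwise the system default certificates are loaded.
    """
    return ssl.create_default_context(cafile=cafile)
```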

Get added to urllib3.

Let's start this discussion right now. I'd like to have hyper eventually be integrated into urllib3, giving all the happy HTTP/2.0 love to users of urllib3 (and indirectly, users of Requests). What do we think hyper needs to make this happen?

I don't expect this to happen quickly, and certainly not until hyper is more mature, so let's have a really honest discussion about what we need.

Potentially interested parties: @shazow, @sigmavirus24, @t-8ch, @kennethreitz.

Connection Transparency

Conceptually hyper can be split into two parts: a HTTP/2.0 client library and a 1.1-2.0 abstraction layer (in the future I may enforce that split if I extend hyper to provide server code too).

The abstraction layer is a point of interest, and I haven't figured out how the design should work. If people want to suggest implementation ideas, by all means go ahead.

Ship a h2-12 pre-release.

Python 2.7 and 3.3 support are still blocked behind PyOpenSSL, which is blocked behind cryptography. Until those projects can get moving I'd like to keep releasing hyper. I therefore propose one of two paths:

  • A pre-release version of 0.1.0 that supports Python 3.4 only.
  • A minor version, 0.0.5, that supports Python 3.4 only, adding Python 2.7 support in 0.1.0.

Include licenses for the code I've borrowed.

I've borrowed a pair of chunks of code (SocketServerThread in the tests and DeflateDecoder in the response) from @shazow's urllib3 project. Technically the MIT license requires that I distribute it along with that code, which I should really do. =)

Write a Requests adapter

As a shorter-term goal than getting into urllib3 (tracked by #18), let's try to get hyper some exposure by making it plug into Requests. This should lead to some really interesting experiments, e.g. running Twython over HTTP/2.0.

Iron out ssl_compat module.

There are some TODOs and best-guesses in the ssl_compat module that I'd like to iron out at some stage.

No specific actions here, just a general 'stay aware' note.

Bring hyper's HPACK implementation to draft-8.

Draft is here. Side-by-side diff is here.

Things to change:

  • After applying an updated value of the HTTP/2 setting SETTINGS_HEADER_TABLE_SIZE that changes the maximum size of the header table used by the encoder, the encoder MUST signal this change via an encoding context update. (Completed in e2b23b0.)
  • Update static table. (Completed in dc01314.)
  • Update Huffman code table. (Completed in 111051e.)
  • Update supported HPACK version to 8. (Completed in cececed.)

Automatically retry streams

GOAWAY frames carry a last_stream_id field that tells us which was the last stream that was definitely processed. I'd like to be able to automatically attempt to reconnect and resend streams that weren't processed.
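Working out which streams are candidates for a retry is the easy part. A sketch, with streams_to_retry as a hypothetical helper and open_stream_ids standing in for whatever bookkeeping the connection keeps:

```python
def streams_to_retry(open_stream_ids, last_stream_id):
    """Return, in order, the IDs of streams the peer did not process.

    Streams with an ID above the GOAWAY frame's last_stream_id were never
    acted upon, so resending them on a new connection is safe.
    """
    return sorted(sid for sid in open_stream_ids if sid > last_stream_id)
```

For example, with streams 1, 3, 5 and 7 open and a last_stream_id of 3, streams 5 and 7 would be resent.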

Support plaintext HTTP/2 using HTTP Upgrade

It would be nice if we could initiate HTTP/1.1 and then upgrade to HTTP/2 if the response allows us to.

This basically requires writing a HTTP/1.1 stack: I doubt aiohttp gives me the needed control.

Bring hyper to draft-13 of HTTP/2.

Spec is here. Side-by-side diff is here.

Changes to make:

  • Make sure that HEADERS frames with END_STREAM set are allowed to be followed by CONTINUATION frames. (Completed in e59233f.)
  • Receiving frames of unknown type is no longer an error, simply discard them. Log that we did so. (Completed in f229d37.)
  • Padding has changed again. Update our mixin to cope. (section 6.1) (Completed in a7c42db.)
  • Data compression was removed. (section 6.1) (Completed in f4cb68e.)
  • Make sure that we include the padding metadata field in the flow control size. (Completed in 260b4f9.)
  • Setting IDs are now 16 bit, not 8. (section 6.5.1) (Completed in 34f09a8.)
  • SETTINGS_COMPRESS_DATA is gone. (section 6.5.1) (Completed in f4cb68e.)
  • CONTINUATION frames can no longer be padded. (Completed in a7c42db.)
  • The ALT_SVC frame is gone (though it will be in the extension, so maybe don't delete it just yet: instead, start implementing the extension). (I'm leaving the ALT_SVC frame in place, but continuing to track implementing the extension in #30.)
  • Allow trailing HEADERS frames (section 8.1). (Completed in 6620630.)
  • Discard any header field beginning with a colon other than the understood ones. (Completed in 7be2800.)
  • "Header fields containing multiple values MUST be concatenated unless the order is known to be insignificant". Work out how we do this. (section 8.1.2.3) (Deferred until a good API can be workshopped, tracked roughly by #36.)
  • MUST disable TLS renegotiation. (section 9.2.1) (Appears to be impossible in OpenSSL, so marking as complete.)
  • Restricted cipher suites (section 9.2.2) (Deferred, tracked in #64.)
  • Bring HPACK to draft 8, see #62. (Completed in cececed.)
  • Update hyper NPN/ALPN token to -13. (Completed in 8064e2c.)

HPACK encoder doesn't handle duplicate headers appropriately

As seen in http2jp/hpack-test-case#13, there's a very specific edge-case behaviour where the HPACK encoder incorrectly handles repeated headers. Specifically, if a header set contains two identical headers (both key and value) that are already in the reference set, we won't emit any headers, causing the output to contain only a single instance of that repeated header.

The nicest fix here is actually likely to be to fix up #36: repeated identical headers will therefore be concatenated together and will look different to the HPACK internals.

Need recv_into support in PyOpenSSL

Turns out PyOpenSSL doesn't have an implementation of recv_into. This somewhat defeats the point of the BufferedSocket, which attempts to minimise the number of copies of data! Right now, the h2-10 branch build is failing because of this absence.

asyncio backend

It would be awesome if we could have an asyncio-based backend that exposes an optional synchronous API. I have no idea if we can do this or not but I'd love it if we could, it would clean the code up tremendously.

Properly implement prioritisation.

Why the hell not?

Should be both inbound and outbound: that is, we understand server-signalled prioritisation and we can correctly signal and handle prioritisation chosen by our users.

Better integration test infrastructure.

The integration tests are great, but they get really long and rely on a ton of implementation details, which is a bit lame. We should be able to abstract this away into a set of requests and responses and assertions about what they'll contain.

Use memoryviews in HPACK

HPACK should use memoryview objects when decoding if at all possible to avoid copying memory all over the place.

Additional asyncio version

Rather than port this library to asyncio, why not have a new library that re-uses some common code from here but uses asyncio? That allows easy support of asyncio without limiting the access to HTTP/2.

MORE PROJECTS.

Related: #10.

Support Alt-Svc

Alt-Svc is another way of discovering HTTP/2 support. Again, we'll need a HTTP/1.1 stack for this.

Consider compatibility layer

It's been suggested that the compatibility layer might be a bad idea as a primary API, and is instead worth moving to a separate location from which it can be imported (e.g. from hyper import httplib_compat as httplib). This would allow me to have a sensible API as the primary layer.

Only increase window size when frames are consumed

Right now we increment the remote window size when frames are read off the wire, not when they're actually consumed. This can lead to hyper growing massive in memory when the user isn't actually reading those frames.

We should increment the window when the user is read()ing from the stream, not when hyper does.
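Roughly, the idea is the sketch below. BufferedStream and the send_window_update callback are hypothetical names, not hyper's actual stream API.

```python
from collections import deque


class BufferedStream:
    """Hypothetical stream that only reopens the flow-control window once
    the user has actually consumed the data."""

    def __init__(self, send_window_update):
        self._chunks = deque()
        self._send_window_update = send_window_update

    def receive_data(self, data):
        # Called when a DATA frame comes off the wire: buffer it, but do
        # not touch the window yet.
        self._chunks.append(data)

    def read(self):
        # Called by the user: hand over one chunk and only now tell the
        # peer it may send that many more bytes.
        if not self._chunks:
            return b""
        data = self._chunks.popleft()
        self._send_window_update(len(data))
        return data
```

If the user never calls read(), the window is never reopened, so the peer stops sending once it is exhausted and memory use stays bounded.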

Support ALPN negotiation

I'd like to do ALPN to negotiate HTTP/2 like basically everyone else does, but I can't because the standard library doesn't support it.

PyOpenSSL might be my best bet here.

Implement smarter Window Manager

We added pluggable window managers in 0.0.4. With that done, we should add a new default window manager that is much smarter about dealing with the flow control window.

Handle errors better

Right now if hyper hits an error it just throws exceptions. This leaves the connection in an undefined state, making it fundamentally unusable from then on. We should start handling this better. 'Better' means two things:

  • Errors on a single stream should cause us to send RST_STREAM frames. Exceptions should still be thrown, but the connection should be OK.
  • Errors that affect the whole connection should cause us to send GOAWAY with an appropriate error code, and then tear the connection down.
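The two cases above might be sketched as follows. The exception classes and the callbacks are all hypothetical. The only thing taken from the HTTP/2 spec is the numeric value of INTERNAL_ERROR.

```python
INTERNAL_ERROR = 0x2  # HTTP/2 error code


class StreamError(Exception):
    """Hypothetical: an error confined to a single stream."""

    def __init__(self, stream_id, error_code):
        super().__init__("stream %d failed: %#x" % (stream_id, error_code))
        self.stream_id = stream_id
        self.error_code = error_code


def handle_error(exc, send_rst_stream, send_goaway, close):
    """Emit the right frame for `exc`. The caller still re-raises it."""
    if isinstance(exc, StreamError):
        # Only this stream is dead: the connection stays usable.
        send_rst_stream(exc.stream_id, exc.error_code)
    else:
        # Anything else is treated as fatal to the whole connection.
        send_goaway(getattr(exc, "error_code", INTERNAL_ERROR))
        close()
```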

Logging

We need to implement logging. HTTP/2.0 is complicated, and it would be very useful to be able to find out, for example, what frames we've sent in what order.

This involves the following stages:

  • Add logging to the HPACK encoder/decoder.
  • Add stream-level logging.
  • Add connection-level logging.
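One way to lay out the three stages above is a logger per layer under a common parent, using the standard logging module. The logger names are assumptions modelled on the module paths, and log_frame_sent is a hypothetical helper.

```python
import logging

# One logger per layer, so users can turn each on independently.
hpack_log = logging.getLogger("hyper.http20.hpack")
stream_log = logging.getLogger("hyper.http20.stream")
connection_log = logging.getLogger("hyper.http20.connection")

# Libraries conventionally attach a NullHandler and leave real
# configuration to the application.
logging.getLogger("hyper").addHandler(logging.NullHandler())


def log_frame_sent(frame_type, stream_id):
    """Record that a frame was sent, e.g. to reconstruct frame order."""
    connection_log.debug("sent %s frame on stream %d", frame_type, stream_id)
```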

Huffman Trie

The HPACK specification pre-defines a Huffman coding table for use with the HPACK layer. This includes Huffman codes for sending requests (which we can use to encode) and Huffman codes for sending responses (which we can use to decode).

Encoding data using the provided Huffman coding table isn't too challenging, we just use it as a dictionary. Decoding data is a substantially different problem. Specifically, we need a specialised pure-Python prefix trie.

Data will come off the wire in the form of a lengthy bytestring, like b'\x80@\x86\xf6h1\xcd\xb8\x7f\x84\xf7wx\xff@\x85\xf6A9\x8e\x83\x83\xce1w@\x88\xf6#X\xe6\xd7\xaa>\x9f\x88\xf4Fmi\x12\xd2qw@\x84\xf69\x11\xcf\x81\x0f'. Some parts of this bytestring will be consumed by the HPACK decoding process, but eventually we'll hit a Huffman-coded string literal. What we'll have in hand is some fragment of that bytestring, suppose: b'\xdb\x6d\x88\x3e\x68\xd1\xcb\x12\x25\xba\x7f'.

What I need is a portion of a Huffman implementation that I can pass that entire bytestring to, and have it return the decoded string (in this case b'www.example.com'). This will involve a prefix trie that works bitwise on non-octet-aligned data (bleh). So far I've not found a suitable prefix trie, though this one by @ElricL could be used as a jumping-off point.
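To make the shape of the problem concrete, here is a sketch of a bitwise prefix trie built from nested dicts. The three-symbol code table in the usage note is a toy, not the HPACK table. build_trie and decode are hypothetical names, and treating an unknown bit path as padding is a simplification.

```python
def build_trie(code_table):
    """Build a trie from a mapping of symbol -> (code, bit_length).

    Interior nodes are dicts keyed by the bit (0 or 1); leaves are the
    symbols themselves (integer byte values)."""
    root = {}
    for symbol, (code, length) in code_table.items():
        node = root
        for shift in range(length - 1, -1, -1):
            bit = (code >> shift) & 1
            if shift == 0:
                node[bit] = symbol
            else:
                node = node.setdefault(bit, {})
    return root


def decode(data, trie):
    """Walk the trie one bit at a time, ignoring octet boundaries."""
    out = bytearray()
    node = trie
    for byte in data:
        for shift in range(7, -1, -1):
            node = node.get((byte >> shift) & 1)
            if node is None:
                # No such code in the table: treat the rest as padding.
                return bytes(out)
            if not isinstance(node, dict):
                out.append(node)  # reached a leaf: emit and restart
                node = trie
    return bytes(out)
```

With the toy table {ord("a"): (0b0, 1), ord("b"): (0b10, 2), ord("c"): (0b110, 3)}, the single octet b"\x5b" (bits 0, 10, 110, then two padding bits) decodes to b"abc".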

Certificates for Windows

In tls.py we use SSLContext.set_default_verify_paths() to get the system certificate bundle. This works fine on my OS X development box, but leads to cert verification errors on Windows.

Options are either to use a Windows-specific version as well that we know will work, or to start bundling requests' CAcerts.

Integration Tests

There's a relative lack of testing on the Connection/Response layer as I sprinted on it over the weekend. Address this by stealing @shazow's socket-level testing idea.
