Comments (7)

bdarnell avatar bdarnell commented on July 1, 2024

Can you say more about how exactly this happens? It's true that we don't strip the value when parsing content-length, but it's supposed to already be stripped in the last line of HTTPHeaders.parse_line.

The \r\n is not supposed to make it to parse_line; those characters are handled in parse(). I don't see an issue when Content-Length is the last header; we have a test for this case:

>>> h = HTTPHeaders.parse("Content-Type: text/html\\r\\nContent-Length: 42\\r\\n")

I do see a couple of potential issues in edge cases, though.

  • Content-Length: 42\r\n \r\n (with a space between the CRLF pairs) appends a trailing space to the value, giving "42 "
  • Content-Length:\r\n 42\r\n (with the whole value in a continuation line) adds a leading space, giving " 42"

Both of these cases are errors now, although they were accepted prior to bf90f3a. I think they're both technically legal, although I'd have to go back to the RFCs to be sure.
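To make the two edge cases concrete, here is a minimal sketch of a line-by-line parser that strips single-line values and joins continuation lines with one space. It is a simplified model of the behavior described above, not Tornado's actual code; `toy_parse` and `toy_parse_line` are hypothetical names used only for illustration.

```python
def toy_parse_line(headers, last_key, line):
    """Feed one header line (line ending already removed) into `headers`.

    Hypothetical stand-in for a parse_line-style method. Returns the name
    of the header that a following continuation line would extend.
    """
    if line[0].isspace():
        # Continuation line: join it to the previous value with one space.
        if last_key is None:
            raise ValueError("first header line cannot start with whitespace")
        headers[last_key] += " " + line.lstrip()
        return last_key
    name, value = line.split(":", 1)
    # Single-line values have leading and trailing whitespace stripped.
    headers[name] = value.strip()
    return name


def toy_parse(text):
    """Split a header block on newlines and feed each non-empty line."""
    headers, last_key = {}, None
    for line in text.split("\n"):
        if line.endswith("\r"):
            line = line[:-1]
        if line:
            last_key = toy_parse_line(headers, last_key, line)
    return headers


# A whitespace-only continuation line leaves a trailing space.
print(repr(toy_parse("Content-Length: 42\r\n \r\n")["Content-Length"]))
# A value placed entirely on a continuation line gets a leading space.
print(repr(toy_parse("Content-Length:\r\n 42\r\n")["Content-Length"]))
```

In this model the stripping only happens on the first line of a header, so anything a continuation line contributes is never stripped again.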

from tornado.

chrisstaite-menlo avatar chrisstaite-menlo commented on July 1, 2024

We had some code that manually proxied headers from an upstream request to a response. It passed every line received by an AsyncHTTPClient.fetch header_callback to parse_line, which triggered this.


kenballus avatar kenballus commented on July 1, 2024

I just tested sending a request with a Content-Length of 0, and it worked totally fine. Can you post an example of a request that causes the problem?


chrisstaite-menlo avatar chrisstaite-menlo commented on July 1, 2024

The Content-Length needs to be the last header. What follows it is then interpreted as a multi-line continuation, which adds a space to the value, as stated in the first message.


kenballus avatar kenballus commented on July 1, 2024

Got it; now I can reproduce the bug. Agreed that this is a problem.

Also, it turns out that gunicorn and fasthttp also have this exact same bug.


bdarnell avatar bdarnell commented on July 1, 2024

> Got it; now I can reproduce the bug. Agreed that this is a problem.

I'm still not clear on what exactly the problem is. Is there an issue with HTTPHeaders.parse() or only with parse_line()? Internally, Tornado only uses parse_line() inside parse() and in curl_httpclient's header callback.

I see that there's a design mismatch in the interfaces of header_callback and parse_line: the former gives you the newlines, while parse_line expects them to be removed (this isn't formally specified but it's implied by the doctest). So you can't actually pass the values from header_callback directly to parse_line, even though this is superficially a reasonable thing to do.
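One way an application could bridge the mismatch is to strip the line ending before forwarding each line. The sketch below is an assumption-laden illustration, not part of Tornado: `make_header_callback` is a hypothetical helper, and it assumes (per Tornado's documentation for header_callback) that the callback also receives the status line and a final line containing only \r\n, both of which are skipped here.

```python
def make_header_callback(parse_line):
    """Adapt a parse_line-style callable for use as a header_callback.

    Hypothetical helper: `parse_line` is any callable that expects one
    header line with its line ending already removed.
    """
    def header_callback(line):
        # header_callback delivers lines with their trailing newline
        # characters; remove them before handing the line on.
        stripped = line.rstrip("\r\n")
        if not stripped:
            # The final blank line that ends the header block.
            return
        if stripped.startswith("HTTP/"):
            # Status line such as "HTTP/1.1 200 OK"; not a header.
            return
        parse_line(stripped)

    return header_callback


# Demonstration with a plain list standing in for a real parser.
seen = []
callback = make_header_callback(seen.append)
for raw in [
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: text/html\r\n",
    "Content-Length: 42\r\n",
    "\r\n",
]:
    callback(raw)
print(seen)
```

With Tornado itself, the same wrapper could presumably be used as `header_callback=make_header_callback(headers.parse_line)` where `headers` is an `HTTPHeaders` instance. Leading whitespace is left alone so that continuation lines are still recognizable.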

There are also a couple of weird edge cases I noted at the bottom of #3321 (comment).

Does that cover everything or am I missing something?

Solutions to the design mismatch include:

  • Treat this as working as intended and just improve the docs
  • Deprecate header_callback in AsyncHTTPClient.fetch and replace it with a separate callback that gives you a pre-parsed HTTPHeaders object. We need a callback that gives you headers before the first streaming chunk, but doing it with raw header lines just pushes unnecessary work into the application.
  • Make parse_line able to handle newlines. This almost works (by accident) because simple headers get stripped, but continuation lines can cause extraneous whitespace.


bdarnell avatar bdarnell commented on July 1, 2024

Aha, now I see the problem. Single-line headers have leading and trailing whitespace stripped, while continuation lines make it possible to construct a header with trailing whitespace, potentially confusing users of that header. RFC 9110 is clear that trailing whitespace should be stripped from header values. I'm going to:

  1. Make continuation lines containing only whitespace an error. The parse_line interface doesn't let us handle this properly (we must preserve internal space but strip trailing space, and we can't tell in the line-by-line interface whether we're looking at a middle line or the last one of a header)
  2. Handle newlines in parse_line, specifically so that lines containing only newlines are no-ops. This fixes the way that the last header gets a trailing space if you use parse_line directly instead of parse
  3. Emit a deprecation warning on continuation lines. There should be no reason to support this feature any more and we should get rid of it in the future.
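The three changes listed above can be sketched on a toy parser. This is only an illustration of the plan under stated assumptions, not the code that landed in Tornado: `ToyHeaders`, its `values` dict, and the use of `ValueError` are hypothetical choices made to keep the example self-contained.

```python
import warnings


class ToyHeaders:
    """Hypothetical line-by-line header parser illustrating the plan."""

    def __init__(self):
        self.values = {}
        self._last_key = None

    def parse_line(self, line):
        # Change 2: tolerate line endings, and treat a line that contains
        # only a newline as a no-op.
        line = line.rstrip("\r\n")
        if not line:
            return
        if line[0].isspace():
            if self._last_key is None:
                raise ValueError("first header line cannot start with whitespace")
            # Change 1: a continuation line with only whitespace is an error.
            if not line.strip():
                raise ValueError("continuation line contains only whitespace")
            # Change 3: continuation lines still work but warn.
            warnings.warn(
                "continuation lines in headers are deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
            # Join to the previous value with a single space.
            self.values[self._last_key] += " " + line.lstrip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("no colon in header line")
            self.values[name] = value.strip()
            self._last_key = name


headers = ToyHeaders()
for raw in ["Content-Type: text/html\r\n", "Content-Length: 42\r\n", "\r\n"]:
    headers.parse_line(raw)
print(headers.values)
```

Feeding the final "\r\n" line directly no longer touches the last header, which is the case reported in this thread.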

