Coder Social home page Coder Social logo

Comments (20)

cortesi avatar cortesi commented on September 23, 2024

On 2/01/2013, at 9:58 AM, jahrome [email protected] wrote:

Sounds not convenient to implement but it would be nice to forward server response to client on the fly to prevent streaming issues (ex: on youtube, have to download full video to start playing).

This is definitely on the cards. As you say, it's not easy to implement, and it also doesn't quite fit with the intercept-modify-continue paradigm that mitmproxy uses. So, you would have to be able to tag some requests for streaming, and some not. It's a feature for a few versions down the track. I'll leave this issue open till then.

Aldo

Aldo Cortesi
blog: http://corte.si
twitter: @cortesi
work: www.nullcube.com
+64 210 718 900

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Depending on complexity and time required, I could potentially contribute this feature. For my use case not having this is a deal breaker. Any suggestions on how to approach the problem? (I've done a fair amount of Python dev but only have a passing familiarity with how libmproxy is structured.)

The scenario I'm looking at is that I want to use mitmproxy to strip off the SSL before hitting an upstream forward proxy (which does content filtering). But for general use, having large files not stream is a bad experience for the user. My thought is that you'd want to be able to take certain requests (I would start by using Content-Type as the most obvious indication) and set those to stream. And if that means that there is some other functionality that has to be skipped in order to stream those, that would probably be acceptable (I'm not sure what the impact would be on the overall request/response flow if streaming, but I assume it might mess with a few things).

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi @bradleypeabody,

hard to say how difficult this actually is. For a first simple version, I'd start with the following:

  • Implement libmproxy.protocol.http.HTTPResponse.from_stream(include_content=False). This requires both changes in netlib (read_response shouldn't return the content all the time) and mitmproxy, but is mostly straightforward I think. I'd love to see a PR for this already at this stage.
  • Create a http_response_header hook. This is basically .ask() on the one side, .handle_http_... on the other side (see https://github.com/mitmproxy/mitmproxy/pull/180/files)
  • Create a libmproxy inline script that tags requests. (def http_response_header(ctx, flow): flow.stream=True - whatever)
  • Then start at
    flow.response = self.get_response_from_server(flow.request)
    : get_response_from_server should not request the content initially, but call your hook instead. Depending on the existence of the stream attribute, download contents already at this point. Back in handle_flow, check if set to streaming and pipe everything in little chunks after sending the headers if so (self.c.client_conn.send(flow.response._assemble()) currently). You probably need to deal with chunked encoding as well, which makes that fairly compliated (please adapt netlib.http.read_chunked).

If you feel that this is too much effort, no worries. After all, I now know how to tackle this if I find the time to 😉. Usual tips apply (get a good debugger, e.g. PyCharm; please don't hesistate to ask questions; ...). If you're somewhat comfortable with European timezones, feel free to hit me up on #mitmproxy @ irc.oftc.net!

Cheers,
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Hey Max, Thanks very much for that detailed response. Very helpful! Am going to dive in and will let you know how it goes. Best, Brad

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Hi @mhils,

Alright, did the first step:
mitmproxy/netlib#40
#306

Still deciphering the next steps but when you have a moment if you could confirm that the above is what you're looking for or otherwise provide feedback - that would be excellent!

I might have some questions on the next steps, we'll see.

Best, Brad

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi @bradleypeabody,

thanks for the PRs - both are exactly what I expected them to be! 😄 The next steps are fairly more complex. If I can help you with anything, please don't hesistate to contact me here, on IRC or via mail.

Thanks!
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Hey @mhils,

Thanks. I have things more or less working now. I believe it follows the overall pattern you describe above: I added an inline script hook called "responseheaders", which gets called before "response" and allows you to set flow.stream to True based on whatever response header (or request) information you want.

The most significant changes were in HTTPHandler.handle_flow. If flow.stream==True was set to true in the responseheaders hook, then it either a) if chunked sends back each chunk to the client one at a time or b) if not chunked sends blocks of 4k at a time. The "response" inline script hook gets called even if flow.stream==True but in that case the content is empty when the hook gets it. So I'm pretty sure I left everything else working as expected, you just can't change the body if you stream it - which I believe is a totally acceptable trade-off. From doing simple browser client testing, it works in some cases but is buggy in others.

I'm trying to figure out how to write the tests for it now so I can stabilize it (I made a few more netlib changes around the http chunking logic and the testing on these is all good, but the mitmproxy testing setup seems more sophisticated). Any tips on how to do that? I see that handle_flow is called indirectly for many unit tests, but I don't see anything testing that method directly. I don't quite track with how pathod fits into the picture yet.

I haven't pushed any changes yet as I wanted to have the tests all good first, but I can push what I have to my fork if you want to take a look at it. Here are the changes, so you can get an idea of where it's at:
https://github.com/bradleypeabody/netlib/commit/280d9b862575d79b391e28c80156697d2d674c48
https://github.com/bradleypeabody/mitmproxy/commit/c47ddaa3a025597f8706c437f792c1ad12c388ab
And the corresponding inline script looks something like this:

def responseheaders(context, flow):
    ct = flow.response.headers["Content-Type"]
    if ct and len(ct) > 0 and ct[0].startswith("application/"):
        flow.stream = True

Best, Brad

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi Brad,

this actually sounds super exciting! Great work. 😄

mitmproxy tests: A lot of the boilerplate is hidden, yes :-) For you case, check out

class TestRedirectRequest(tservers.HTTPProxTest):
You basically subclass tservers.HTTPProxTest and optionally provide a FlowMaster via masterclass = ... (MasterRedirectRequest in the example), which constructs the responses. In your case, you would probably enable the streaming here (no need for a inline script). Similar to the example, you can use the pathod helper funtion to construct arbitrary responses (see pathod.net for documentation).

On another note, you should probably disable the transfer-encoding header stripping in

def _assemble_headers(self):

Lastly, please don't hesistate to do an early Pull Request which is not completely final. It's a great way to discuss/review the code before it gets polished.

Thanks!
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Hi Max,

Excellent and thanks!

I've got this in fairly good shape at this point. Here are the PRs:
netlib changes: mitmproxy/netlib#41
mitmproxy changes: #309

You're right about the Transfer-Encoding header - that aspect was totally broken; fixed now.

I was able to get the tests working, so the new code is covered and common cases are tested and pass. Doing some basic browser testing now works great for me as well - large video files which previously were hanging until the full download was done are able to be scrubbed through - just as if without a proxy.

Best, Brad

P.S. To be useful for other people, something will need to be added to the documentation for this feature as well.

from mitmproxy.

Jamie-Landeg-Jones avatar Jamie-Landeg-Jones commented on September 23, 2024

Nice one Brad. I've missed this functionality, and am looking forward to trying it out.

Cheers,
Jamie

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi Brad,

excellent work. I went over your code a bit, which also uncovered a few edge-cases in netlib's http handling 😄 Additionally, I added a --stream SIZE switch to enable streaming without an inline script (stream branches at mitmproxy/mitmproxy mitmproxy/netlib). So far, all that looks really good. There are still a few things left:

  • Expose functionality as API switch
  • Add UI indication in the status bar if --stream is active (similar to anticache functionality)
  • Write docs
  • Do more testing (I only tested mitmdump, not mitmproxy; Does it work well with -w?; How is the performance?)

One more thing: If you see any possibilites for improvement in the code, don't hesistate and let me know! :-)

Cheers,
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Thanks Max,

I tried pulling down the latest on the 'stream' branch of mitmproxy, netlib and pathod but am getting some errors when running the tests. There are several, but this is an example of one:

Error in processing of request from 127.0.0.1:49313
Traceback (most recent call last):
  File "/Users/internetmktg/git/netlib/netlib/tcp.py", line 435, in connection_thread
    self.handle_client_connection(connection, client_address)
  File "/Users/internetmktg/git/mitmproxy/libmproxy/proxy/server.py", line 53, in handle_client_connection
    h.handle()
  File "/Users/internetmktg/git/mitmproxy/libmproxy/proxy/server.py", line 105, in handle
    raise e
AttributeError: 'HTTPResponse' object has no attribute 'stream'

Perhaps we have some kind of incomplete merge?

I did some testing earlier with mitmproxy and it seemed to work just as well as mitmdump. I haven't tried using the -w which, I can check that - but I wanted to make sure that I was using the exact same code as you, thus the above.

On the docs, I can do something on this if you'd like, please do let me know where you think these changes should go and I can put something together and do a PR for that.

Best, Brad

P.S. What time of day is best to reach you on IRC?

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi Brad,

are you absolutely sure that you are on mitmproxy/stream and not bradleypeabody/stream (may differ for your git remotes)? These are my repo statuses (last commit):

netlib: 254a6862358af9d7d98bf443e310f659532d918d
mitmproxy: 4b4a18a2e4d7cf3e8862192b68f5a2295da9acbe
pathod: a0c8b20b7de32d333e72997f518a9159cbea84f6

git status should obviously show no changes as well. :-)

docs: It would be fantastic if you would help here. I would probably add a section under the features category, but feel free to add it in the way that you think fits best.

IRC: Roughly 9:00 - 24:00 UTC. :-) Probably not until Friday though, I'm currently on holiday for a few days.

Cheers,
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Hi Max,

Interesting - yes, I'm sure. I specifically re-cloned those repos from mitmproxy (not bradleypeabody) and I checked and I have those exact same commits as the topmost item on the current branch ('stream') for each. Definitely odd. And yes, no local changes.

Sounds good on the docs, will let you know on that.

And understood on the IRC times, will connect up if needed.

Best, Brad

P.S. Have fun on your holiday :D

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Sorry, I should have looked more closely - the relevant (failing) tests were skipped on my limited test configuration. I just pushed a fix on mitmproxy/stream.

Cheers,
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Thanks Max - that fixed it! Will get back to you on the rest. Best, Brad

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Hi Max,

Please see latest PR: #312

That's some documentation and a fix for this error that was happening with mitmproxy (my solution was very simple but I'm not entirely sure of the root cause of the problem, so this might need more review).

Using -w does seem to work as expected (it creates the flow file but without the body for streamed responses). I didn't see any noticeable performance different either way, although I didn't look at this aspect very deeply.

I'll let you know if I find any other issues, but so far it seems good to me.

Not sure what to do with "Add UI indication in the status bar..." - is that something you'll do?

Best, Brad

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi Brad,

good work on the docs. :-)

The error you're seeing comes from the internal assumption that every HTTPResponse should have a .reply attribute (which is added by calling .ask(flow.response)). That should be changed in the future, but requires a few less-subtle changes. I added a workaround for now.

UI Indicator: done (5a808ca)

Looks pretty good to me so far - I'll do some extended testing the next few days.

Cheers,
Max

from mitmproxy.

bradleypeabody avatar bradleypeabody commented on September 23, 2024

Ok, great! I'll do some more testing as well and will let you know how that goes. Best, Brad

from mitmproxy.

mhils avatar mhils commented on September 23, 2024

Hi Brad,

I tested this for a week now and I must admit that I am seriously impressed - it works like a charm 😃 . Time to get this into master!
Again, thanks for the excellent work here. I hope you'll find other dealbreakers that you're willing to work on soon! 😉

Cheers,
Max

from mitmproxy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.