Comments (20)
On 2/01/2013, at 9:58 AM, jahrome [email protected] wrote:
Sounds not convenient to implement but it would be nice to forward server response to client on the fly to prevent streaming issues (ex: on youtube, have to download full video to start playing).
This is definitely on the cards. As you say, it's not easy to implement, and it also doesn't quite fit with the intercept-modify-continue paradigm that mitmproxy uses. So, you would have to be able to tag some requests for streaming, and some not. It's a feature for a few versions down the track. I'll leave this issue open till then.
Aldo
Aldo Cortesi
blog: http://corte.si
twitter: @cortesi
work: www.nullcube.com
+64 210 718 900
from mitmproxy.
Depending on complexity and time required, I could potentially contribute this feature. For my use case not having this is a deal breaker. Any suggestions on how to approach the problem? (I've done a fair amount of Python dev but only have a passing familiarity with how libmproxy is structured.)
The scenario I'm looking at is that I want to use mitmproxy to strip off the SSL before hitting an upstream forward proxy (which does content filtering). But for general use, having large files not stream is a bad experience for the user. My thought is that you'd want to be able to take certain requests (I would start by using Content-Type as the most obvious indication) and set those to stream. And if that means that there is some other functionality that has to be skipped in order to stream those, that would probably be acceptable (I'm not sure what the impact would be on the overall request/response flow if streaming, but I assume it might mess with a few things).
from mitmproxy.
Hi @bradleypeabody,
hard to say how difficult this actually is. For a first simple version, I'd start with the following:
- Implement
libmproxy.protocol.http.HTTPResponse.from_stream(include_content=False)
. This requires both changes in netlib (read_response shouldn't return the content all the time) and mitmproxy, but is mostly straightforward I think. I'd love to see a PR for this already at this stage. - Create a http_response_header hook. This is basically
.ask()
on the one side,.handle_http_...
on the other side (see https://github.com/mitmproxy/mitmproxy/pull/180/files) - Create a libmproxy inline script that tags requests. (
def http_response_header(ctx, flow): flow.stream=True
- whatever) - Then start at
mitmproxy/libmproxy/protocol/http.py
Line 919 in 84a016d
get_response_from_server
should not request the content initially, but call your hook instead. Depending on the existence of the stream attribute, download contents already at this point. Back in handle_flow, check if set to streaming and pipe everything in little chunks after sending the headers if so (self.c.client_conn.send(flow.response._assemble())
currently). You probably need to deal with chunked encoding as well, which makes that fairly compliated (please adapt netlib.http.read_chunked).
If you feel that this is too much effort, no worries. After all, I now know how to tackle this if I find the time to 😉. Usual tips apply (get a good debugger, e.g. PyCharm; please don't hesistate to ask questions; ...). If you're somewhat comfortable with European timezones, feel free to hit me up on #mitmproxy @ irc.oftc.net!
Cheers,
Max
from mitmproxy.
Hey Max, Thanks very much for that detailed response. Very helpful! Am going to dive in and will let you know how it goes. Best, Brad
from mitmproxy.
Hi @mhils,
Alright, did the first step:
mitmproxy/netlib#40
#306
Still deciphering the next steps but when you have a moment if you could confirm that the above is what you're looking for or otherwise provide feedback - that would be excellent!
I might have some questions on the next steps, we'll see.
Best, Brad
from mitmproxy.
Hi @bradleypeabody,
thanks for the PRs - both are exactly what I expected them to be! 😄 The next steps are fairly more complex. If I can help you with anything, please don't hesistate to contact me here, on IRC or via mail.
Thanks!
Max
from mitmproxy.
Hey @mhils,
Thanks. I have things more or less working now. I believe it follows the overall pattern you describe above: I added an inline script hook called "responseheaders", which gets called before "response" and allows you to set flow.stream to True based on whatever response header (or request) information you want.
The most significant changes were in HTTPHandler.handle_flow. If flow.stream==True was set to true in the responseheaders hook, then it either a) if chunked sends back each chunk to the client one at a time or b) if not chunked sends blocks of 4k at a time. The "response" inline script hook gets called even if flow.stream==True but in that case the content is empty when the hook gets it. So I'm pretty sure I left everything else working as expected, you just can't change the body if you stream it - which I believe is a totally acceptable trade-off. From doing simple browser client testing, it works in some cases but is buggy in others.
I'm trying to figure out how to write the tests for it now so I can stabilize it (I made a few more netlib changes around the http chunking logic and the testing on these is all good, but the mitmproxy testing setup seems more sophisticated). Any tips on how to do that? I see that handle_flow is called indirectly for many unit tests, but I don't see anything testing that method directly. I don't quite track with how pathod fits into the picture yet.
I haven't pushed any changes yet as I wanted to have the tests all good first, but I can push what I have to my fork if you want to take a look at it. Here are the changes, so you can get an idea of where it's at:
https://github.com/bradleypeabody/netlib/commit/280d9b862575d79b391e28c80156697d2d674c48
https://github.com/bradleypeabody/mitmproxy/commit/c47ddaa3a025597f8706c437f792c1ad12c388ab
And the corresponding inline script looks something like this:
def responseheaders(context, flow): ct = flow.response.headers["Content-Type"] if ct and len(ct) > 0 and ct[0].startswith("application/"): flow.stream = True
Best, Brad
from mitmproxy.
Hi Brad,
this actually sounds super exciting! Great work. 😄
mitmproxy tests: A lot of the boilerplate is hidden, yes :-) For you case, check out
Line 354 in 14db300
tservers.HTTPProxTest
and optionally provide a FlowMaster via masterclass = ...
(MasterRedirectRequest in the example), which constructs the responses. In your case, you would probably enable the streaming here (no need for a inline script). Similar to the example, you can use the pathod helper funtion to construct arbitrary responses (see pathod.net for documentation).
On another note, you should probably disable the transfer-encoding header stripping in
mitmproxy/libmproxy/protocol/http.py
Line 635 in 71ba708
Lastly, please don't hesistate to do an early Pull Request which is not completely final. It's a great way to discuss/review the code before it gets polished.
Thanks!
Max
from mitmproxy.
Hi Max,
Excellent and thanks!
I've got this in fairly good shape at this point. Here are the PRs:
netlib changes: mitmproxy/netlib#41
mitmproxy changes: #309
You're right about the Transfer-Encoding header - that aspect was totally broken; fixed now.
I was able to get the tests working, so the new code is covered and common cases are tested and pass. Doing some basic browser testing now works great for me as well - large video files which previously were hanging until the full download was done are able to be scrubbed through - just as if without a proxy.
Best, Brad
P.S. To be useful for other people, something will need to be added to the documentation for this feature as well.
from mitmproxy.
Nice one Brad. I've missed this functionality, and am looking forward to trying it out.
Cheers,
Jamie
from mitmproxy.
Hi Brad,
excellent work. I went over your code a bit, which also uncovered a few edge-cases in netlib's http handling 😄 Additionally, I added a --stream SIZE
switch to enable streaming without an inline script (stream branches at mitmproxy/mitmproxy mitmproxy/netlib). So far, all that looks really good. There are still a few things left:
- Expose functionality as API switch
- Add UI indication in the status bar if
--stream
is active (similar to anticache functionality) - Write docs
- Do more testing (I only tested mitmdump, not mitmproxy; Does it work well with
-w
?; How is the performance?)
One more thing: If you see any possibilites for improvement in the code, don't hesistate and let me know! :-)
Cheers,
Max
from mitmproxy.
Thanks Max,
I tried pulling down the latest on the 'stream' branch of mitmproxy, netlib and pathod but am getting some errors when running the tests. There are several, but this is an example of one:
Error in processing of request from 127.0.0.1:49313 Traceback (most recent call last): File "/Users/internetmktg/git/netlib/netlib/tcp.py", line 435, in connection_thread self.handle_client_connection(connection, client_address) File "/Users/internetmktg/git/mitmproxy/libmproxy/proxy/server.py", line 53, in handle_client_connection h.handle() File "/Users/internetmktg/git/mitmproxy/libmproxy/proxy/server.py", line 105, in handle raise e AttributeError: 'HTTPResponse' object has no attribute 'stream'
Perhaps we have some kind of incomplete merge?
I did some testing earlier with mitmproxy and it seemed to work just as well as mitmdump. I haven't tried using the -w which, I can check that - but I wanted to make sure that I was using the exact same code as you, thus the above.
On the docs, I can do something on this if you'd like, please do let me know where you think these changes should go and I can put something together and do a PR for that.
Best, Brad
P.S. What time of day is best to reach you on IRC?
from mitmproxy.
Hi Brad,
are you absolutely sure that you are on mitmproxy/stream and not bradleypeabody/stream (may differ for your git remotes)? These are my repo statuses (last commit):
netlib: 254a6862358af9d7d98bf443e310f659532d918d
mitmproxy: 4b4a18a2e4d7cf3e8862192b68f5a2295da9acbe
pathod: a0c8b20b7de32d333e72997f518a9159cbea84f6
git status
should obviously show no changes as well. :-)
docs: It would be fantastic if you would help here. I would probably add a section under the features category, but feel free to add it in the way that you think fits best.
IRC: Roughly 9:00 - 24:00 UTC. :-) Probably not until Friday though, I'm currently on holiday for a few days.
Cheers,
Max
from mitmproxy.
Hi Max,
Interesting - yes, I'm sure. I specifically re-cloned those repos from mitmproxy (not bradleypeabody) and I checked and I have those exact same commits as the topmost item on the current branch ('stream') for each. Definitely odd. And yes, no local changes.
Sounds good on the docs, will let you know on that.
And understood on the IRC times, will connect up if needed.
Best, Brad
P.S. Have fun on your holiday :D
from mitmproxy.
Sorry, I should have looked more closely - the relevant (failing) tests were skipped on my limited test configuration. I just pushed a fix on mitmproxy/stream.
Cheers,
Max
from mitmproxy.
Thanks Max - that fixed it! Will get back to you on the rest. Best, Brad
from mitmproxy.
Hi Max,
Please see latest PR: #312
That's some documentation and a fix for this error that was happening with mitmproxy (my solution was very simple but I'm not entirely sure of the root cause of the problem, so this might need more review).
Using -w does seem to work as expected (it creates the flow file but without the body for streamed responses). I didn't see any noticeable performance different either way, although I didn't look at this aspect very deeply.
I'll let you know if I find any other issues, but so far it seems good to me.
Not sure what to do with "Add UI indication in the status bar..." - is that something you'll do?
Best, Brad
from mitmproxy.
Hi Brad,
good work on the docs. :-)
The error you're seeing comes from the internal assumption that every HTTPResponse should have a .reply
attribute (which is added by calling .ask(flow.response)
). That should be changed in the future, but requires a few less-subtle changes. I added a workaround for now.
UI Indicator: done (5a808ca)
Looks pretty good to me so far - I'll do some extended testing the next few days.
Cheers,
Max
from mitmproxy.
Ok, great! I'll do some more testing as well and will let you know how that goes. Best, Brad
from mitmproxy.
Hi Brad,
I tested this for a week now and I must admit that I am seriously impressed - it works like a charm 😃 . Time to get this into master!
Again, thanks for the excellent work here. I hope you'll find other dealbreakers that you're willing to work on soon! 😉
Cheers,
Max
from mitmproxy.
Related Issues (20)
- Task failed: redirect daemon exited prematurely. HOT 3
- Show More Not working HOT 8
- Tracking Issue: MVP Capture Tab
- ProxyAuth doesn't support re-auth
- Android14 emulator, after successfully installing the certificate, accessing website with ssl alerts ssl/tls errors
- Crash when `--ssl-insecure` is enabled in reverse mode HOT 1
- Local mode: Capture redirector process output if it crashes
- CONTENT MISSING and PROTOCOL_ERROR HOT 7
- HTTP/2 protocol error: Header block missing mandatory :path header HOT 3
- Support gRPC Insecure (http2 over tcp)
- Render HTTP/2 pseudo headers
- Some APKs get stuck while using android-unpinner HOT 1
- Unable to verify response if event-stream HOT 8
- Unable to download mitmproxy CA certificate from http://mitm.it HOT 1
- Transparent proxy mode not working. HOT 1
- CONNECT request does not include Host header HOT 6
- The windows network share cannot be accessed in local mode HOT 3
- QUIC / HTTP/3 support status HOT 3
- AssertionError in _http1 after multiple calls HOT 14
- Cannot modify headers on requests with bodies (POST etc) when stream_large_bodies is enabled HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from mitmproxy.