And another question.
I saw a hardcoded constant, MaxBufferSize = 1024.
I think we could implement the module without this constant at all.
A better idea, I think, is to use the sizes of the incoming chunks as the expected chunk sizes for the output.
So if the caller feeds the stream with 1K chunks, it makes sense to feed the target with ~1K chunks too.
But if the caller feeds the stream with 1M chunks, it makes sense to feed the target with ~1M chunks too.
And for large uploaded files, most of the chunks could even be passed to the target unchanged (no memory copy at all).
If you agree with me, I could rewrite this logic as well.
So it could be a few separate commits:
- Rewrite to use dynamic target chunk sizes
- Add a benchmark script + rewrite in C style (speed optimization)
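A minimal Python sketch of the difference between the two strategies, assuming each incoming chunk holds only body bytes (a real parser would also have to cut at part boundaries). `feed_fixed`, `feed_dynamic` and `MAX_BUFFER_SIZE` are hypothetical names, not the module's API. With a fixed buffer, the number of target calls grows with the total upload size; passing chunks through keeps it equal to the number of incoming chunks.

```python
from typing import Callable, Iterable

MAX_BUFFER_SIZE = 1024  # hypothetical stand-in for the fixed constant


def feed_fixed(chunks: Iterable[bytes], on_body: Callable[[bytes], None]) -> None:
    """Re-split every chunk into copies of at most MAX_BUFFER_SIZE bytes."""
    for chunk in chunks:
        view = memoryview(chunk)
        for start in range(0, len(view), MAX_BUFFER_SIZE):
            on_body(bytes(view[start:start + MAX_BUFFER_SIZE]))  # copies each slice


def feed_dynamic(chunks: Iterable[bytes], on_body: Callable[[bytes], None]) -> None:
    """Forward each incoming chunk as-is, so the target sees the caller's chunk size."""
    for chunk in chunks:
        if chunk:
            on_body(chunk)  # same bytes object passed through, no copy
```

For example, four 1 MiB chunks produce 4096 `on_body` calls with `feed_fixed` but only 4 with `feed_dynamic`.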
from streaming-form-data.
First off, thanks for looking into this! Speed issues (particularly when processing larger files) are something I'm aware of but just haven't had the time to look into recently. 💯
- Absolutely, speed improvements are extremely important.
- Before doing a rewrite in C, I would propose reducing the number of Python-space operations that Cython is compiling. If you open annotation.html in a browser, the lines highlighted in yellow are the ones Cython couldn't generate optimized code for, and as a result the performance is not what it could be. I would love to see if those highlighted lines can be rewritten using other primitives that Cython provides, so we get optimized code out.
In general I'm not at all opposed to C code if that's the only way out.
But I would first like to explore whether incremental changes (like reducing the Python-space operations) can improve performance.
What do you think?
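As a pure-Python analogy (not the project's code) for why the yellow lines are costly: a byte-by-byte scan performs interpreter-level indexing and comparison for every byte, while a single `bytes.find` call performs the same scan in C. The function names below are hypothetical, and absolute timings will vary by machine.

```python
import timeit
from functools import partial


def find_byte_loop(data: bytes, target: int) -> int:
    """Per-byte scan: each iteration runs interpreter-level indexing and comparison."""
    for idx in range(len(data)):
        if data[idx] == target:
            return idx
    return -1


def find_byte_builtin(data: bytes, target: int) -> int:
    """Same result, but the whole scan happens inside bytes.find's C code."""
    return data.find(bytes([target]))


if __name__ == "__main__":
    payload = b"a" * 200_000 + b"\r\n"
    cr = ord("\r")
    assert find_byte_loop(payload, cr) == find_byte_builtin(payload, cr) == 200_000
    for fn in (find_byte_loop, find_byte_builtin):
        seconds = timeit.timeit(partial(fn, payload, cr), number=10)
        print(f"{fn.__name__}: {seconds:.4f} s for 10 scans")
```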
Good point regarding MaxBufferSize. As a first thought, getting rid of this constant could reduce the number of calls to on_body, which should improve things, because that for loop is what's killing performance at the moment.
This should be verifiable using the utils/benchmark.py script (just give it a random file and a content type, and it should show whether things improved or not).
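I confused you. I do not want to rewrite it in C; I want to keep using Cython, but change the code this way.
Code style before (each chunk[idx] makes a call into a Python object):
```cython
cdef bytes chunk
cdef Py_ssize_t idx
for idx in range(len(chunk)):
    element = chunk[idx]  # the per-byte object access this change avoids
```
Code style after (plain C pointer reads over the same bytes):
```cython
cdef const char *first = chunk  # borrows the bytes object's internal buffer
cdef const char *last = first + len(chunk)
cdef const char *current = first
cdef char element
while current < last:
    element = current[0]  # Cython has no unary *; index with [0] to dereference
    current += 1
```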
Ah OK, understood. Yes, that sounds like a good plan. I'm super excited about the PR!
Follow-up question: do you think this would require additional testing beyond testing for correctness?
I missed that you already have a benchmark script. Cool.
Thanks for pointing me to annotation.html. It's a really cool report.
I'm new to Cython (I first heard about it yesterday), but it is a cool extension for Python. Usually I write C++.
> do you think this would require additional testing beyond testing for correctness?
I would say yes, but it is hard to say how to test this well. The main idea is to write code that a human can verify as easily as possible (simple code).
Using raw pointers lets us read unpredictable memory addresses, which can crash the application if there is a mistake. I'm not going to write to memory through raw pointers in your algorithm.
C programs can usually be analyzed with Valgrind-like tools, which check all memory operations.
Static code analysis tools (like this) can also help sometimes.
Another option is to run the program under a fuzzing tool.
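One cheap fuzzing approach that needs no external tool, sketched in Python under the assumption that the parser's output must not depend on how the input is chunked: split the same input at random positions and compare the streaming result with a one-shot reference. `count_delimiters` is a hypothetical stand-in for the real parser, and `--boundary` is an arbitrary test delimiter. Chunk boundaries are a likely place for off-by-one pointer mistakes, so this targets them directly.

```python
import random
from typing import Iterable, List


def count_delimiters(chunks: Iterable[bytes], delimiter: bytes) -> int:
    """Count non-overlapping delimiter occurrences across a stream of chunks.

    Unmatched trailing bytes that could still begin a delimiter are carried
    into the next step, so a delimiter split across two chunks is still found.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    keep = len(delimiter) - 1
    count = 0
    tail = b""
    for chunk in chunks:
        data = tail + chunk
        start = 0
        while True:
            idx = data.find(delimiter, start)
            if idx == -1:
                break
            count += 1
            start = idx + len(delimiter)
        # Keep at most `keep` bytes, and never bytes already consumed by a match.
        tail = data[max(start, len(data) - keep):]
    return count


def random_split(data: bytes, rng: random.Random) -> List[bytes]:
    """Cut data at random positions; empty chunks are allowed on purpose."""
    cuts = sorted(rng.randint(0, len(data)) for _ in range(rng.randint(0, 8)))
    bounds = [0] + cuts + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def fuzz(iterations: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)
    delimiter = b"--boundary"
    for _ in range(iterations):
        # Mix full delimiters, delimiter prefixes, and filler bytes.
        pieces = [
            rng.choice([delimiter, delimiter[: rng.randint(1, len(delimiter))], b"x", b"-"])
            for _ in range(rng.randint(0, 20))
        ]
        data = b"".join(pieces)
        expected = data.count(delimiter)  # one-shot reference result
        got = count_delimiters(random_split(data, rng), delimiter)
        assert got == expected, (data, got, expected)


if __name__ == "__main__":
    fuzz()
    print("no mismatches")
```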
> The main idea is to try writing the code which can be verified by human as easy as possible (simple code).
Agreed. OK, then we can decide later what other kinds of testing need to be introduced (if any). At some point it might be a good idea to run it through Valgrind, but not right now.
I just pushed v0.6.0 to PyPI, which includes the performance fixes you contributed. Thanks a lot!