Comments (8)

kolomenkin commented on June 12, 2024

Another question:
I saw the hardcoded constant MaxBufferSize = 1024.

I think we could implement the module without the constant at all.
A better idea would be to use the incoming chunk sizes as the expected chunk sizes for output.

So if the caller feeds the stream with 1K chunks - it is a good idea to feed the target with ~1K chunks too.
But if the caller feeds the stream with 1M chunks - it is a good idea to feed the target with ~1M chunks too.

And for big uploaded files, most of the chunks could go to the target unchanged (no memory copy at all).
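
To make the proposal concrete, here is a minimal Python sketch of the two forwarding styles. It is not the library's code: `forward_fixed`, `forward_dynamic` and the `on_body` callback signature are hypothetical names chosen for illustration, and the sketch ignores delimiter handling entirely.

```python
MAX_BUFFER_SIZE = 1024  # mirrors the hardcoded constant mentioned above


def forward_fixed(chunks, on_body, size=MAX_BUFFER_SIZE):
    """Re-slice every incoming chunk into pieces of at most `size` bytes."""
    for chunk in chunks:
        for start in range(0, len(chunk), size):
            # Slicing a bytes object generally allocates and copies.
            on_body(chunk[start:start + size])


def forward_dynamic(chunks, on_body):
    """Hand each non-empty incoming chunk to the target as-is."""
    for chunk in chunks:
        if chunk:
            on_body(chunk)  # same object, no slicing, no copy


if __name__ == "__main__":
    big = b"x" * (4 * MAX_BUFFER_SIZE)
    fixed, dynamic = [], []
    forward_fixed([big], fixed.append)
    forward_dynamic([big], dynamic.append)
    print(len(fixed), len(dynamic))
```

With a single 4 KiB input chunk, the fixed variant calls the target four times with copies, while the dynamic variant calls it once with the original object.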

If you agree with me, I could rewrite this logic too.

So it could be a few separate commits:

  1. Rewrite to use dynamic target chunks
  2. Add benchmark script + rewrite in C-style (speed optimization)

from streaming-form-data.

siddhantgoel commented on June 12, 2024

First off, thanks for looking into this! Speed issues (particularly when processing larger files) are something I'm aware of but just haven't had the time recently to look into. 💯

  1. Absolutely, speed improvements are extremely important.
  2. Before doing a rewrite in C, I would propose reducing the number of Python-space operations that Cython is compiling. If you open annotation.html in a browser, the lines highlighted in yellow are the ones that Cython couldn't generate optimized code for. And as a result, the performance is not what it could be. I would love to see if there's a way to rewrite those highlighted lines using some other primitives that Cython provides, so we can get optimized code out.

In general I'm not at all opposed to C code if that's the only way out.

But I would first like to explore whether incremental improvements (like reducing the Python-space operations) are enough to improve performance.

What do you think?

siddhantgoel commented on June 12, 2024

Good point regarding MaxBufferSize. As a first thought, getting rid of this constant could reduce the number of calls to on_body, which should improve things because that for loop is what's killing performance at the moment.

This should actually be verifiable using the utils/benchmark.py script (just give it a random file and content type and it should show you if things improved or not).
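
As a rough back-of-the-envelope illustration of the call-count argument (this is not utils/benchmark.py), the sketch below counts how many pieces would be emitted with and without a size cap. It assumes the target is invoked once per emitted piece, which may not match the real parser exactly; `count_target_calls` is a hypothetical helper.

```python
import math


def count_target_calls(chunk_sizes, max_buffer_size=None):
    """Count emitted pieces for the given incoming chunk sizes.

    With max_buffer_size=None every non-empty chunk is passed through whole;
    otherwise each chunk is cut into pieces of at most max_buffer_size bytes.
    """
    if max_buffer_size is None:
        return sum(1 for size in chunk_sizes if size > 0)
    return sum(math.ceil(size / max_buffer_size) for size in chunk_sizes)


if __name__ == "__main__":
    one_mib_chunks = [1024 * 1024] * 10  # caller feeds ten 1 MiB chunks
    print(count_target_calls(one_mib_chunks, max_buffer_size=1024))  # 10240
    print(count_target_calls(one_mib_chunks))                        # 10
```

For callers that already feed 1K chunks the two numbers are the same, so any gain would show up only with larger input chunks.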

kolomenkin commented on June 12, 2024

I confused you: I do not want to rewrite in C.
I want to keep using Cython, but change the code like this:

Code style before (chunk[idx] goes through a Python object call):

cdef bytes chunk
for idx in range(len(chunk)):
    element = chunk[idx]

Code style after (plain C pointer walk over the same bytes):

cdef char *current = chunk
cdef char *last = current + len(chunk)
while current < last:
    element = current[0]  # Cython spells *current as current[0]
    current += 1

siddhantgoel commented on June 12, 2024

Ah OK, understood. Yes, that sounds like a good plan. I'm super excited about the PR!

Follow-up question - do you think this would require additional testing beyond testing for correctness?

kolomenkin commented on June 12, 2024

I missed that you already have a benchmark script. Cool.

Thanks for pointing to annotation.html. This is a really cool report.
I'm new to Cython (I first heard about it yesterday), but it is a cool extension for Python. Usually I write in C++.

> do you think this would require additional testing beyond testing for correctness?

I would say yes. But it is hard to say how to test this well. The main idea is to write code that a human can verify as easily as possible (simple code).

Using raw pointers makes it possible to read from an unpredictable memory address, which can crash the application if there is a mistake. I'm not going to write to memory through raw pointers in your algorithm.

C programs can usually be analyzed with Valgrind-like tools, which check all memory operations.
Static code analysis tools (like this) can also help sometimes.
Another option is to run the program under a fuzzing tool.
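
A cheap approximation of the fuzzing idea, sketched in plain Python: feed the same data split at random chunk boundaries and check that the result never depends on the split. `ToyDelimiterCounter` is a hypothetical stand-in, not the real parser; the same harness shape could in principle wrap the actual parser and compare its output against a one-shot parse.

```python
import random


class ToyDelimiterCounter:
    """Hypothetical streaming parser: counts delimiter hits across chunks.

    Assumes the delimiter cannot overlap with itself (true for b"\r\n").
    """

    def __init__(self, delimiter):
        self.delimiter = delimiter
        self.tail = b""  # last len(delimiter) - 1 bytes seen so far
        self.count = 0

    def feed(self, chunk):
        data = self.tail + chunk
        self.count += data.count(self.delimiter)
        keep = len(self.delimiter) - 1
        self.tail = data[-keep:] if keep else b""


def random_chunks(data, rng):
    """Split data at random positions, including empty and 1-byte chunks."""
    pos = 0
    while pos < len(data):
        size = rng.randint(0, 8)
        yield data[pos:pos + size]
        pos += size


def check_chunking_invariance(rounds=200, seed=0):
    """Assert that random chunking never changes the parser's result."""
    rng = random.Random(seed)
    delimiter = b"\r\n"
    for _ in range(rounds):
        length = rng.randint(0, 64)
        data = bytes(rng.choice(b"\r\nab") for _ in range(length))
        expected = data.count(delimiter)  # one-shot reference answer
        parser = ToyDelimiterCounter(delimiter)
        for chunk in random_chunks(data, rng):
            parser.feed(chunk)
        assert parser.count == expected, (data, parser.count, expected)
    return rounds


if __name__ == "__main__":
    check_chunking_invariance()
```

This only checks correctness under different chunkings. It does not detect out-of-bounds reads by itself, so it complements rather than replaces a Valgrind-style run.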

siddhantgoel commented on June 12, 2024

> The main idea is to write code that a human can verify as easily as possible (simple code).

Agreed. OK, then we can decide later what other kinds of testing need to be introduced (if any). At some point it might be a good idea to run it through Valgrind. But not right now.

siddhantgoel commented on June 12, 2024

I just pushed v0.6.0 to PyPI, which includes the performance fixes you contributed. Thanks a lot!
