
siddhantgoel / streaming-form-data


Streaming (and fast!) parser for multipart/form-data written in Cython

Home Page: https://streaming-form-data.readthedocs.io/en/latest/

License: MIT License

Python 78.19% Makefile 1.21% Cython 20.60%
cython form-data forms http multipart-formdata python python3 web


streaming-form-data's People

Contributors

bbeattie-phxlabs, cclauss, dependabot[bot], florianvazelle, gabrielcedran, hbusul, jasopolis, kolomenkin, mephi42, nterysin, pyup-bot, raethlein, remram44, siddhantgoel, tokicnikolaus, wouterkoorn


streaming-form-data's Issues

Install failure with pip 20 and setuptools 46

Hi, sorry to intrude, but there may be some issues arising from recent changes made by the PyPA combined with the choice of Poetry as the packaging backend (no offense). This information may be useful to you or to other users of this package.

In an environment with pip==18.1, setuptools==40.8.0, gcc and some other libs, pip install streaming-form-data works without any issue, everything's fine. Well, you gotta have more than just pip and setuptools, but that happens.

In an environment with pip==20.1.1 (or higher) and setuptools==46.4.0, even with gcc and the libs installed, I get the following:

root@cdb4ae3c174c:/app# pip install streaming-form-data
Collecting streaming-form-data
  Downloading streaming-form-data-1.7.0.tar.gz (92 kB)
     |████████████████████████████████| 92 kB 1.1 MB/s 
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: streaming-form-data
  Building wheel for streaming-form-data (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python /usr/local/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmp5lcaherm
       cwd: /tmp/pip-install-zbvymr76/streaming-form-data
  Complete output (25 lines):
  A setup.py file already exists. Using it.
  Traceback (most recent call last):
    File "/tmp/pip-install-zbvymr76/streaming-form-data/setup.py", line 2, in <module>
      from setuptools import setup
  ModuleNotFoundError: No module named 'setuptools'
[...]
 ERROR: Failed building wheel for streaming-form-data
Failed to build streaming-form-data
ERROR: Could not build wheels for streaming-form-data which use PEP 517 and cannot be installed directly

At first I didn't pay much attention: I thought it was about the wheel and PEP 517, and tried things like --no-binary :all: and --no-binary streaming-form-data, as well as toggling the use of PEP 517, to no avail. Only later did I realize that the problem was not the wheel build itself; the direct install fails too.

Long story short, if you dig around the PyPA discussions on GitHub, there seem to be issues with some packaging setups (especially those involving PEP 517 and Poetry) and with locating packages inside build environments. Hence the "No module named 'setuptools'" error at the top of the traceback. On top of that, Python packaging is in a fairly unstable state at the moment (again, see the PyPA discussions and recent changes).

So for now my workaround is:

pip install poetry
pip install --no-build-isolation streaming-form-data

But that's not very satisfying. Is there something that could be done on the streaming-form-data side? I don't think so, but it's worth a shot. Maybe there is another way to configure pip to avoid this issue (something other than the --no-binary or PEP 517 options).

Content-Type from FileTarget

How can I get the multipart Content-Type of an uploaded part from FileTarget? For example, given these part headers:
Content-Disposition: form-data; name="file"; filename="Webp.net-resizeimage.png"\r\n
Content-Type: image/png\r\n\r\n
the value I want is image/png.
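
Elsewhere in these issues, the target attributes multipart_filename and multipart_content_type are read after parsing, which looks like the intended route. As an independent illustration of where that value comes from, the sketch below parses the part headers quoted in the question with the standard library's email module. It does not use streaming-form-data at all, and the variable names are made up for the example.

```python
from email import message_from_bytes

# Part headers copied from the question (header block only, no body).
raw_part_headers = (
    b'Content-Disposition: form-data; name="file"; '
    b'filename="Webp.net-resizeimage.png"\r\n'
    b"Content-Type: image/png\r\n"
    b"\r\n"
)

part = message_from_bytes(raw_part_headers)

# Media type declared by the client for this part.
content_type = part.get_content_type()
# Parameters carried by the Content-Disposition header.
filename = part.get_filename()
field_name = part.get_param("name", header="content-disposition")

print(content_type, filename, field_name)
```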

Fails to install with PyPy 3.10 on Windows

I'm seeing the following error in CI when trying to install on Windows with PyPy 3.10:

  DEPRECATION: streaming-form-data is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for streaming-form-data: started
  Running setup.py install for streaming-form-data: finished with status 'error'
  error: subprocess-exited-with-error
  
  × Running setup.py install for streaming-form-data did not run successfully.
  │ exit code: 1
  ╰─> [30 lines of output]
      running install
      D:\a\kolo\kolo\python\.venv\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-pypy310
      creating build\lib.win-amd64-pypy310\streaming_form_data
      copying streaming_form_data\parser.py -> build\lib.win-amd64-pypy310\streaming_form_data
      copying streaming_form_data\targets.py -> build\lib.win-amd64-pypy310\streaming_form_data
      copying streaming_form_data\validators.py -> build\lib.win-amd64-pypy310\streaming_form_data
      copying streaming_form_data\__init__.py -> build\lib.win-amd64-pypy310\streaming_form_data
      running build_ext
      building 'streaming_form_data._parser' extension
      creating build\temp.win-amd64-pypy310
      creating build\temp.win-amd64-pypy310\Release
      creating build\temp.win-amd64-pypy310\Release\streaming_form_data
      "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.35.32215\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -ID:\a\kolo\kolo\python\.venv\include -IC:\hostedtoolcache\windows\PyPy\3.10.12\x86\include -IC:\hostedtoolcache\windows\PyPy\3.10.12\x86\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.35.32215\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.35.32215\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /Tcstreaming_form_data/_parser.c /Fobuild\temp.win-amd64-pypy310\Release\streaming_form_data/_parser.obj
      _parser.c
      streaming_form_data/_parser.c(12214): error C2078: too many initializers
      streaming_form_data/_parser.c(12212): error C2078: too many initializers
      streaming_form_data/_parser.c(12362): error C2078: too many initializers
      streaming_form_data/_parser.c(12360): error C2078: too many initializers
      streaming_form_data/_parser.c(12526): error C2078: too many initializers
      streaming_form_data/_parser.c(12524): error C2078: too many initializers
      streaming_form_data/_parser.c(12658): error C2078: too many initializers
      streaming_form_data/_parser.c(12656): error C2078: too many initializers
      streaming_form_data/_parser.c(16315): error C2078: too many initializers
      streaming_form_data/_parser.c(16313): error C2078: too many initializers
      error: command 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Enterprise\\VC\\Tools\\MSVC\\14.35.32215\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
      [end of output]

Parser won't get registered and receive chunk data as a flask_appbuilder app in airflow webserver ui

Hi,

I integrated the Flask example upload-test.py into the Airflow webserver UI as an upload plugin. It lets a user upload a CSV file from the Airflow webserver UI and save it to a server directory ('/usr/local/airflow/uploads/'). However, the parser does not pick up any header information, and the chunked data is not written to the file through the parser.

  1. I have verified that upload-test.py works on a local Flask host with @app.route.
  2. I have verified that the Airflow plugin interface below works when I use request.files and .save(path_to_save).

Here is the Flask view (@expose) in the Airflow plugin:

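from airflow.plugins_manager import AirflowPlugin
from flask import Blueprint, request
from flask_appbuilder import BaseView as AppBuilderBaseView, expose
from streaming_form_data import StreamingFormDataParser
from streaming_form_data.targets import FileTarget

class PipelineLauncher(AppBuilderBaseView):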
    @expose('/', methods=('GET', 'POST'))
    def list(self):
        if request.method == 'POST':
            path_to_save = '/usr/local/airflow/uploads/temp.csv'   #path mounted with airflow
            
            file_ = FileTarget(path_to_save)
            parser = StreamingFormDataParser(headers=request.headers)
            parser.register('file', file_)

            while True:
                chunk = request.stream.read(8192)
                if not chunk:
                    break
                parser.data_received(chunk)
            
            #df = pd.read_csv(path_to_save) this will throw error 'pandas.errors.EmptyDataError: No columns to parse from file'
            #rows = df.shape[0]

            return self.render_template("debug.html", 
                                     path_to_save=path_to_save,
                                     file_object=file_,
                                     header=request.headers,
                                     filename=file_.multipart_filename,
                                     content_type=file_.multipart_content_type)
        return self.render_template("index.html")

# debug.html
# path_to_save: {{ path_to_save }}
# file_object:  {{ file_object }}
# header: {{ header }}
# filename: {{ filename }}
# content_type: {{ content_type }}
          
bp = Blueprint(
    "pipeline", __name__,
    template_folder='templates',
    static_folder='static',
    static_url_path='/static/pipeline_launcher')

class AirflowCustomLauncher(AirflowPlugin):
    name = "pipeline"
    pipeline_launcher = PipelineLauncher()
    pipeline_launcher_package = {
        "name": "Manual Upload Plugin",
        "category": "Launch Pipeline",
        "view": pipeline_launcher
    }
    appbuilder_views = [pipeline_launcher_package]
    admin_views = [pipeline_launcher_package]
    flask_blueprints = [bp]

index.html

{% include "airflow/master.html" %}
{% block body %} 
<title>Upload XLS/XLSX/CSV files to InfluxDB.</title>
<form method="post" class="admin-form form-horizontal" enctype="multipart/form-data" role='form'>
  <div class="col-md-12 text-center">
    <h3>Manual Upload for InfluxDB</h3>
    <br/>
    <p> This plugin currently only supports .csv, .xls, and .xlsx files. Larger files and .xlsx files will take longer than usual to process. Upon submitting a file, you will be taken to a page to preview your file as well as configure upload parameters. </p>
  </div>
  {% if csrf_token %}
  <input type="hidden" name="csrf_token" value="{{ csrf_token() }}" />
  {% endif %}
  <div class="form-group">
      <!-- You can take parameters from the user using the form elements and pass them to backend -->
    <label class="col-md-4 control-label">File: </label>
    <div class="col-md-6">
      <input class="form-control" type="file" name="file" />
    </div>
  </div>
  <div class="col-md-offset-4 col-md-10 submit-row">
    <button type="submit" class="btn btn-primary">Process File</button>
  </div>
  <div class="container">
    {% for message in get_flashed_messages() %}
      <div class="alert alert-warning">
        {{ message }}
      </div>
    {% endfor %}
  </div>
</form>
{% endblock %} 

The plugin lets me choose a file to upload. After I select a CSV file, this is the output of the debug.html page:

path_to_save: /usr/local/airflow/uploads/temp.csv

file_object: <streaming_form_data.targets.FileTarget object at 0x7f7be38fb550>

header: Host: localhost:8080 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Encoding: gzip, deflate Accept-Language: en-us Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryDSj0i1GXH4P0ITsx Origin: http://localhost:8080 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15 Connection: keep-alive Upgrade-Insecure-Requests: 1 Referer: http://localhost:8080/pipelinelauncher/ Content-Length: 328842 Cookie: session=.eJwlj0tuQyEMRffCOAP-tt9mngzYbVSaVMAbRd17iSqPru5Hxy9z6pD5aQ7lPuVmznszhwnQFAGJrGvFh4aOxbP1PiOpT14qVScgBUKELATYgFJRpSSQhKu3pFmEi5Kr6kVjzM0iWqLkss9SiEKstlllZIGguGvIO6wCxWyQHxnf_JDHMsca10arc-i5nl_y2ISsEZLDrLW6nGy2GPYxSAQBTrYgqrftvdT445yL1zVPvfcl413vfTv9WbnLlnvyZq4p4_99Z37_AH8MU-Q.YTuY0A.t-_l07dcNPe_RN6CWI_Pg5cZ3vo

filename: None

content_type: None

Any help would be appreciated. Thank you.
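
The empty file together with "filename: None" suggests the parser never received any bytes. One possible cause, which is an assumption and not something the report confirms, is that another component in the Airflow / Flask-AppBuilder stack (for example form or CSRF handling) consumed the request body before the view ran; a request stream can only be read once. The sketch below uses io.BytesIO as a stand-in for request.stream to show the effect. The payload and the "upstream" read are invented for illustration.

```python
import io


def read_chunks(stream, chunk_size=8192):
    """Yield chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for request.stream; the payload is made up for illustration.
stream = io.BytesIO(b"x" * 20000)

# Something upstream reads the whole body first (hypothetical middleware).
consumed_upstream = stream.read()

# The view's read loop now sees nothing, so a parser would get no data.
chunks_seen_by_view = list(read_chunks(stream))
total_bytes = sum(len(chunk) for chunk in chunks_seen_by_view)

print(len(consumed_upstream), total_bytes)
```

Counting the bytes passed to parser.data_received in the view and comparing the total against the Content-Length header (328842 in the debug output) would confirm or rule this out.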

S3 target

An S3Target class would be useful to stream file contents directly from the request to an S3 bucket. This actually shouldn't be too hard to write using boto3.
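
A rough sketch of the buffering such a target would need, under stated assumptions. It assumes a target receives data through start, data, and finish hooks (named on_start, on_data_received, and on_finish here; check the library's target base class before relying on those names), and upload_part is a hypothetical callable standing in for whatever boto3 multipart-upload call ends up being used. No real S3 API is invoked. S3 multipart uploads require every part except the last to be at least 5 MiB, hence the buffer.

```python
class BufferedPartTarget:
    """Collects incoming chunks and hands them off in fixed-size parts."""

    def __init__(self, upload_part, part_size=5 * 1024 * 1024):
        # upload_part is a hypothetical callable(part_number, data).
        self._upload_part = upload_part
        self._part_size = part_size
        self._buffer = bytearray()
        self._part_number = 0

    def on_start(self):
        self._buffer.clear()
        self._part_number = 0

    def on_data_received(self, chunk):
        self._buffer.extend(chunk)
        # Flush as many full parts as the buffer currently holds.
        while len(self._buffer) >= self._part_size:
            self._flush(self._part_size)

    def on_finish(self):
        # The final part may be smaller than part_size.
        if self._buffer:
            self._flush(len(self._buffer))

    def _flush(self, size):
        self._part_number += 1
        data = bytes(self._buffer[:size])
        del self._buffer[:size]
        self._upload_part(self._part_number, data)


# Usage with a tiny part size and an in-memory "uploader" for illustration.
uploaded = []
target = BufferedPartTarget(
    lambda number, data: uploaded.append((number, data)), part_size=4
)
target.on_start()
for chunk in (b"abc", b"defgh", b"ij"):
    target.on_data_received(chunk)
target.on_finish()

print(uploaded)
```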

Build fails without Python2 headers

When trying to install via pip3 on Ubuntu 20.04, which doesn't bundle Python2 headers by default, the compilation of _parser.c fails, as it's looking for Python.h in a Python2.7 directory.
It can be worked around with apt install python2.7-dev. Not sure where the explicit Python2 reference is coming from - I couldn't spot it at first sight in this repo.

$ pip3 install --user streaming-form-data
Collecting streaming-form-data
  Using cached streaming-form-data-1.6.0.tar.gz (91 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: streaming-form-data
  Building wheel for streaming-form-data (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmp0ldlmrgn build_wheel /tmp/tmpwhxvefb1                                                                                                                  
       cwd: /tmp/pip-install-nf4hxqoq/streaming-form-data                                                                                                                                   
  Complete output (53 lines):                                                                                                                                                               
  Traceback (most recent call last):                                                                                                                                                        
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/utils/env.py", line 889, in _run                                                                           
      output = subprocess.check_output(                                                                                                                                                     
    File "/usr/lib/python3.8/subprocess.py", line 411, in check_output                                                                                                                      
      return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,                                                                                                                      
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/utils/_compat.py", line 205, in run                                                                        
      raise CalledProcessError(                                                                                                                                                             
  poetry.utils._compat.CalledProcessError: Command '['/usr/bin/python', 'setup.py', 'build', '-b', 'build']' returned non-zero exit status 1.                                               
                                                                                                                                                                                            
  During handling of the above exception, another exception occurred:                                                                                                                       
                                                                                                                                                                                            
  Traceback (most recent call last):                                                                                                                                                        
    File "/tmp/tmp0ldlmrgn", line 280, in <module>                                                                                                                                          
      main()                                                                                                                                                                                
    File "/tmp/tmp0ldlmrgn", line 263, in main                                                                                                                                              
      json_out['return_val'] = hook(**hook_input['kwargs'])                                                                                                                                 
    File "/tmp/tmp0ldlmrgn", line 204, in build_wheel                                                                                                                                       
      return _build_backend().build_wheel(wheel_directory, config_settings,                                                                                                                 
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/masonry/api.py", line 62, in build_wheel                                                                   
      WheelBuilder.make_in(                                                                                                                                                                 
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/masonry/builders/wheel.py", line 55, in make_in                                                            
      wb.build()                                                                                                                                                                            
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/masonry/builders/wheel.py", line 81, in build                                                              
      self._build(zip_file)                                                                                                                                                                 
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/masonry/builders/wheel.py", line 103, in _build                                                            
      self._env.run(                                                                                                                                                                        
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/utils/env.py", line 856, in run                                                                            
      return self._run(cmd, **kwargs)                                                                                                                                                       
    File "/tmp/pip-build-env-uyrjh8h7/overlay/lib/python3.8/site-packages/poetry/utils/env.py", line 893, in _run                                                                           
      raise EnvCommandError(e, input=input_)                                                                                                                                                
  poetry.utils.env.EnvCommandError: Command ['/usr/bin/python', 'setup.py', 'build', '-b', 'build'] errored with the following return code 1, and output:                                   
  running build                                                                                                                                                                             
  running build_py                                                                                                                                                                          
  creating build                                                                                                                                                                            
  creating build/lib.linux-x86_64-2.7                                                                                                                                                       
  creating build/lib.linux-x86_64-2.7/streaming_form_data                                                                                                                                   
  copying streaming_form_data/validators.py -> build/lib.linux-x86_64-2.7/streaming_form_data                                                                                               
  copying streaming_form_data/targets.py -> build/lib.linux-x86_64-2.7/streaming_form_data                                                                                                  
  copying streaming_form_data/parser.py -> build/lib.linux-x86_64-2.7/streaming_form_data                                                                                                   
  copying streaming_form_data/__init__.py -> build/lib.linux-x86_64-2.7/streaming_form_data                                                                                                 
  copying streaming_form_data/_parser.pyx -> build/lib.linux-x86_64-2.7/streaming_form_data                                                                                                 
  copying streaming_form_data/_parser.c -> build/lib.linux-x86_64-2.7/streaming_form_data                                                                                                   
  running build_ext                                                                                                                                                                         
  building 'streaming_form_data._parser' extension                                                                                                                                          
  creating build/temp.linux-x86_64-2.7                                                                                                                                                      
  creating build/temp.linux-x86_64-2.7/streaming_form_data                                                                                                                                  
  x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-1x6jhf/python2.7-2.7.18~rc1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c streaming_form_data/_parser.c -o build/temp.linux-x86_64-2.7/streaming_form_data/_parser.o                                         
  streaming_form_data/_parser.c:4:10: fatal error: Python.h: No such file or directory
      4 | #include "Python.h"                                                                                                                                                               
        |          ^~~~~~~~~~                                                                                                                                                               
  compilation terminated.                                                                                                                                                                   
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1                                                                                                                           
                                                                                                                                                                                            
  ----------------------------------------                                                                                                                                                  
  ERROR: Failed building wheel for streaming-form-data
Failed to build streaming-form-data
ERROR: Could not build wheels for streaming-form-data which use PEP 517 and cannot be installed directly
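
The traceback above shows the build step being launched as /usr/bin/python setup.py build, and the output paths contain lib.linux-x86_64-2.7, so the bare "python" on this system apparently resolves to Python 2.7 even though pip itself ran under /usr/bin/python3. That would explain the Python 2 header lookup. A minimal sketch of the difference, assuming nothing about how the build backend is implemented: spawning a child process with sys.executable keeps the child on the same interpreter as the parent.

```python
import subprocess
import sys

# Ask a child process which major version it runs under. Using
# sys.executable (instead of a bare "python" looked up on PATH)
# keeps the child on the same interpreter as this process.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.version_info[0])"],
    capture_output=True,
    text=True,
    check=True,
)
child_major = int(result.stdout.strip())

print(child_major == sys.version_info[0])
```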

Any ideas how to deal with excel files in streaming ?

Hello,
Thanks for your work.
I can read and parse a CSV file as a stream with your library.
I want to do the same with an Excel file, but I haven't found a way to do it.
I know it's not directly related to your library, but maybe you have a clue.
Is it possible to convert an Excel file to CSV while reading it from an HTTP stream? Or is it possible to read an Excel file directly from the stream?
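
One relevant constraint: an .xlsx file is a ZIP archive, and a ZIP's central directory sits at the end of the file, so a reader generally needs the complete upload before it can list or open the sheets. A minimal sketch of the resulting pattern, using only the standard library: stream the chunks into a temporary file, then open it once the upload has finished. The archive built here is a dummy stand-in, not a real workbook, and the chunk loop stands in for the request stream.

```python
import io
import tempfile
import zipfile

# Build a dummy ZIP in memory to play the role of an uploaded .xlsx file.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
upload = source.getvalue()

# Simulate receiving the upload in small chunks and spooling it to disk.
with tempfile.TemporaryFile() as spool:
    for start in range(0, len(upload), 16):
        spool.write(upload[start:start + 16])

    # Only now, with the whole file present, can the archive be read.
    spool.seek(0)
    with zipfile.ZipFile(spool) as received:
        members = received.namelist()

print(members)
```

With streaming-form-data, a FileTarget pointed at a temporary path could play the role of the spool, and a spreadsheet library could then open that path after the last chunk has been parsed.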

Checking for case insensitive Content-Type

Hi, great library!

This line here does a case-sensitive lookup of 'Content-Type':

def parse_content_boundary(headers):
    content_type = headers.get('Content-Type')
    if not content_type:
        raise ParseFailedException()

I was testing with curl and the browser, they both send the content type header as uppercase: 'CONTENT-TYPE', which causes this library to throw ParseFailedException.

According to the http spec, headers are case-insensitive: https://stackoverflow.com/questions/7718476/are-http-headers-content-type-c-case-sensitive/7718542

It'd be great if you could make the dictionary lookup case-insensitive.
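
A minimal sketch of a case-insensitive lookup, assuming headers is any mapping with an items() method. The helper name get_header is hypothetical and this is not the library's actual fix, just one way the request could be met.

```python
def get_header(headers, name, default=None):
    """Look up a header by name, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


# The header arrives in uppercase, as described in the report.
headers = {"CONTENT-TYPE": "multipart/form-data; boundary=abc123"}
content_type = get_header(headers, "Content-Type")
missing = get_header(headers, "Content-Length")

print(content_type, missing)
```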

Support for AsyncIO

Building async support into this package would make a lot of sense. One use case, for example, would be to do networked IO in parallel while file chunks are being uploaded. This is required for S3 uploads (or basically uploads to any other file hosting service).

What the API should look like is not yet clear. Async code (at least as Python implements it) tends to "split" the codebase into sync and async parts, so we would need to figure that out first.

There's a discussion and some ideas in #29.
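
To make the use case concrete, here is a small sketch of chunk receipt and a networked upload running concurrently through a bounded asyncio.Queue. It is not a proposal for the package's API, which is still open; the function names are hypothetical, and asyncio.sleep(0) is only a placeholder for real network I/O.

```python
import asyncio


async def receive_chunks(queue, chunks):
    """Producer: stands in for reading the request body."""
    for chunk in chunks:
        await queue.put(chunk)
    await queue.put(None)  # sentinel: no more data


async def upload_chunks(queue, sink):
    """Consumer: stands in for a networked upload (for example to S3)."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            break
        await asyncio.sleep(0)  # placeholder for real network I/O
        sink.append(chunk)


async def main():
    # Bounded queue: a slow upload makes the producer wait (backpressure).
    queue = asyncio.Queue(maxsize=4)
    sink = []
    await asyncio.gather(
        receive_chunks(queue, [b"part-1;", b"part-2;", b"part-3"]),
        upload_chunks(queue, sink),
    )
    return b"".join(sink)


result = asyncio.run(main())
print(result)
```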

Handling multi-valued fields?

How does this library handle multi-valued fields? For example, what if the input form allows entry of an arbitrary number of "people", each of which has fields such as height, weight, etc.?

If I understand the source code correctly, as data comes in it is simply appended to an array (for ValueTarget()), which is joined to a single bytes string when value is called, which would leave me with something like 14022036 if there were three weight values submitted (140,220,36), with no way to know where to split the values.

I could make a custom subclass that doesn't join the array, but that would only work if I could be sure each "chunk" was a full, discrete, value, which I do not believe is the case...
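
If the parser signals the start and end of each part to the target, values can be separated on part boundaries instead of chunk boundaries, which sidesteps the concern about partial chunks. That lifecycle is an assumption here, and the hook names on_start, on_data_received, and on_finish are illustrative and should be checked against the library. The loop at the bottom simulates the parser; it is not the real one.

```python
class MultiValueTarget:
    """Keeps one bytes value per form part, however the part was chunked."""

    def __init__(self):
        self.values = []    # completed values, one per part
        self._current = []  # chunks of the part being received

    def on_start(self):
        self._current = []

    def on_data_received(self, chunk):
        self._current.append(chunk)

    def on_finish(self):
        self.values.append(b"".join(self._current))


# Simulate three "weight" parts; the second arrives split across two chunks.
target = MultiValueTarget()
for part_chunks in ([b"140"], [b"2", b"20"], [b"36"]):
    target.on_start()
    for chunk in part_chunks:
        target.on_data_received(chunk)
    target.on_finish()

print(target.values)
```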

__pyx_check_sizeof_voidp = 1 / (int)(SIZEOF_VOID_P == sizeof(void*)) A wheel is created for you to put on pypi.

Starting at line 200 of _parser.c there is an enum definition:

#ifdef SIZEOF_VOID_P
    enum { __pyx_check_sizeof_voidp = 1 / (int)(SIZEOF_VOID_P == sizeof(void*)) };
#endif

which raises a compile error when SIZEOF_VOID_P != sizeof(void*). I compile your code on Windows with mingw-w64 using python setup.py build --compiler=mingw32 and get this error, which means SIZEOF_VOID_P is defined and SIZEOF_VOID_P != sizeof(void*). Is that a deliberate setting?
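
The enum acts as a compile-time assertion emitted into the Cython-generated C file: when the comparison is false the initializer becomes 1 / 0, which is not a valid constant expression, so compilation stops. In other words, it fires when the pointer size the Python headers were configured with (SIZEOF_VOID_P) differs from the pointer size of the compiler actually in use. One plausible cause, offered as a guess, is a 32-bit MinGW toolchain paired with a 64-bit Python, or the reverse. The interpreter's side of that comparison can be checked from Python:

```python
import struct
import sys

# Size in bytes of a C pointer (void *) for the running interpreter.
pointer_bytes = struct.calcsize("P")
is_64bit = sys.maxsize > 2**32

print(f"pointer size: {pointer_bytes} bytes, 64-bit interpreter: {is_64bit}")
```

Comparing this with the compiler's target (gcc -dumpmachine prints it) should show whether the two disagree.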

poetry issue: can't install or build wheel on linux

Command to reproduce:

docker run --entrypoint=pip3 python:3.6 install "streaming-form-data==1.5.1"

Error message / stack trace:

  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python /usr/local/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpv61zx81r
       cwd: /tmp/pip-wheel-j1vdg9g0/streaming-form-data
  Complete output (14 lines):
  Traceback (most recent call last):
    File "/usr/local/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
      main()
    File "/usr/local/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 91, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-ztxe98dq/overlay/lib/python3.6/site-packages/poetry/masonry/api.py", line 25, in get_requires_for_build_wheel
      poetry = Factory().create_poetry(Path("."))
    File "/tmp/pip-build-env-ztxe98dq/overlay/lib/python3.6/site-packages/poetry/factory.py", line 54, in create_poetry
      raise RuntimeError("The Poetry configuration is invalid:\n" + message)
  RuntimeError: The Poetry configuration is invalid:
    - Additional properties are not allowed ('url' was unexpected)

How to validate content-type?

I'm using S3Target to upload a file directly to S3, but I want to validate the content type so that only certain file types are allowed (pdf, jpeg, png, docx, xlsx and pptx).

I'm not sure how I should use multipart_content_type to validate this. Can you help me?

I'm using FastAPI as the web framework.

parser = StreamingFormDataParser(headers=request.headers)
target = S3Target(
    file_path=file_path,
    mode='wb',
    transport_params={'client': self._client},
    validator=MaxSizeValidator(self.max_file_size),
)
parser.register('file', target)
async for chunk in request.stream():
    body_validator(chunk)
    parser.data_received(chunk)
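
One possible shape for the check itself, as a minimal sketch rather than the library's own mechanism: compare the part's declared media type against an allowlist. The names `ALLOWED_CONTENT_TYPES` and `is_allowed_content_type` are hypothetical, and how the content type string is obtained from the target is left out here. Keep in mind that the declared type comes from the client, so it is only a first filter.

```python
# Hypothetical allowlist: MIME types for pdf, jpeg, png, docx, xlsx and pptx.
ALLOWED_CONTENT_TYPES = frozenset({
    "application/pdf",
    "image/jpeg",
    "image/png",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
})


def is_allowed_content_type(content_type):
    """Return True if the media type (ignoring parameters and case) is allowed."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=..." and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_CONTENT_TYPES
```

A caller would reject the upload (for example with an HTTP 415 response) when this returns False.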

Failed to install version 1.5.0 or above via pip (installing version 1.4.0 works)

Hi

I have tried to install this module via pip several times without success. The error log is attached here:

"ERROR: Command errored out with exit status 1:
command: /home/pi/py375/bin/python3.7 /home/pi/py375/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpf9arkynj
cwd: /tmp/pip-install-8qybjrwk/streaming-form-data
Complete output (14 lines):
Traceback (most recent call last):
File "/home/pi/py375/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 257, in <module>
main()
File "/home/pi/py375/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 240, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/pi/py375/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py", line 110, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-9n473hi2/overlay/lib/python3.7/site-packages/poetry/masonry/api.py", line 38, in prepare_metadata_for_build_wheel
poetry = Factory().create_poetry(Path("."))
File "/tmp/pip-build-env-9n473hi2/overlay/lib/python3.7/site-packages/poetry/factory.py", line 54, in create_poetry
raise RuntimeError("The Poetry configuration is invalid:\n" + message)
RuntimeError: The Poetry configuration is invalid:
- Additional properties are not allowed ('url' was unexpected)

----------------------------------------

ERROR: Command errored out with exit status 1: /home/pi/py375/bin/python3.7 /home/pi/py375/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpf9arkynj Check the logs for full command output.
"

OS: Ubuntu 18.04
Python version: 3.7.5

smart-open as an optional dependency

I'm packaging streaming-form-data in the AUR. Since 1.12.0, this package depends on smart-open, which unfortunately pulls in boto3, a pretty heavyweight library. Previously, streaming-form-data was very light on dependencies. Would it be possible to make this new dependency optional? Thanks for your work on this library!

How do you read file name?

Is there a way to read the file name or content type before creating the FileTarget?

Does it make sense to treat each incoming stream as text when it could be a binary stream most of the time?

How to use it with FastAPI?

I have this

async def foo(
    request: Request,
    file: UploadFile = File(...)):
    ...

How do I transition to using this library? Currently, uploading a 47 MB file takes about 10 minutes; you can read more here.

Is this library really that fast? I expect the file to upload in 30 seconds at most. If this library only improves the upload time by a few seconds, I won't bother using it.

Flask example file fails to run

$ python3 upload-test.py
Traceback (most recent call last):
  File "upload-test.py", line 10, in <module>
    from streaming_form_data import StreamingFormDataParser
  File "/usr/local/lib/python3.7/dist-packages/streaming_form_data/__init__.py", line 1, in <module>
    from streaming_form_data.parser import (  # NOQA
  File "/usr/local/lib/python3.7/dist-packages/streaming_form_data/parser.py", line 4, in <module>
    from streaming_form_data._parser import ErrorGroup, _Parser  # type: ignore
ImportError: /usr/local/lib/python3.7/dist-packages/streaming_form_data/_parser.so: undefined symbol: _Py_ZeroStruct

I used this:
https://github.com/siddhantgoel/streaming-form-data/blob/master/examples/flask/upload-test.py

ParseException()

Hi!

I'm having trouble parsing a form streamed through uwsgi. I get the error below. I tried to track it down, but the closest I could get was that the byte was not a hyphen. The traceback seems to show it erroring there, but I verified that the character being passed in was indeed a hyphen (45). I tried to print debugging info from within the .pyx, but couldn't get messages to show up in the terminal. I have never used Cython, so I'm not sure of the internals.

Thanks in advance for your help.

~Sean

  File "/usr/local/lib/python2.7/dist-packages/upload/upload.py", line 388, in upload
    parser.data_received(b)
  File "/usr/local/lib/python2.7/dist-packages/streaming_form_data/parser.py", line 53, in data_received
    self._parser.data_received(data)
  File "streaming_form_data/_parser.pyx", line 150, in streaming_form_data._parser._Parser.data_received (streaming_form_data/_parser.c:3929)
  File "streaming_form_data/_parser.pyx", line 172, in streaming_form_data._parser._Parser.data_received (streaming_form_data/_parser.c:3875)
  File "streaming_form_data/_parser.pyx", line 184, in streaming_form_data._parser._Parser._parse (streaming_form_data/_parser.c:4148)
streaming_form_data._parser._Failed

How to use it to read an image file?

Right now I have a block of code in my flask app:

def read_image(request):
    import pdb; pdb.set_trace()
    file = request.files['image'] #fast_file_read(request) 
    img = Image.open(file.stream)
    return img

The line file = request.files['image'] is taking a long time so I want to replace it with a function fast_file_read(request) that uses your library to speed up this line. Is this possible? I looked at the examples/flask and the best I could come up with is

def fast_file_read(request):
    file_ = FileTarget(os.path.join(tempfile.gettempdir(), 'test'))
    parser = StreamingFormDataParser(headers=request.headers)
    parser.register('file', file_)
    while True:
        chunk = request.stream.read(8192)
        if not chunk:
            break
        parser.data_received(chunk)
    return file_

But it is giving me "AttributeError: 'FileTarget' object has no attribute 'stream'"

Add a filename callback

I'm uploading files and I need the filename that was uploaded. Could you check for a filename method on the Target, and if it exists, call it with the parsed filename? I know it breaks your Target interface, but having the filename is quite important if one is uploading files.

Portion of the code where I think this can happen:

if value.startswith('Content-Disposition') and \
        value.endswith('form-data'):
    name = params.get('name')
    if name:
        part = self._part_for(name) or self.default_part
        part.start()
        self.set_active_part(part)

Example implementation:

if value.startswith('Content-Disposition') and \
        value.endswith('form-data'):
    name = params.get('name')
    if name:
        part = self._part_for(name) or self.default_part
        part.start()

        filename = params.get('filename')
        if filename and hasattr(part, 'filename_received'):
            part.filename_received(filename)

        self.set_active_part(part)

I think it'd be a useful addition to this library, thanks!
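
To make the proposal concrete outside the parser, here is a self-contained sketch of the same opt-in hook using only the standard library. `RecordingTarget`, `parse_content_disposition` and `notify_filename` are hypothetical names for illustration; none of this is the library's actual code.

```python
from email.message import Message


def parse_content_disposition(value):
    """Split a Content-Disposition value into (disposition, params)."""
    message = Message()
    message["content-disposition"] = value
    params = message.get_params(header="content-disposition")
    # The first tuple holds the disposition itself, e.g. ('form-data', '').
    return params[0][0], dict(params[1:])


class RecordingTarget:
    """Hypothetical target that opts in to the proposed callback."""

    def __init__(self):
        self.filename = None

    def filename_received(self, filename):
        self.filename = filename


def notify_filename(target, header_value):
    """Call target.filename_received(...) only if the target defines it."""
    disposition, params = parse_content_disposition(header_value)
    filename = params.get("filename")
    if (disposition == "form-data" and filename
            and hasattr(target, "filename_received")):
        target.filename_received(filename)


target = RecordingTarget()
notify_filename(target, 'form-data; name="file"; filename="report.pdf"')
print(target.filename)
```

Targets without a `filename_received` method are simply skipped, so existing targets would keep working unchanged.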

low parsing speed

Hi, I found the parser's speed to be rather low.

My desktop i7 CPU parses only about 30 MB/s.
My intranet network is 1 Gbit/s, so I want this parser to run at 100 MB/s or more.

I will do a better benchmark of the current solution later.

I investigated the code and I'm sure I can reach 100+ MB/s by making the Cython code more C-like, so that it uses raw C pointers instead of reading single bytes from Python objects.

Before starting work on a pull request, I want to make sure that:

  1. speed improvements are important to you
  2. rewriting the Cython code in a more dangerous C style, without bounds checking on every memory read, is acceptable for this project in general

P.S. By the way, the internal algorithm for matching boundaries in the stream is well done. It is very effective.
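
The benchmark mentioned above could be as simple as the following sketch, which times how fast a callable consumes fixed-size chunks. `measure_throughput` and the stand-in `count_hyphens` are hypothetical; in a real measurement the callable would be the parser's `data_received`, fed with an actual multipart body.

```python
import time


def measure_throughput(feed, chunk, total_bytes):
    """Call feed(chunk) until total_bytes have been fed; return MB/s."""
    fed = 0
    started = time.perf_counter()
    while fed < total_bytes:
        feed(chunk)
        fed += len(chunk)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - started, 1e-9)
    return fed / elapsed / 1_000_000


def count_hyphens(chunk):
    """Stand-in workload; replace with parser.data_received in practice."""
    return chunk.count(b"-")


chunk = b"x" * 65536
rate = measure_throughput(count_hyphens, chunk, 16 * 1024 * 1024)
print(f"{rate:.1f} MB/s")
```

Running it with several chunk sizes would show whether per-call overhead or per-byte work dominates.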

`cgi` is deprecated and will be removed in 3.13

https://peps.python.org/pep-0594/#cgi

The official docs suggest replacing cgi.parse_header with email.message.Message:

As an explicit example of how close parse_header and email.message.Message are:

>>> from cgi import parse_header
>>> from email.message import Message
>>> h = 'application/json; charset=utf8'
>>> parse_header(h)
('application/json', {'charset': 'utf8'})
>>> m = Message()
>>> m['content-type'] = h
>>> m.get_params()
[('application/json', ''), ('charset', 'utf8')]
>>> m.get_param('charset')
'utf8'

@siddhantgoel I'm happy to take a shot at the PR if you're accepting them!
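
A drop-in helper built on that suggestion might look like the sketch below. It is only an illustration of the approach, not the change that was eventually made in the library; the function name mirrors `cgi.parse_header` for convenience.

```python
from email.message import Message


def parse_header(line):
    """Return (value, params) like cgi.parse_header, using email.message."""
    message = Message()
    message["content-type"] = line
    params = message.get_params()
    # params[0] is (main value, ''); the remaining tuples are the parameters.
    return params[0][0], dict(params[1:])


print(parse_header("application/json; charset=utf8"))
```

Quoted parameter values such as `boundary="abc"` come back unquoted, which is what a multipart parser needs.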

Question: Is it possible to raise an error when data for a non-registered target is parsed?

Hi, first of all, thanks for this great library. I'm experimenting with it and it makes a huge difference for us in upload speeds.
Maybe I missed it in the documentation, but is it possible to raise an exception when data for a non-registered target is parsed? I assume that right now such data is discarded. I would like an exception instead: for example, if you were writing a client library and mistyped a field name, this would pass silently and the user might never notice that the field name was incorrect.

Get file name

Hello,

How is it possible to get the "filename" attribute for a FileTarget?

Thanks!

Python 3 installation issue

When I install streaming-form-data from requirements.txt or as below:

$ python3 -m pip install --user 'streaming-form-data == 1.5.1'

I'm getting the following issue:

$ python3 -c 'import streaming_form_data'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "$HOME/.local/lib/python3.7/site-packages/streaming_form_data/__init__.py", line 1, in <module>
    from streaming_form_data.parser import (  # NOQA
  File "$HOME/.local/lib/python3.7/site-packages/streaming_form_data/parser.py", line 4, in <module>
    from streaming_form_data._parser import ErrorGroup, _Parser  # type: ignore
ImportError: $HOME/.local/lib/python3.7/site-packages/streaming_form_data/_parser.so: undefined symbol: PyUnicodeUCS4_DecodeUTF8

I think the problem is that the shared object ends up being built for python2 (which is also installed).

At the same time, installing by doing git clone + python3 setup.py works fine.

type hints

Hi, I would like to propose adding type hint information to the library.

The question is how best to do it:

  1. Use a standalone .pyi file. This leaves the sources unmodified, but the declarations have to be duplicated in the .pyi.
  2. Embed type information in the code. This may require Python 3.5+ (I'm not 100% sure), but the type info stays integrated with the code, with no duplication.

What do you think?
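
For comparison, option 2 could look like the sketch below. `ValueCollector` is a hypothetical class, not one of the library's targets. Function annotations are valid syntax in any Python 3, the `typing` module arrived in 3.5, and the variable annotation on `self._chunks` needs 3.6.

```python
from typing import List, get_type_hints


class ValueCollector:
    """Hypothetical target with inline type hints."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def data_received(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    @property
    def value(self) -> bytes:
        return b"".join(self._chunks)


collector = ValueCollector()
collector.data_received(b"ab")
collector.data_received(b"cd")
print(collector.value)
print(get_type_hints(ValueCollector.data_received)["chunk"])
```

With option 1 the same signatures would live in a separate .pyi stub and the class body would stay unannotated.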

Register multiple targets per same part?

I have a scenario where I would like to capture a file upload and also a SHA256 hash of the SAME part of the request...

        parser = StreamingFormDataParser(headers=request.headers)

        search_document = FileTarget(tmp.name)
        parser.register('SearchDocument', search_document)

        uploaded_file_sha256 = SHA256Target()
        parser.register('SearchDocument', uploaded_file_sha256)

Should this be supported? It seems like the SHA target always has the same value after upload when I try this. I am trying to avoid reading the file a second time from disk to generate a hash.

Thanks, this is a great project.
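
One way to get both results in a single pass, sketched here without the library, is a single sink that writes each chunk and updates the hash at the same time. `HashingWriter` is a hypothetical class; to use the idea with the parser it would have to be adapted to the library's target interface.

```python
import hashlib
import io


class HashingWriter:
    """Hypothetical sink: stores each chunk and hashes it in the same pass."""

    def __init__(self, stream):
        self._stream = stream
        self._digest = hashlib.sha256()

    def data_received(self, chunk):
        self._stream.write(chunk)
        self._digest.update(chunk)

    def hexdigest(self):
        return self._digest.hexdigest()


buffer = io.BytesIO()  # stands in for the temporary file
sink = HashingWriter(buffer)
for chunk in (b"hello ", b"world"):
    sink.data_received(chunk)
print(sink.hexdigest())
```

Because the digest is updated as the chunks arrive, the file never has to be read back from disk.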

Python 2.7 support

#1 was due to lack of Python 2.7 support. We do a lot of byte handling in Cython, so 2.7 support may not be that straightforward, but it would still be nice to have.

Ubuntu Server slow speed

Hi, I first tried this on macOS and it was awesome, but then I pulled my code onto Ubuntu, and there the speed is the same as when I don't use this library.

handler _parser.data_received failed with delimiting multipart stream into parts

Hi,
I want to use streaming-form-data with socketify.py and I'm getting this error when reading chunks. This is my script:

from socketify import App
from streaming_form_data import StreamingFormDataParser
from streaming_form_data.targets import ValueTarget, FileTarget, NullTarget
async def upload(res, req):
    headers = {'Content-Type': 'multipart/form-data; boundary=boundary'}
    parser = StreamingFormDataParser(headers=headers)
    auth_token = ValueTarget()
    parser.register('auth_token', auth_token)
    parser.register('source',FileTarget('/tmp/file.txt'))
    content_type = req.get_header("content-type")
    print(f"Posted to {req.get_url()} {content_type}")
    def on_data(res, chunk, is_end):
        print(f"Got chunk of data with length {len(chunk)}, is_end: {is_end}")
        parser.data_received(chunk)
        if is_end:
            res.cork_end("Thanks for the data!")

    res.on_data(on_data)

app = App()
app.get("/", lambda res, req: res.end("Hello World socketify from Python!"))
app.post("/", upload)
app.listen(8000, lambda config: print("Listening on port http://localhost:%d now\n" % config.port))
app.run()

and I get this:

Posted to / multipart/form-data; boundary=---------------------------34328379792375695511330156598
Got chunk of data with length 19905, is_end: False
ERROR:root:Error on data handler _parser.data_received failed with delimiting multipart stream into parts
Got chunk of data with length 20480, is_end: False
ERROR:root:Error on data handler _parser.data_received failed with internal errors
....

Any help on this?
Thanks
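
One detail visible in the output above: the script hands the parser a hard-coded header with boundary=boundary, while the logged request uses a different boundary. Whether that is the whole cause is an assumption, but the mismatch can be checked with the standard library alone. `boundary_from_content_type` is a hypothetical helper.

```python
from email.message import Message


def boundary_from_content_type(value):
    """Return the boundary parameter of a Content-Type value, or None."""
    message = Message()
    message["content-type"] = value
    return message.get_param("boundary")


hard_coded = "multipart/form-data; boundary=boundary"
logged = ("multipart/form-data; "
          "boundary=---------------------------34328379792375695511330156598")

print(boundary_from_content_type(hard_coded))
print(boundary_from_content_type(logged))
```

If the two values differ, passing the request's own Content-Type header to the parser instead of the hard-coded one would be the first thing to try.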
