Comments (3)

mross22 commented on August 18, 2024

Can you give some more details on what wrapping you have to do in order to 'fool' requests into thinking it is a stream and what the stream-like object looks like that you are getting back from the AWS SDK? I think I understand the problem you're getting at but ideally we wouldn't have to add any code in our SDK that is based on requests internals. Regardless of where the wrapping code goes it would be nice if it was done in a way more explicitly supported by requests as opposed to being dependent on their internal implementation.

from oci-python-sdk.

brunson commented on August 18, 2024

Take a look at this code in the requests package at the version you have pinned (https://github.com/requests/requests/blob/v2.11.1/requests/models.py#L472)

You can see that requests tries to infer whether the object it received is file-like in order to get the content length. If the object has both a seek and a tell attribute, requests assumes it can seek to the end of the file to get the length. The AWS client response from boto3.client('s3').get_object() includes a Body with a _raw_stream attribute that can be read() from. That object also has seek() and tell() methods, but seek() throws an exception to the tune of "method unimplemented" when called.
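To make the failure mode concrete, here is a minimal sketch of the probe described above. It is a simplified stand-in, not the actual requests source; probe_content_length and UnseekableBody are hypothetical names, and the exception type raised by seek() is an assumption.

```python
import io


def probe_content_length(body):
    """Simplified stand-in for the seek/tell length probe described above."""
    if hasattr(body, "seek") and hasattr(body, "tell"):
        start = body.tell()
        body.seek(0, io.SEEK_END)      # jump to the end of the "file"
        end = body.tell()
        body.seek(start, io.SEEK_SET)  # restore the original position
        return max(0, end - start)
    return None  # caller would fall back to another strategy


class UnseekableBody:
    """Hypothetical stream that advertises seek() but cannot perform it."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, amt=-1):
        return self._buffer.read(amt)

    def tell(self):
        return self._buffer.tell()

    def seek(self, offset, whence=io.SEEK_SET):
        # Assumed exception type; the real object may raise something else.
        raise io.UnsupportedOperation("seek")
```

A real file or io.BytesIO passes the probe, while an object like UnseekableBody passes the hasattr() checks and then fails on the first seek() call.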

We tried several other ways of passing in the AWS response, but they all either failed or else succeeded in seek()ing to the end of the "file" and reading the entire content into memory. This is prohibitive since we're trying to move files in excess of 20GB at times.

My workaround was to patch the _raw_stream object at runtime with a seek() that is a no-op lambda, then to patch the tell() method with an instance of a class that returns zero the first time it is __call__()'ed and the actual Content-Length value retrieved from the AWS client on every subsequent call. This is what I mean by "fooling" the requests module into getting the correct content length without actually reading the entire body of the response into memory.
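A rough sketch of that workaround, assuming the stream object accepts instance attribute assignment. TellOnceThenLength, patch_raw_stream, and FakeRawStream are hypothetical names used for illustration only, and the original patch code may have looked different.

```python
import io


class TellOnceThenLength:
    """Callable replacement for tell(): 0 on the first call, then the length."""

    def __init__(self, content_length):
        self._content_length = content_length
        self._called = False

    def __call__(self):
        if not self._called:
            self._called = True
            return 0
        return self._content_length


def patch_raw_stream(raw_stream, content_length):
    """Replace seek() with a no-op and tell() with the two-step callable."""
    raw_stream.seek = lambda *args, **kwargs: None
    raw_stream.tell = TellOnceThenLength(content_length)
    return raw_stream


class FakeRawStream:
    """Hypothetical stand-in for the unseekable stream from the AWS response."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, amt=-1):
        return self._buffer.read(amt)

    def seek(self, offset, whence=io.SEEK_SET):
        raise io.UnsupportedOperation("seek")

    def tell(self):
        return self._buffer.tell()


# The probe sequence tell(), seek(end), tell() now yields the full length
# without consuming any data from the stream.
stream = patch_raw_stream(FakeRawStream(b"x" * 10), content_length=10)
start = stream.tell()
stream.seek(0, io.SEEK_END)
probed_length = stream.tell() - start
```

With a real response, the length would come from the AWS response metadata (the ContentLength field of the get_object() result) rather than a literal. The patch only holds up if the caller probes the length exactly once, which is one reason it is fragile.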

The underlying issue is in the requests module and its mechanism of inferring the capabilities of the object it's passed. Perhaps it won't fail in this manner when you move to more recent releases as the implementation has changed considerably, but this seems like an obvious place where we need to be interacting with the requests API in a manner that can reliably stream data without having to store it locally.

I'd like to hear your thoughts on the matter.

brunson avatar brunson commented on August 18, 2024

@mross22 Thanks for the assistance.

In summary, your recommendation of wrapping the AWS response body in a class that hides the seek() and tell() methods is a bit cleaner than my solution. Looking at the super_len() function of requests, as you suggested, also indicates that defining __len__() or a len attribute on the class will cause requests to properly fetch the content length without trying to read the entire stream into memory.
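For anyone landing here later, a minimal sketch of such a wrapper might look like the following. SizedStream is a hypothetical name, and whether requests picks up the length this way depends on the requests version in use.

```python
import io


class SizedStream:
    """Expose only read() and a known length.

    seek() and tell() are deliberately not defined, so a seek/tell probe
    is skipped and a length lookup via len() can succeed instead.
    """

    def __init__(self, raw, length):
        self._raw = raw        # underlying stream, e.g. the AWS response body
        self._length = length  # known size in bytes, e.g. from the response metadata

    def __len__(self):
        return self._length

    def read(self, amt=-1):
        return self._raw.read(amt)


# Illustration with an in-memory stream standing in for the AWS body.
wrapped = SizedStream(io.BytesIO(b"hello world"), length=11)
first_chunk = wrapped.read(5)
```

The wrapper would then be passed as the request body in place of the raw AWS object, so the data is read in chunks rather than loaded into memory up front.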

I appreciate your help in finding a more reliable work around.

Thanks,
e.
