
Comments (9)

fractaledmind avatar fractaledmind commented on June 1, 2024

Here's code from the youtube-dl project that checks for gzipped byte marker:

if len(video_webpage) > 2 and video_webpage[0] == '\x1f' and video_webpage[1] == '\x8b':
    buf = StringIO.StringIO(video_webpage)
    f = gzip.GzipFile(fileobj=buf)
    video_webpage = f.read()

Here's that applied to workflow.web.Response.content:

@property
def content(self):
    """Raw content of response (i.e. bytes)

    :returns: Body of HTTP response
    :rtype: :class:`str`

    """

    if not self._content:
        self._content = self.raw.read()
        if len(self._content) > 2 and (self._content[0] == '\x1f' and
                                       self._content[1] == '\x8b'):
            buf = StringIO.StringIO(self._content)
            gzip_f = gzip.GzipFile(fileobj=buf)
            self._content = gzip_f.read()

    return self._content
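As an aside for anyone porting this to Python 3: indexing a bytes object there yields an int, so comparing self._content[0] against '\x1f' is silently always false. Comparing a two-byte slice works the same way in both versions. A minimal sketch of the check (maybe_gunzip is a hypothetical helper name, not part of workflow.web):

```python
import gzip
import io

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream

def maybe_gunzip(data):
    """Return data decompressed if it starts with the gzip magic bytes.

    In Python 3, data[0] is an int, so comparing it to '\x1f' is always
    False; slicing (data[:2]) compares bytes to bytes in both 2 and 3.
    """
    if data[:2] == GZIP_MAGIC:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            return f.read()
    return data
```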

from alfred-workflow.

deanishe avatar deanishe commented on June 1, 2024

I'll look into it.

Why aren't you using the solution from the chosen answer?

If the data is gzipped for transmission, the Content-Encoding header should reflect this.

Your proposed solution (checking the first bytes) has the same problem as your proposed adaptation to decode() to handle mdls output: you're wrongly generalising a specific edge case, and it will probably break other cases.

It's perfectly possible for a binary file to start with those same bytes, but your solution would incorrectly try to unzip it.

That's fine for youtube-dl because it only handles a very limited number of file formats. It is not appropriate in the general case.

fractaledmind avatar fractaledmind commented on June 1, 2024

Good point. I had read the youtube-dl solution last, so that's what stuck in my mind, but checking the header is clearly better. Good call.

Here's a second run at it:

import gzip
from cStringIO import StringIO

@property
def content(self):
    """Raw content of response (i.e. bytes)

    :returns: Body of HTTP response
    :rtype: :class:`str`

    """

    if not self._content:
        self._content = self.raw.read()
        # use .get(): the header may be absent, and [] would raise KeyError
        if self.headers.get('content-encoding') == 'gzip':
            inbuffer = StringIO(self._content)
            f = gzip.GzipFile(mode='rb', fileobj=inbuffer)
            try:
                self._content = f.read()
            finally:
                f.close()
    return self._content

deanishe avatar deanishe commented on June 1, 2024

Looks like a fair start.

iter_content() would also need to support gzip encoding, and the library would have to send an Accept-Encoding header containing gzip, in order to be correct.

I've had a little play with GzipFile and this looks tricky to do.
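For what it's worth, the standard zlib module can decompress a gzip stream incrementally: decompressobj with wbits=16+MAX_WBITS expects a gzip header and trailer. Something along these lines might suit iter_content() (a sketch only, not tested against web.py; iter_gunzip is a hypothetical name):

```python
import zlib

def iter_gunzip(chunks):
    """Incrementally decompress an iterable of gzip-compressed byte chunks.

    wbits=16 + zlib.MAX_WBITS tells zlib to expect the gzip wrapper,
    so only one chunk at a time needs to be held in memory.
    """
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    tail = decomp.flush()  # emit any buffered bytes at end of stream
    if tail:
        yield tail
```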

deanishe avatar deanishe commented on June 1, 2024

So, I've had a look at this, and it doesn't look like it's possible to decompress a stream of gzipped data with GzipFile, at least not without piping it through gunzip or keeping all the data in memory, which is what iter_content() is supposed to avoid.

Which website were you pulling the data from? A webserver shouldn't be sending gzipped data if the HTTP client hasn't explicitly told the server that it supports it (which web.py doesn't).

fractaledmind avatar fractaledmind commented on June 1, 2024

It's the search results from Kickass Torrents.

deanishe avatar deanishe commented on June 1, 2024

Hmm. That's actually naughty behaviour by the server. It shouldn't be sending gzipped data unless the client has said that it accepts it.

Currently, I'm inclined to leave gzip-handling out of web.py.

It's a good feature to have, but I don't want to have Response.content support it but not Response.iter_content().

I didn't put too much effort into it, but it looks like piping the data through gunzip is the only thing likely to work (Python seems to only be able to decompress the data if it's all loaded into memory or saved to a temporary file).

deanishe avatar deanishe commented on June 1, 2024

I've had an idea regarding iter_content() and gzipped content.

The way I see it, streaming HTTP data isn't so useful in a workflow; the main selling point of iter_content() is handling large files without having to load them into memory.

So, could it be replaced with a save_to_path() function? This could first save to a temporary file, which is then moved/renamed to the destination file upon completion, and decompressed if necessary.

How does that sound?

I should probably make this a separate issue…
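The temp-file flow described above might look roughly like this (a sketch only; save_to_path and its signature here are hypothetical, not the eventual API):

```python
import gzip
import os
import shutil
import tempfile

def save_to_path(chunks, dest_path, gzipped=False):
    """Stream byte chunks to a temp file, gunzip if needed, then move into place."""
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        if gzipped:
            # Decompress the finished temp file into a second temp file,
            # so a partial or failed download never touches dest_path.
            fd2, plain_path = tempfile.mkstemp()
            with os.fdopen(fd2, 'wb') as out, gzip.open(tmp_path, 'rb') as src:
                shutil.copyfileobj(src, out)
            os.remove(tmp_path)
            tmp_path = plain_path
        # Only on success does the file appear at its final destination
        shutil.move(tmp_path, dest_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```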

deanishe avatar deanishe commented on June 1, 2024

Gzip support and save_to_path() implemented in v1.9.6
