
Comments (9)

fractaledmind avatar fractaledmind commented on June 1, 2024

Here's code from the youtube-dl project that checks for gzipped byte marker:

if len(video_webpage) > 2 and video_webpage[0] == '\x1f' and video_webpage[1] == '\x8b':
    buf = StringIO.StringIO(video_webpage)
    f = gzip.GzipFile(fileobj=buf)
    video_webpage = f.read()

Here's that applied to workflow.web.Response.content:

@property
def content(self):
    """Raw content of response (i.e. bytes)

    :returns: Body of HTTP response
    :rtype: :class:`str`

    """

    if not self._content:
        self._content = self.raw.read()
        if len(self._content) > 2 and (self._content[0] == '\x1f' and
                                       self._content[1] == '\x8b'):
            buf = StringIO.StringIO(self._content)
            gzip_f = gzip.GzipFile(fileobj=buf)
            self._content = gzip_f.read()

    return self._content
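As an aside for anyone porting this to Python 3: indexing a bytes object there yields an int, so comparing self._content[0] against '\x1f' is silently always false. Comparing a two-byte slice works the same way in both versions. A minimal sketch of the check (maybe_gunzip is a hypothetical helper name, not part of workflow.web):

```python
import gzip
import io

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream

def maybe_gunzip(data):
    """Return data decompressed if it starts with the gzip magic bytes.

    In Python 3, data[0] is an int, so comparing it to '\x1f' is always
    False; slicing (data[:2]) compares bytes to bytes in both 2 and 3.
    """
    if data[:2] == GZIP_MAGIC:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            return f.read()
    return data
```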

from alfred-workflow.

deanishe avatar deanishe commented on June 1, 2024

I'll look into it.

Why aren't you using the solution from the chosen answer?

If the data is gzipped for transmission, the Content-Encoding header should reflect this.

Your proposed solution (checking the first bytes) has the same problem as your proposed adaptation to decode() to handle mdls output: you're wrongly generalising a specific edge case, and it will probably break other cases.

It's perfectly possible for a binary file to start with those same bytes, but your solution would incorrectly try to unzip it.

That's fine for youtube-dl because it only handles a very limited number of file formats. It is not appropriate in the general case.

fractaledmind avatar fractaledmind commented on June 1, 2024

Good point. I had read the youtube-dl solution last, so that's what stuck in my mind, but checking the header is clearly better. Good call.

Here's a second run at it:

import gzip
from cStringIO import StringIO

@property
def content(self):
    """Raw content of response (i.e. bytes)

    :returns: Body of HTTP response
    :rtype: :class:`str`

    """

    if not self._content:
        self._content = self.raw.read()
        # use .get(): the header may be absent, and [] would raise KeyError
        if self.headers.get('content-encoding') == 'gzip':
            inbuffer = StringIO(self._content)
            f = gzip.GzipFile(mode='rb', fileobj=inbuffer)
            try:
                self._content = f.read()
            finally:
                f.close()
    return self._content

deanishe avatar deanishe commented on June 1, 2024

Looks like a fair start.

iter_content() would also need to support gzip encoding, and the library would have to send an Accept-Encoding header containing gzip, in order to be correct.

I've had a little play with GzipFile and this looks tricky to do.
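For what it's worth, the standard zlib module can decompress a gzip stream incrementally: decompressobj with wbits=16+MAX_WBITS expects a gzip header and trailer. Something along these lines might suit iter_content() (a sketch only, not tested against web.py; iter_gunzip is a hypothetical name):

```python
import zlib

def iter_gunzip(chunks):
    """Incrementally decompress an iterable of gzip-compressed byte chunks.

    wbits=16 + zlib.MAX_WBITS tells zlib to expect the gzip wrapper,
    so only one chunk at a time needs to be held in memory.
    """
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    tail = decomp.flush()  # emit any buffered bytes at end of stream
    if tail:
        yield tail
```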

deanishe avatar deanishe commented on June 1, 2024

So, I've had a look at this, and it doesn't look like it's possible to decompress a stream of gzipped data with GzipFile, at least not without piping it through gunzip or keeping all the data in memory, which is what iter_content() is supposed to avoid.

Which website were you pulling the data from? A webserver shouldn't be sending gzipped data if the HTTP client hasn't explicitly told the server that it supports it (which web.py doesn't).

fractaledmind avatar fractaledmind commented on June 1, 2024

It's the search results from Kickass Torrents.

deanishe avatar deanishe commented on June 1, 2024

Hmm. That's actually naughty behaviour by the server. It shouldn't be sending gzipped data unless the client has said that it accepts it.

Currently, I'm inclined to leave gzip-handling out of web.py.

It's a good feature to have, but I don't want to have Response.content support it but not Response.iter_content().

I didn't put too much effort into it, but it looks like piping the data through gunzip is the only thing likely to work (Python seems to only be able to decompress the data if it's all loaded into memory or saved to a temporary file).

deanishe avatar deanishe commented on June 1, 2024

I've had an idea regarding iter_content() and gzipped content.

The way I see it, streaming HTTP data isn't so useful in a workflow; the main selling point of iter_content() is handling large files without having to load them into memory.

So, could it be replaced with a save_to_path() function? This could first save to a temporary file, which is then moved/renamed to the destination file upon completion, and decompressed if necessary.

How does that sound?

I should probably make this a separate issue…
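The temp-file flow described above might look roughly like this (a sketch only; save_to_path and its signature here are hypothetical, not the eventual API):

```python
import gzip
import os
import shutil
import tempfile

def save_to_path(chunks, dest_path, gzipped=False):
    """Stream byte chunks to a temp file, gunzip if needed, then move into place."""
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        if gzipped:
            # Decompress the finished temp file into a second temp file,
            # so a partial or failed download never touches dest_path.
            fd2, plain_path = tempfile.mkstemp()
            with os.fdopen(fd2, 'wb') as out, gzip.open(tmp_path, 'rb') as src:
                shutil.copyfileobj(src, out)
            os.remove(tmp_path)
            tmp_path = plain_path
        # Only on success does the file appear at its final destination
        shutil.move(tmp_path, dest_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```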

deanishe avatar deanishe commented on June 1, 2024

Gzip support and save_to_path() implemented in v1.9.6
