Comments (8)

spyoungtech commented on August 16, 2024

@qo4on I had a thought about this over the weekend. I wonder if your problem would be solved by avoiding a shared session altogether, that is, by creating a new session for every request.

My theory is that interleaved requests using the same underlying TCP connection are causing your problem. By default, when you use a session, the TCP connections will be reused. If you do not use a session object, a new session will be created for each request, which should also mean a new TCP connection will be created for each request.

You will have to pass in any necessary headers/cookies to each request, but I think this may help you send multiple requests concurrently, instead of needing to do them all serially.
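To make the difference between a reused connection and a fresh connection per request concrete, here is a minimal sketch using only the Python standard library. It is a stand-in for illustration, not grequests or requests code: `CountingHandler` and `count_connections` are hypothetical names, and the server is a throwaway local one that counts how many TCP connections it accepts.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connections_seen = 0  # incremented once per accepted TCP connection
_lock = threading.Lock()


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive

    def setup(self):
        # setup() runs once per TCP connection, not once per request
        global connections_seen
        with _lock:
            connections_seen += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(n_requests, reuse):
    """Send n_requests GETs and return how many TCP connections the server saw."""
    global connections_seen
    connections_seen = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    shared = http.client.HTTPConnection(host, port) if reuse else None
    try:
        for _ in range(n_requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # finish the response before the next request
            if not reuse:
                conn.close()
    finally:
        if shared is not None:
            shared.close()
        server.shutdown()
        server.server_close()
    return connections_seen


if __name__ == "__main__":
    print("reused connection:", count_connections(5, reuse=True))
    print("new connection each time:", count_connections(5, reuse=False))
```

The reused case is roughly what a session's keep-alive pool does for you; the other case is roughly what happens when every request gets its own session.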

from grequests.

qo4on commented on August 16, 2024

Thank you for your help. It looks like you are right. Their cloud drive works reliably only with sequential requests. I tried multiple processes, threads, asyncio + requests, http.client + pipeline, dugong, grequests, coroutines... It mostly freezes at some point or sends incorrect responses. I have no idea what they mean by HTTP pipelining.
Unfortunately other services are even worse.

qo4on commented on August 16, 2024

Thanks. I wrote a message to their support team and they fixed some errors. Finally this code works. I have no idea why it works, but it works, it returns fd = [1, 2, 3, 4, 5...] as it should. I tried to do the same with httpx and had incorrect values for all files fd = [1, 1, 1, 1, 1...].

spyoungtech commented on August 16, 2024

grequests supports all the usual features of requests. My understanding is that keep-alive is on by default for sessions. If you provide a session object to the grequests request, it will use that session for its connection, so you can use the same connection pool across requests sent with grequests.

I think this will do what you want @qo4on

import grequests
import logging
import requests

logging.basicConfig(level=logging.DEBUG)  # to see the connection logs

sesh = requests.Session()
# Every request is given the same session, so they share its connection pool
request_list = [grequests.get('http://httpbin.org/status/200', session=sesh) for _ in range(10)]

for resp in grequests.imap(request_list):
    ...

If you don't use a session, you'll see that a new connection is made for every request. You can also further configure the connection pooling by configuring your session object accordingly.

qo4on commented on August 16, 2024

@spyoungtech Thank you. I'm not sure that the same session means the same TCP connection. In my view, grequests does not guarantee that the order in which it sends the requests is the same as the order in which the remote server receives them. That means grequests uses a different TCP connection for every request. Am I right?

Also, I don't know why, but my authorization fails after a few requests with grequests. A plain requests.Session() does not have this problem.

request_list = []
for item in items[:10]:
    params = {'path': f"/{self.tts}/{self.theme}/{item['name']}",
              'flags': self.pc.O_CREAT}
    request_list.append(grequests.get('https://eapi.pcloud.com/file_open', session=self.pc.session, params=params))

for resp in grequests.imap(request_list):
    print(resp.json())

{'result': 0, 'fd': 1, 'fileid': 85081146}
{'result': 0, 'fd': 2, 'fileid': 85081155}
{'result': 0, 'fd': 3, 'fileid': 85081252}
{'result': 0, 'fd': 4, 'fileid': 85081157}
{'result': 1000, 'error': 'Log in required.'}
{'result': 1000, 'error': 'Log in required.'}
{'result': 1000, 'error': 'Log in required.'}
{'result': 1000, 'error': 'Log in required.'}
{'result': 1000, 'error': 'Log in required.'}
{'result': 0, 'fd': 5, 'fileid': 85081255}

{'result': 0, 'fd': 1, 'fileid': 85081146}
{'result': 0, 'fd': 2, 'fileid': 85081155}
{'result': 0, 'fd': 3, 'fileid': 85081252}
{'result': 0, 'fd': 4, 'fileid': 85081157}
{'result': 0, 'fd': 5, 'fileid': 85081255}
{'result': 0, 'fd': 6, 'fileid': 85081158}
{'result': 0, 'fd': 7, 'fileid': 85081259}
{'result': 0, 'fd': 8, 'fileid': 85081159}
{'result': 0, 'fd': 9, 'fileid': 85081262}
{'result': 1000, 'error': 'Log in required.'}

spyoungtech commented on August 16, 2024

I think I see what you mean now. The core underlying library, urllib3 (and by extension requests), does not support HTTP pipelining, therefore grequests does not support it either. However, urllib3's connection pooling is likely to already be giving you similar, if not better, performance gains.

"grequests does not guarantee that the order it sends the requests"

No, I don't believe there are any guarantees of the order in which requests are sent, at least not when using grequests.map/grequests.imap. This is, in part, an inescapable nature of handling multiple requests/responses concurrently.
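As a rough illustration of why ordering falls out of concurrency, here is a small standard-library sketch. It does not use grequests; `fake_request` and `run_concurrently` are hypothetical stand-ins that simulate network latency with `time.sleep`. The point is only that results arrive in completion order, and that carrying an index with each request lets you restore the original order afterwards.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_request(index, delay):
    """Stand-in for an HTTP call: sleeps for `delay` seconds, then echoes its index."""
    time.sleep(delay)
    return index


def run_concurrently(delays):
    """Return (completion_order, restored_order) for the simulated requests."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(fake_request, i, d) for i, d in enumerate(delays)]
        # as_completed yields futures as they finish, not as they were submitted
        completion_order = [f.result() for f in as_completed(futures)]
    # Sorting by the carried index restores the order the requests were issued in
    restored_order = sorted(completion_order)
    return completion_order, restored_order


if __name__ == "__main__":
    completed, restored = run_concurrently([0.05, 0.0, 0.02])
    print("completion order:", completed)
    print("restored order:  ", restored)
```

If the server's behavior depends on the order in which it receives requests, reordering the responses on the client will not help; in that case the requests themselves have to be sent one after another.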

"That means grequests uses different tcp connection for every request."

Like in requests, the behavior of creating connections and connection pooling is handled in urllib3. Using the code in my first comment, you might see in the debug logs that urllib3 ends up creating two HTTP connections and uses/reuses them for all 10 requests. It does not create a new connection for each request when using a session.

DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org:80
DEBUG:urllib3.connectionpool:Starting new HTTP connection (2): httpbin.org:80

You can configure the underlying connection pool size. See also: urllib3 advanced usage.

However, in order to send multiple requests concurrently, you need at least two connections! Trying to limit the pool to 1 and exactly 1 connection will cause a deadlock waiting for a connection in the pool to become available that never becomes available.

In this example, you'll see that only one HTTPS connection is created. However, it's kind of useless because the program deadlocks.

import grequests
import logging
import requests

logging.basicConfig(level=logging.DEBUG)  # to see the connection logs

sesh = requests.Session()
# A blocking pool that holds exactly one connection
adapter = requests.adapters.HTTPAdapter(pool_maxsize=1, pool_block=True)
sesh.mount('https://', adapter)
request_list = [grequests.get('https://httpbin.org/status/200', session=sesh) for _ in range(10)]

for resp in grequests.imap(request_list):
    # DEADLOCK
    ...

Basically, it seems that when you have a blocking pool (see urllib3 docs linked above) with just 1 available connection, you get a deadlock with grequests. Essentially, you need at least two connections to prevent the deadlock.

urllib3 completes the entire request -> response cycle; you can't send subsequent requests on the same connection without first receiving a response. Instead, connections are reused for subsequent requests (and responses). But even reusing the same TCP connection is not quite the same as HTTP pipelining. Though, as mentioned, I don't think you'll see much of a performance gain, if any, from pipelining compared to urllib3's connection pooling.
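The request -> response cycle can be seen directly in the standard library's `http.client`, which, as far as I know, urllib3's connection classes are built on. The sketch below is only an illustration under that assumption: `OkHandler` and `try_pipelining` are hypothetical names, and the server is a local throwaway one. It tries to send a second request before reading the first response, then shows that the same connection is reusable once the first response has been read.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def try_pipelining():
    """Return (outcome of the early second request, status of a later reuse)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(*server.server_address)
    try:
        conn.request("GET", "/first")
        try:
            # No getresponse() yet: this is what pipelining would need to allow
            conn.request("GET", "/second")
            outcome = "second request was sent"
        except http.client.CannotSendRequest:
            outcome = "CannotSendRequest"
        # Reading the first response frees the connection for the next request
        conn.getresponse().read()
        conn.request("GET", "/third")
        reuse_status = conn.getresponse().status
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return outcome, reuse_status


if __name__ == "__main__":
    print(try_pipelining())
```

So a single connection handles requests strictly one at a time; concurrency comes from having more than one connection in the pool.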

qo4on commented on August 16, 2024

The problem is that the server does not support multiple connections https://docs.pcloud.com/protocols/http_json_protocol/single_connection.html
Sending requests in a for loop works great but takes too long. Multiple processes/threads are not supported.
Is there any workaround?

spyoungtech commented on August 16, 2024

I don't think that document is suggesting the server does not support multiple connections or that threading your requests is prohibited.

By my reading, the document suggests using a single connection for performance but does not require it. The comment about threads/processes concerns different threads handling different requests while writing to the same connection, which doesn't happen in the case of urllib3/requests/grequests.

Some quotes from that page with emphasis added:

"You can push multiple requests over single connection without waiting for answer, to improve performance"

"However you should make sure that in no event two threads/processes write to the same connection at the same time."

"Is there any workaround?"

I believe grequests is working for your use case, based on the output you described (aside from the auth issue). Maybe adjusting the session adapter pool settings or the gevent pool size (via the size argument to imap) will help you achieve better results.

I'm not sure why you're having that authentication issue; I'd have to look more closely at the authentication and know how you're handling it. Are you using the auth tokens with session cookies or some other method?

As a personal note, their documentation leaves much to be desired and does not inspire confidence in their service... some of the design there, particularly around authentication, is also kind of a big 'yikes' for me 😬
