Comments (3)

ross commented on August 19, 2024

because I request images and I don't want to use too much memory.

holding on to the requests won't really change the amount of memory being used; the response data will still be downloaded and loaded into memory in the worker threads. the easiest thing to do would be to hold on to the futures and check the status there. i'm not really clear on what the use-case for requesting a lot of images but not caring about the results would be, but http HEAD requests might be a good idea here if the server supports them.
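
for illustration, a minimal sketch of what that could look like with requests-futures; the urls are placeholders and this assumes the server answers HEAD:

from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=8)
urls = ['http://www.foo.bar/{}.jpg'.format(i) for i in range(10)]  # placeholder URLs

# HEAD returns only the status line and headers, so no image bytes are downloaded
futures = [session.head(url) for url in urls]
for future in futures:
    resp = future.result()
    print(resp.status_code, resp.headers.get('Content-Length'))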

Fman77 commented on August 19, 2024

Unfortunately I cannot use HEAD requests, as they are not implemented on the server I'm querying.
Anyway, as you suggested, I tried saving all the futures in a list, then after the for loop that submits the requests, going through the list and calling .result() on each future. I get one of these errors: requests.exceptions.SSLError: [Errno 24] Too many open files, or requests.exceptions.ConnectionError: ('Connection aborted.', ResponseNotReady('Request-sent',)). Do you have any idea where they come from?
By the way, I cannot check the response when there is a timeout, because if the server times out I won't get any response; I have to wrap the call in a try/except block catching the TimeoutError from concurrent.futures.

To sum up, my code looks something like this:

from concurrent.futures import TimeoutError
from requests_futures.sessions import FuturesSession

URLS = [...]  # list of URLs
session = FuturesSession(max_workers=200)

futures = []
for url in URLS:
    futures.append(session.get(url))

for future in futures:
    try:
        future.result(timeout=10)
    except TimeoutError:
        print("Request timed out")

What is strange is that it works with only 100 requests, but when I do more I get the Too many open files error. Do I have to close or clean something up?

ross commented on August 19, 2024

requests.exceptions.SSLError: [Errno 24] Too many open files

my guess would be that each in-flight request holds an open socket (plus any pooled connections), and with 200 workers you're hitting the per-process file descriptor limit. you can probably increase the ulimit for the user running the script enough to prevent it from hitting that limit. the ResponseNotReady stuff may be similar/related.
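
for reference, a minimal sketch of raising the soft descriptor limit from inside the process on unix, the equivalent of ulimit -n in the shell:

import resource

# RLIMIT_NOFILE caps open file descriptors, which includes the sockets
# held by in-flight HTTP requests; raise the soft limit to the hard limit
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))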

By the way, I cannot check the response when there is a timeout, because if the server times out I won't get any response; I have to wrap the call in a try/except block catching the TimeoutError from concurrent.futures.

if the server times out there isn't a response to check; the request simply failed to get one in the allowed time.

What is strange is that it works with only 100 requests, but when I do more I get the Too many open files error. Do I have to close or clean something up?

beyond raising the ulimit to allow more open files, you might try getting rid of the responses once you're done with them rather than keeping them around in the futures list. something like:

while futures:
    future = futures.pop(0)  # pop() would be fine if you don't care about order
    # check the responses the same as you otherwise would
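
if order matters, note that list.pop(0) is O(n); a collections.deque gives O(1) pops from the front. a sketch assuming the same URLS and session as in the snippet above:

from collections import deque
from concurrent.futures import TimeoutError

futures = deque()
for url in URLS:
    futures.append(session.get(url))

while futures:
    future = futures.popleft()  # O(1) and preserves submission order
    try:
        resp = future.result(timeout=10)
    except TimeoutError:
        continue
    # check resp here; nothing else keeps a reference, so it can be freed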

you have a pretty odd use-case here: requesting a large number of images but not caring about the resulting data. you're also sending off "all" of the requests before checking on any of the responses, which results in buffering everything into the program at once.

you aren't really within the designed use cases for requests-futures, and could probably get a lot further by doing the multi-threading yourself:

#!/usr/bin/env python

from queue import Queue
from requests import Session
from threading import Thread
from time import sleep
import logging

# using a logger so that prints aren't interleaved from multiple threads
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()


class Worker(Thread):

  def __init__(self, queue):
    super().__init__()
    self.queue = queue
    self.session = Session()
    self.start()

  def run(self):
    while True:
      url = self.queue.get()
      # run until we get a None url, our signal to stop
      if url is None:
        break
      logger.debug(url)
      resp = self.session.get(url)
      # do whatever checking you want to do here


num_workers = 20
num_urls = 30

queue = Queue()
logger.info("creating workers")
workers = [Worker(queue) for i in range(num_workers)]

logger.info("enqueuing urls")
for i in range(num_urls):
  queue.put('http://www.foo.bar/{}'.format(i))

logger.info("waiting for queue to empty")
while not queue.empty():
  sleep(1)

logger.info("signaling workers to stop")
# enqueue enough None jobs to stop all the workers
for worker in workers:
  queue.put(None)

logger.info("waiting on workers to finish")
for worker in workers:
  # wait for it to stop
  worker.join()
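
alternatively, the standard library's concurrent.futures gives you the same pattern without hand-rolled workers; a minimal sketch with placeholder urls, processing each response as it completes instead of keeping them all around:

from concurrent.futures import ThreadPoolExecutor, as_completed
from requests import Session

session = Session()
urls = ['http://www.foo.bar/{}'.format(i) for i in range(30)]  # placeholders, as above

with ThreadPoolExecutor(max_workers=20) as pool:
    future_to_url = {pool.submit(session.get, url): url for url in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            resp = future.result()
        except Exception as exc:
            print(url, 'failed:', exc)
        else:
            print(url, resp.status_code)  # do whatever checking you want here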
