
Comments (2)

bdarnell commented on July 18, 2024

This is unexpected, but I think I have an idea what's going on here. I have a version of this test case in https://replit.com/@bdarnell/tornado-gh-3299#main.py in which I have removed multiprocessing, requests, and flask, so Tornado is the only library involved. I see that with a Tornado RequestHandler things work as expected, but with a WSGI app that sleeps we see the behavior in which every request finishes at the same time.

I think this is because the WSGIContainer requires at least two trips through the ThreadPoolExecutor: once for executing the body of the function, and once for each iteration through the returned iterable:

tornado/tornado/wsgi.py

Lines 156 to 174 in ea0b320

app_response = await loop.run_in_executor(
    self.executor,
    self.wsgi_application,
    self.environ(request),
    start_response,
)
try:
    app_response_iter = iter(app_response)

    def next_chunk() -> Optional[bytes]:
        try:
            return next(app_response_iter)
        except StopIteration:
            # StopIteration is special and is not allowed to pass through
            # coroutines normally.
            return None

    while True:
        chunk = await loop.run_in_executor(self.executor, next_chunk)

This is because we can't know whether the WSGI app is doing everything up front and producing the response all at once, or if it's trying to do some sort of pseudo-async iterable and doing real work after the return. As a result, we get all 50 incoming requests queuing up in the thread pool to execute their first sleep, and then none of them can execute their second trip through the thread pool until after all of the first calls are finished.
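The convoy effect described above can be reproduced without Tornado at all. The following is a toy model (my own sketch, not Tornado code) in which every "request" needs two separate jobs on one shared pool; it simplifies by submitting both trips up front, but the FIFO queue produces the same result: no second-trip job finishes until every first-trip job has been dequeued.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Toy model of the two-trip pattern (not Tornado code): every request
# needs two separate jobs on one shared pool, as WSGIContainer does.
POOL_SIZE = 2
N_REQUESTS = 6

pool = ThreadPoolExecutor(POOL_SIZE)
start = time.monotonic()

def blocking_work() -> float:
    time.sleep(0.1)  # stand-in for the WSGI app's blocking sleep
    return time.monotonic() - start  # completion time of this job

# Trip 1: the body of the WSGI callable, one job per request.
first = [pool.submit(blocking_work) for _ in range(N_REQUESTS)]
# Trip 2: iterating the response. The pool's queue is FIFO, so all of
# these jobs sit behind every first-trip job that is still queued.
second = [pool.submit(blocking_work) for _ in range(N_REQUESTS)]

print("last first-trip job finished at  %.2fs" % max(f.result() for f in first))
print("first second-trip job finished at %.2fs" % min(f.result() for f in second))
pool.shutdown()
```

With a pool of 2 and 6 requests, the first trips occupy the pool for three rounds of sleeps before any second trip can start, which is the same wave pattern described below at larger scale.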

This is something of an edge case, since it only occurs if you overload a thread pool. And even then, it replaces one kind of poor performance with another (instead of every request being processed after 7 seconds, you'd see a wave of requests being processed at 1s, a second wave at 2s, etc until some requests are taking 7s or more).

I'm not sure there's a clean way to distinguish WSGI responses that do real work in a pseudo-async way (and therefore need the thread pool) from those that do not (and can be iterated on the main thread). We could certainly special-case ordinary list objects, but I'm not sure whether that works for Flask. We could perhaps have two thread pools for the different phases of the request, but sizing them gets tricky (this is where Apple's libdispatch would be useful). Or I suppose we could staple the first iteration of the response to the pre-response function call to cover the common cases.
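The list special-case could be sketched as follows. This is hypothetical code, not Tornado's actual implementation: `collect_chunks` is an invented helper that iterates a fully materialized list on the event-loop thread and only falls back to the executor for iterables that might do lazy work.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional

# Hypothetical sketch (not Tornado's actual code): iterate a WSGI
# response inline when it is a plain list, and bounce through the
# executor only for iterables that might block on each next() call.
async def collect_chunks(loop, executor, app_response: Iterable[bytes]) -> List[bytes]:
    if isinstance(app_response, list):
        # Fully materialized: iterating cannot block, so skip the pool.
        return [chunk for chunk in app_response if chunk]
    it = iter(app_response)
    chunks: List[bytes] = []

    def next_chunk() -> Optional[bytes]:
        # The default argument avoids letting StopIteration escape
        # through the executor boundary.
        return next(it, None)

    while True:
        chunk = await loop.run_in_executor(executor, next_chunk)
        if chunk is None:
            return chunks
        if chunk:
            chunks.append(chunk)

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(1) as pool:
        fast = await collect_chunks(loop, pool, [b"hello", b"", b"world"])
        lazy = await collect_chunks(loop, pool, (b for b in [b"a", b"b"]))
    return fast, lazy

fast, lazy = asyncio.run(main())
```

The open question from above still applies: this only helps if frameworks like Flask actually return plain lists rather than list subclasses or lazy wrappers.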

This issue is specific to WSGI so I'm re-titling it accordingly. Note that I would strongly encourage you to either use Tornado's native interfaces (tornado.web.RequestHandler, etc) on Tornado, or to use Flask or other WSGI frameworks on a WSGI-first server like uwsgi or gunicorn. Using flask via WSGIContainer on Tornado is not a great solution. Prior to Tornado 6.2 it was a really poor solution. Now it's better, but it's still not as good as servers that were built for WSGI from the ground up, and I do not intend to ever turn Tornado into a world-class WSGI server.


zweifeng1995 commented on July 18, 2024


Okay, thank you very much for your patience. I'll try to modify my code and retest it.

