gevent.pywsgi's File Descriptor Leak and HTTP Keep-Alive
22/Dec 2017
Background
My server uses gevent.pywsgi. It works fine most of the time. However, every few days the server stops responding to requests, and the logs say “Too many open files”.
Investigation
A simple lsof shows many socket connections opened by pywsgi that remain open even after those sessions have completed. This FD (file descriptor) leak probably causes the process to hit the per-process limit on open files (ulimit -n).
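To watch for this from inside the process rather than with lsof, something like the following rough sketch could be logged periodically. It assumes a Unix-like system where /proc/self/fd (Linux) or /dev/fd (macOS/BSD) is available; the helper name fd_usage is made up for illustration. The count is approximate, since listing the directory briefly opens a descriptor of its own.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit) for the current process."""
    # The soft limit is the value reported by `ulimit -n`.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd is Linux-specific; /dev/fd exists on macOS and the BSDs.
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path)), soft
    raise RuntimeError("no per-process fd directory on this platform")


if __name__ == "__main__":
    open_fds, soft = fd_usage()
    print(f"{open_fds} open file descriptors, soft limit {soft}")
```

If the first number keeps climbing toward the second while traffic is flat, a leak like the one described here is a likely suspect.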
The server speaks HTTP/1.1, where Keep-Alive (a.k.a. HTTP persistent connection) is on by default. Our app clients are not smart enough (and we should never assume they are) to send Connection: close when they are done. And on the server side, gevent.pywsgi does not implement any server-side keep-alive timeout. Therefore, if the client neither sends Connection: close nor closes the connection itself, the socket on the server side stays open forever. That is obviously a bug (a.k.a. ‘undocumented feature’).
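For reference, this is roughly what a well-behaved client would put on the wire. The sketch below only builds the raw request bytes; build_request is a hypothetical helper for illustration, not part of gevent or of our clients.

```python
def build_request(host, path="/", keep_alive=False):
    """Build a raw HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        # HTTP/1.1 connections are persistent unless one side opts out,
        # so a client that is done must say so explicitly.
        f"Connection: {'keep-alive' if keep_alive else 'close'}",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_request("example.com").decode("ascii"))
```

A client that omits the Connection header entirely behaves like keep_alive=True under HTTP/1.1, which is exactly the case the server has to defend against.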
In the gevent doc, it says:
This method runs a request handling loop, calling handle_one_request() until all requests on the connection have been handled (that is, it implements keep-alive).
The truth is, its implementation of keep-alive is not good enough.
Solution
A solution can be found in a gevent GitHub issue. The code looks like this:
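import gevent
from gevent.pywsgi import WSGIHandler

my_timeout = 60  # seconds; example value, pick one that suits your clients

class MyHandler(WSGIHandler):
    def handle(self):
        # Cap the total lifetime of the connection, idle or not.
        timeout = gevent.Timeout.start_new(my_timeout)
        try:
            WSGIHandler.handle(self)
        except gevent.Timeout as ex:
            if ex is timeout:
                # We timed out, take appropriate action.
                # NOTE: The socket is already closed at this point
                pass
            else:
                raise
        finally:
            # Cancel the timer so it cannot fire after the handler returns.
            timeout.cancel()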
It simply works.
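To actually use such a handler, it has to be passed to the server through the handler_class argument of WSGIServer. The sketch below is one possible wiring, not the exact code from the issue: the names TimeoutHandler, CONNECTION_TIMEOUT, app and make_server, the 60-second value, and the address are all placeholders chosen for illustration.

```python
import gevent
from gevent.pywsgi import WSGIHandler, WSGIServer

CONNECTION_TIMEOUT = 60  # seconds; hypothetical value


class TimeoutHandler(WSGIHandler):
    def handle(self):
        # Limit how long a single connection may live in total.
        timeout = gevent.Timeout.start_new(CONNECTION_TIMEOUT)
        try:
            WSGIHandler.handle(self)
        except gevent.Timeout as ex:
            if ex is not timeout:
                raise  # someone else's timeout, do not swallow it
        finally:
            timeout.cancel()


def app(environ, start_response):
    """Trivial WSGI application used only for this example."""
    body = b"ok\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def make_server(host="127.0.0.1", port=8000):
    # handler_class tells pywsgi which handler to create per connection.
    return WSGIServer((host, port), app, handler_class=TimeoutHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Note that the timer covers the whole connection rather than only the idle gaps between requests, so the value should be comfortably longer than the slowest legitimate request.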
Solution 2
The better solution, imo, and one also suggested in the GitHub issue, is to put a proper reverse proxy server (e.g. nginx, HAProxy) in front of the WSGI server so that it closes idle connections for you.
Besides managing persistent connections better, a reverse proxy server can also perform SSL termination, load balancing, buffering, etc. Lots of good stuff. You should always have one of these.