I come across some Python code using boto, the Python interface to AWS. There’s boto3, which is a newer version of boto. If you are starting a new project, you should be looking at boto3 instead of boto.
I am told that boto pools connections. This is a good thing in terms of performance. Never waste HTTPS connections because it is expensive to setup. But does it limit the number of outgoing connections like SQLAlchemy’s QueuePool, or is it configurable in terms of number of pooled connections? I’m not sure. It is always good to read the code.
Diving into the code
Here is the implementation of the connection pool. There is a
HostConnectionPool for each (host, port, secure) tuple, which makes sense.
Let’s look into the code.
class HostConnectionPool(object): def __init__(self): self.queue = 
self.queue is a Python list. Python list is not a good queue implementation because
list.append(obj) is O(1) but
list.pop(0) is O(N). It works better as a stack.
def get(self): """ Returns the next connection in this pool that is ready to be reused. Returns None if there aren't any. """ # Discard ready connections that are too old. self.clean() # Return the first connection that is ready, and remove it # from the queue. Connections that aren't ready are returned # to the end of the queue with an updated time, on the # assumption that somebody is actively reading the response. for _ in range(len(self.queue)): (conn, _) = self.queue.pop(0) if self._conn_ready(conn): return conn else: self.put(conn) return None def clean(self): """ Get rid of stale connections. """ # Note that we do not close the connection here -- somebody # may still be reading from it. while len(self.queue) > 0 and self._pair_stale(self.queue): self.queue.pop(0)
self.queue.pop(0) is not good. Doing it inside a loop is even worse. No one would expect
HostConnectionPool.get() is O(N^2). It should be O(1).
An attempt to fix
The fix is very straightforward. Replace list with collections.deque and replace all
self.queue.popleft(). WARNING: this code still has quadratic runtime, I’ll explain later.
def __init__(self): self.queue = deque() def get(self): self.clean() for _ in range(len(self.queue)): (conn, _) = self.queue.popleft() if self._conn_ready(conn): return conn else: self.put(conn) return None def clean(self): while len(self.queue) > 0 and self._pair_stale(self.queue): self.queue.popleft()
It is tempting to fix it and never look at it again. Don’t do that. Instead, run some benchmarks to confirm the bug is fixed. Note that
xrange() is used because this is Python 2.7.
import time from boto.connection import HostConnectionPool def f(): p = HostConnectionPool() count = 10000 for _ in xrange(count): p.put(None) for _ in xrange(count): assert p.size() == count - _ p.get() t = time.time() f() print(time.time() - t)
As shocking as it is, the difference is insignificant. I wrote another script to compare
deque.popleft(), and the latter outperformed by an order of magnitude. Something is wrong.
The final fix
I take a closer look at the code.
range() in line
for _ in range(len(self.queue)) is bad. In Python 2,
range() generates a list as opposed to an iterator in Python 3. In short,
range(N) is O(N) in Python 2. The fix is to use
six.moves.range() to maintain Python 2 and 3 compatibility.
Here’s how the final code looks like:
def get(self): self.clean() for _ in six.moves.range(len(self.queue)): (conn, _) = self.queue.popleft() if self._conn_ready(conn): return conn else: self.put(conn) return None
Using the same benchmarking code, here’s the runtime (in seconds).
Before fix: 0.27458691597 After fix: 0.0182430744171
- Never blindly trust libraries. Look at the source.
- Always confirm whether the fix works. Write a test or a benchmarking script.