Fixing boto (AWS Python interface) Quadratic Runtime Connection Pool

Background

I come across some Python code using boto, the Python interface to AWS. There’s boto3, which is a newer version of boto. If you are starting a new project, you should be looking at boto3 instead of boto.

I am told that boto pools connections. This is a good thing in terms of performance. Never waste HTTPS connections because it is expensive to setup. But does it limit the number of outgoing connections like SQLAlchemy’s QueuePool, or is it configurable in terms of number of pooled connections? I’m not sure. It is always good to read the code.

Diving into the code

Here is the implementation of the connection pool. There is a HostConnectionPool for each (host, port, secure) tuple, which makes sense.

Let’s look into the code.

class HostConnectionPool(object):
    def __init__(self):
        self.queue = []

Wait. self.queue is a Python list. Python list is not a good queue implementation because list.append(obj) is O(1) but list.pop(0) is O(N). It works better as a stack.

    def get(self):
        """
        Returns the next connection in this pool that is ready to be
        reused.  Returns None if there aren't any.
        """
        # Discard ready connections that are too old.
        self.clean()

        # Return the first connection that is ready, and remove it
        # from the queue.  Connections that aren't ready are returned
        # to the end of the queue with an updated time, on the
        # assumption that somebody is actively reading the response.
        for _ in range(len(self.queue)):
            (conn, _) = self.queue.pop(0)
            if self._conn_ready(conn):
                return conn
            else:
                self.put(conn)
        return None
    
    def clean(self):
        """
        Get rid of stale connections.
        """
        # Note that we do not close the connection here -- somebody
        # may still be reading from it.
        while len(self.queue) > 0 and self._pair_stale(self.queue[0]):
            self.queue.pop(0)

self.queue.pop(0) is not good. Doing it inside a loop is even worse. No one would expect HostConnectionPool.get() is O(N^2). It should be O(1).

An attempt to fix

The fix is very straightforward. Replace list with collections.deque and replace all self.queue.pop(0) with self.queue.popleft(). WARNING: this code still has quadratic runtime, I’ll explain later.

    def __init__(self):
        self.queue = deque()

    def get(self):
        self.clean()

        for _ in range(len(self.queue)):
            (conn, _) = self.queue.popleft()
            if self._conn_ready(conn):
                return conn
            else:
                self.put(conn)
        return None

    def clean(self):
        while len(self.queue) > 0 and self._pair_stale(self.queue[0]):
            self.queue.popleft()

Double check

It is tempting to fix it and never look at it again. Don’t do that. Instead, run some benchmarks to confirm the bug is fixed. Note that xrange() is used because this is Python 2.7.

import time
from boto.connection import HostConnectionPool


def f():
    p = HostConnectionPool()
    count = 10000
    for _ in xrange(count):
        p.put(None)
    for _ in xrange(count):
        assert p.size() == count - _
        p.get()


t = time.time()
f()
print(time.time() - t)

As shocking as it is, the difference is insignificant. I wrote another script to compare list.pop(0) and deque.popleft(), and the latter outperformed by an order of magnitude. Something is wrong.

The final fix

I take a closer look at the code. range() in line for _ in range(len(self.queue)) is bad. In Python 2, range() generates a list as opposed to an iterator in Python 3. In short, range(N) is O(N) in Python 2. The fix is to use six.moves.range() to maintain Python 2 and 3 compatibility.

Here’s how the final code looks like:

    def get(self):
        self.clean()

        for _ in six.moves.range(len(self.queue)):
            (conn, _) = self.queue.popleft()
            if self._conn_ready(conn):
                return conn
            else:
                self.put(conn)
        return None

Benchmark again

Using the same benchmarking code, here’s the runtime (in seconds).

Before fix: 0.27458691597
After fix: 0.0182430744171

Takeaway

  1. Never blindly trust libraries. Look at the source.
  2. Always confirm whether the fix works. Write a test or a benchmarking script.

Reference

  1. Complete, patched version of connection.py