10x Faster Python gevent Redis Connection Pool

Background

Observation

I noticed many "No connection available." errors logged by BlockingConnectionPool in redis-py, the Python Redis client. The redis-py source shows where the error comes from:

try:
    connection = self.pool.get(block=True, timeout=self.timeout)
except Empty:
    # Note that this is not caught by the redis client and will be
    # raised unless handled by application code. If you want never to
    raise ConnectionError("No connection available.")

The error is raised when no connection becomes available in the pool within the timeout. Something’s not fast enough.
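To make the failure mode concrete, here is a minimal sketch of the same pattern using only the standard library queue. It assumes a one-slot pool; get_connection and NoConnectionAvailable are hypothetical names for illustration, not part of redis-py.

```python
try:
    from queue import Empty, LifoQueue  # Python 3
except ImportError:
    from Queue import Empty, LifoQueue  # Python 2


class NoConnectionAvailable(Exception):
    """Hypothetical stand-in for redis-py's ConnectionError."""


def get_connection(pool, timeout):
    # Wait up to `timeout` seconds for a free slot, then give up.
    try:
        return pool.get(block=True, timeout=timeout)
    except Empty:
        raise NoConnectionAvailable("No connection available.")


pool = LifoQueue(maxsize=1)
pool.put("conn-1")

first = get_connection(pool, timeout=0.05)   # takes the only slot
try:
    get_connection(pool, timeout=0.05)       # nothing left, so this times out
    timed_out = False
except NoConnectionAvailable:
    timed_out = True

pool.put(first)                              # release the slot
second = get_connection(pool, timeout=0.05)  # succeeds again
```

Any caller that cannot obtain a slot before the timeout sees the error, so a slow hand-off between waiters shows up directly as failed requests.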

Environment

python --version:

Python 2.7.12

pip freeze:

gevent==1.3.3
greenlet==0.4.13
hiredis==0.2.0
redis==2.10.6

The Redis server runs in a Docker container using the official image. To start the container: sudo docker run --name some-redis -p 6379:6379 -d redis

redis-cli info server inside the container:

# Server
redis_version:4.0.9
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:d3ebfc7feabc1290
redis_mode:standalone
os:Linux 4.13.0-37-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:4.9.2
process_id:1
run_id:60e96e45fadb9a8dd8dd4e36553589033cb85a04
tcp_port:6379
uptime_in_seconds:3814
uptime_in_days:0
hz:10
lru_clock:1744601
executable:/data/redis-server
config_file:

Investigation

With gevent, the first thing that comes to mind is whether redis-py is gevent-compatible. socket is monkeypatched by gevent, so at least the network part should be fine.

How about the queue? It turns out that self.pool is a Queue.LifoQueue instead of a counterpart in gevent.queue. Is it gevent-compatible? Yes, because it is built on threading.Condition and threading.Lock, and the lock is monkeypatched by gevent.
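The claim about the standard library queue can be checked by inspecting one. The sketch below runs without gevent and relies on the mutex and not_empty attributes, which are CPython implementation details rather than a documented interface.

```python
try:
    from queue import LifoQueue  # Python 3
except ImportError:
    from Queue import LifoQueue  # Python 2

q = LifoQueue(maxsize=1)

# The queue guards its state with one mutex and wakes up waiters through
# condition variables built on that mutex.
mutex_has_lock_api = hasattr(q.mutex, "acquire") and hasattr(q.mutex, "release")
condition_type = type(q.not_empty).__name__

print(mutex_has_lock_api, condition_type)
```

Every blocked get() therefore waits on a threading primitive, which is the part gevent has to emulate after monkeypatching.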

But is the monkeypatched lock gevent-friendly enough? Probably not.

Proposed fix

Let’s see if setting queue_class to gevent queue in the constructor of BlockingConnectionPool will help. The proposed fix looks like this:

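from gevent.queue import LifoQueue
from redis import BlockingConnectionPool

# Use gevent's queue instead of the default Queue.LifoQueue
pool = BlockingConnectionPool(queue_class=LifoQueue)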

Benchmarking

Of course benchmarking is needed to confirm whether the proposed fix works.

Code

gevent monkeypatching is a must. Extra care is taken to exclude the cost of spawning greenlets. Also, all the greenlets must die before starting another pass of the benchmark to avoid any interference.

from gevent import monkey
monkey.patch_all()

from redis import BlockingConnectionPool, StrictRedis
import gevent
from gevent.queue import LifoQueue
import time
from gevent.pool import Group

# Timeout of getting a connection from pool
# Exceeding the timeout counts as a failure
WAIT_TIMEOUT = 1

# Number of concurrent producers
GREENLET_CNT = 5000

def f(p):
    global fail, ok, greenlet_cnt, done

    # Give way to the spawning greenlets
    gevent.sleep(0.01)
    r = StrictRedis(connection_pool=p)
    greenlet_cnt += 1
    while not done:
        try:
            r.ping()
            ok += 1
        except Exception:
            fail += 1

for max_connections in [1, 10, 100]:
    blocking_pool = BlockingConnectionPool(max_connections=max_connections, timeout=WAIT_TIMEOUT)
    gevent_blocking_pool = BlockingConnectionPool(max_connections=max_connections, timeout=WAIT_TIMEOUT,
                                                  queue_class=LifoQueue)
    for name, pool in [('blocking_pool', blocking_pool), ('gevent_blocking_pool', gevent_blocking_pool)]:
        fail = 0
        ok = 0
        greenlet_cnt = 0
        done = False

        jobs = Group()
        for _ in range(GREENLET_CNT):
            jobs.spawn(f, pool)

        # Wait until every greenlet has started
        # to exclude the cost of spawning greenlets
        while greenlet_cnt < GREENLET_CNT:
            gevent.sleep(0.1)

        # Reset counters when all the greenlets are spawned
        fail = 0
        ok = 0

        t = time.time()
        jobs.join(timeout=60)
        elapsed = time.time() - t

        print('pool: %s, connections made: %d' % (name, len(pool._connections)))
        print('total attempts: %d (fail: %d / ok: %d)' % (fail + ok, fail, ok))
        print('success rate: %s%%' % (ok * 100.0 / (ok + fail)))
        print('time elapsed: %ss' % (elapsed))
        print('success throughput: %s req/s' % (ok / elapsed))
        print('----\n\n')

        # Make sure all of the greenlets die
        done = True
        jobs.join()

Output

pool: blocking_pool, connections made: 1
total attempts: 326950 (fail: 295823 / ok: 31127)
success rate: 9.52041596574%
time elapsed: 60.0029239655s
success throughput: 518.758052823 req/s
----


pool: gevent_blocking_pool, connections made: 1
total attempts: 818995 (fail: 297846 / ok: 521149)
success rate: 63.6327450107%
time elapsed: 60.0001549721s
success throughput: 8685.79423241 req/s
----


pool: blocking_pool, connections made: 10
total attempts: 382019 (fail: 294557 / ok: 87462)
success rate: 22.8946727781%
time elapsed: 60.0312950611s
success throughput: 1456.94008285 req/s
----


pool: gevent_blocking_pool, connections made: 10
total attempts: 1447491 (fail: 296746 / ok: 1150745)
success rate: 79.4992853151%
time elapsed: 60.0006120205s
success throughput: 19178.8877021 req/s
----


pool: blocking_pool, connections made: 100
total attempts: 451850 (fail: 285101 / ok: 166749)
success rate: 36.9036184575%
time elapsed: 60.0642080307s
success throughput: 2776.17911677 req/s
----


pool: gevent_blocking_pool, connections made: 100
total attempts: 1401844 (fail: 289100 / ok: 1112744)
success rate: 79.3771632222%
time elapsed: 60.0090129375s
success throughput: 18542.9478928 req/s
----

Interpretation

According to the benchmark results, the fix raises success throughput by roughly 7x to 17x, which is about 10x overall. With the gevent queue there are more attempts to get a connection from the pool, and a higher share of them succeed. The advantage shrinks as the maximum number of connections grows (about 17x at 1 connection, 13x at 10 and 7x at 100), probably because the benchmark hits another bottleneck (gevent, Redis, the network or something else). Even so, it remains significantly more efficient than redis-py’s BlockingConnectionPool without the fix.
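The speedup per pool size can be worked out from the success throughput figures in the output above. The snippet below only redoes that arithmetic; the numbers are copied from the benchmark output, and the variable names are mine.

```python
# Success throughput (req/s) copied from the benchmark output,
# keyed by max_connections: (default queue, gevent queue).
throughput = {
    1: (518.758052823, 8685.79423241),
    10: (1456.94008285, 19178.8877021),
    100: (2776.17911677, 18542.9478928),
}

speedups = {}
for max_connections in sorted(throughput):
    before, after = throughput[max_connections]
    speedups[max_connections] = after / before
    print('max_connections=%d: %.1fx' % (max_connections, speedups[max_connections]))

# Prints:
# max_connections=1: 16.7x
# max_connections=10: 13.2x
# max_connections=100: 6.7x
```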

Conclusion

redis-py’s BlockingConnectionPool does not perform well with gevent out of the box. Performance degrades under high contention for the connection pool. Replacing the queue_class with gevent’s queue fixes the performance issue.
