Azure TCP Idle Timeout, TCP keepalive, and Python

Observation

I have an HTTP request that returns after a specified time t. The request is made using Python requests from an Azure VM. When t is shorter than 4 minutes, it works fine. Otherwise, the requests library raises ReadTimeout.
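
A minimal repro sketch; the /delay/<t> endpoint is hypothetical, standing in for any server that responds only after t seconds:

import requests

# Works when the server responds within 4 minutes; from an Azure VM,
# anything longer fails with requests.exceptions.ReadTimeout.
requests.get("http://example.com/delay/300", timeout=600)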

Explanation

TCP connections from Azure have a “not-quite-well-documented” limit: they are timed out after 4 minutes of inactivity. The related documentation can be found here under Azure Load Balancer, although it apparently also affects Azure VMs with an instance-level public IP (ILPIP / PIP) without any load balancing.

According to this paragraph, even though the public-IP-to-VM mapping is 1:1, there is still a 1:1 NAT in between, and the NAT entry of an idle TCP connection will expire.

Since the HTTP request only returns after time t, the TCP connection sits idle while the client waits for the response; the NAT entry times out and is removed from the NAT table. The client that initiated the request does not receive an RST packet when this happens. Instead, nothing can ever be read from the socket, so the request eventually raises a read timeout.
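
The same failure can be observed at the socket level; a sketch, again against a hypothetical slow endpoint:

import socket

s = socket.create_connection(("example.com", 80), timeout=600)
s.sendall(b"GET /delay/300 HTTP/1.1\r\nHost: example.com\r\n\r\n")
# The NAT entry silently expires after ~4 minutes of idling; no RST is
# delivered, so recv() blocks until the socket timeout raises socket.timeout.
data = s.recv(4096)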

Workaround

To keep the NAT entry from expiring, the TCP connection must not be idle. The recommended workaround is to enable TCP keepalive: the keepalive probe packets (empty packets with the ACK flag set) are enough to keep the entry from expiring.

TCP keepalive has to kick in early enough for this workaround to be effective. By default, it kicks in after net.ipv4.tcp_keepalive_time seconds, which defaults to 7200. Obviously, that is way too high for this 4-minute-timeout scenario. The related Linux kernel variables to tune are net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes.

Here’s an example of reasonable values:

net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 8
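
If tuning the system-wide settings is not an option, the same three parameters can also be set per socket; a minimal sketch, assuming Linux (the TCP_KEEP* constants are Linux-specific):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # enable keepalive
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 120)  # tcp_keepalive_time
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # tcp_keepalive_intvl
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 8)     # tcp_keepalive_probes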

Workaround part 2

More bad news: TCP keepalive is not enabled by default. It requires application-level support, i.e. it needs to be enabled through socket options.

In Python, socket options can be set in urllib3, and from requests through a custom transport adapter. An example is shown in this answer on SO.

import socket

import requests

class HTTPAdapterWithSocketOptions(requests.adapters.HTTPAdapter):
    def __init__(self, *args, **kwargs):
        # Remember the (level, optname, value) tuples; urllib3 applies them
        # to every new socket in the connection pool.
        self.socket_options = kwargs.pop("socket_options", None)
        super(HTTPAdapterWithSocketOptions, self).__init__(*args, **kwargs)

    def init_poolmanager(self, *args, **kwargs):
        if self.socket_options is not None:
            kwargs["socket_options"] = self.socket_options
        super(HTTPAdapterWithSocketOptions, self).init_poolmanager(*args, **kwargs)

adapter = HTTPAdapterWithSocketOptions(socket_options=[(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)])
s = requests.Session()
s.mount("http://", adapter)
s.mount("https://", adapter)

This is quite nice, but how about this?

from urllib3.connection import HTTPConnection

class HTTPAdapterWithTCPKeepalive(HTTPAdapterWithSocketOptions):
    def __init__(self, *args, **kwargs):
        # The options must go through kwargs: the parent __init__ assigns
        # self.socket_options itself, so setting the attribute beforehand
        # would be silently overwritten.
        kwargs.setdefault("socket_options",
                          HTTPConnection.default_socket_options + [
                              (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
                          ])
        super(HTTPAdapterWithTCPKeepalive, self).__init__(*args, **kwargs)

As the urllib3 doc says:

socket_options: Set specific options on the underlying socket. If not specified, then defaults are loaded from HTTPConnection.default_socket_options which includes disabling Nagle’s algorithm (sets TCP_NODELAY to 1) unless the connection is behind a proxy.

Of course we don’t want to drop any of the default options, which is why I add them back.
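
Putting it together, a usage sketch (the URL is a placeholder):

adapter = HTTPAdapterWithTCPKeepalive()
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Requests made through this session now send keepalive probes, which
# keeps the NAT entry alive while waiting for a slow response.
response = session.get("http://example.com/delay/300", timeout=600)

If the sysctl defaults cannot be changed, the Linux-specific TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT tuples from earlier can be appended to the socket_options list as well.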

Conclusion

Why there is a 1:1 NAT in front of a VM that already has a 1:1 relationship with its public IP is beyond my understanding. Enforcing an idle TCP connection timeout at the NAT breaks things in many different ways, and the workaround is not always feasible since it requires application-level support. A lot of people may have fallen victim to this “feature” without noticing, as there is only a subtle read timeout error rather than an RST packet.

References

  1. Official Azure Load Balancer documentation
  2. Azure SNAT blog post (2015)
  3. A blog post by another victim