Use Let's Encrypt with Certbot and nginx inside Docker

Using certbot to install and auto-renew Let’s Encrypt SSL certs with nginx installed on the system is almost fool-proof. How about nginx inside Docker? Not so easy. Assume we use the official nginx Docker image and start the container with the name my_nginx: docker run -d -p 80:80 -p 443:443 -v /var/www:/var/www -v /etc/letsencrypt:/etc/letsencrypt --name my_nginx nginx Assuming the domain name is nginx config: http { server { listen 443 ssl; server_name www.
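For reference, a minimal HTTPS server block along these lines might look like the following sketch. The domain example.com and the certificate paths are placeholders (the real domain is cut off in the excerpt above); certbot writes the certificates on the host, and the container sees them through the -v /etc/letsencrypt:/etc/letsencrypt bind mount from the docker run command:

```nginx
server {
    listen 443 ssl;
    server_name www.example.com;
    # paths as certbot lays them out under /etc/letsencrypt/live/<domain>/
    ssl_certificate     /etc/letsencrypt/live/www.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/www.example.com/privkey.pem;
    root /var/www;
}
```

After a renewal, the container's nginx still has to reload its config, e.g. with docker exec my_nginx nginx -s reload.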

Load Balancing Outgoing Traffic with HAProxy

To work around some restrictions, I need to load balance outgoing traffic so that it appears to come from different IPs. Here is how you do it with HAProxy: backend servers server srv1 check source usesrc server srv2 check source usesrc server srv3 check source usesrc Use mode tcp and this works for SSL connections too. The IPs like 10.
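A fuller sketch of such a backend, with placeholder addresses filled in purely for illustration (the post masks the real ones):

```
backend servers
    mode tcp
    # usesrc makes each server's traffic leave from a different local IP,
    # so the remote end sees three distinct source addresses
    server srv1 192.0.2.11:443 check source 0.0.0.0 usesrc 10.0.0.1
    server srv2 192.0.2.12:443 check source 0.0.0.0 usesrc 10.0.0.2
    server srv3 192.0.2.13:443 check source 0.0.0.0 usesrc 10.0.0.3
```

Note that usesrc requires HAProxy to run with enough privileges (and kernel support for transparent proxying) to bind to non-local source addresses.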

Notes on Importing mysqldump

Background: I have multiple 200GB+ MySQL dumps to import. It takes a lot of time. pv shows you the progress so you have an idea of how long the import will take. Pipe the dump.sql.gz file to gunzip instead of unzipping it first and then importing. This saves a lot of disk reads and writes, not to mention time and effort. e.g. pv dump.sql.gz | gunzip -c | mysql grep is slow for a large file.
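The streaming idea can be demonstrated with a toy dump (the file and database names here are placeholders):

```shell
# Real use: pv dump.sql.gz | gunzip -c | mysql mydb
# Toy version of the same pipeline: the uncompressed SQL never touches
# disk, gunzip -c streams it straight to the consumer.
printf 'SELECT 1;\nSELECT 2;\n' | gzip > dump.sql.gz
gunzip -c dump.sql.gz | wc -l
```

With a real dump, swapping `wc -l` for `mysql mydb` gives the one-liner above, and putting `pv` at the front adds the progress bar and ETA.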

Random Notes on Scaling

parallel-ssh has a -I option to feed in a shell script file so that quote escaping is not needed. On Ubuntu 14.04, HAProxy may have two PIDs after sudo service haproxy restart. I suspect one of them is a zombie, since the new open file limit does not apply. The safer way is to killall haproxy and then sudo service haproxy start. HAProxy has a global maxconn and a maxconn for each frontend.
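On the maxconn point, the two settings live in different sections, and the effective cap for a frontend is the smaller of the two. A sketch (the numbers are illustrative):

```
global
    maxconn 100000        # hard cap for the whole process

frontend web
    bind :80
    maxconn 50000         # cap for this frontend only
    default_backend servers
```

If the frontend maxconn is omitted, it falls back to a much lower built-in default (2000 in HAProxy versions of that era), which is a common surprise under load.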

Debugging Memory Usage in Python 2.7 with tracemalloc

Update 16 Oct 2018 There was a bug in pytracemalloc that prevented the PYTHONTRACEMALLOC environment variable from working. I have submitted a pull request and it is now merged. The PR fixed the bug in the Python patches, added a testing script for the patches, and improved the documentation. Background My application is being killed by the OOM killer. How do I find out why the application is taking up so much memory?
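The post targets Python 2.7 via pytracemalloc, which backports the same tracemalloc module that ships in the Python 3 standard library, so the basic workflow can be sketched like this (the allocation here is a stand-in for whatever the OOM-killed application does):

```python
import tracemalloc

tracemalloc.start(25)  # record up to 25 stack frames per allocation

leaky = [bytearray(1024) for _ in range(1000)]  # allocate ~1 MB to inspect

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    # each stat names a source line and how much memory it allocated
    print(stat)
```

With pytracemalloc, the same tracing can be switched on without code changes by setting the PYTHONTRACEMALLOC environment variable (the variable the bug above broke) before starting the process.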

Python, SQLAlchemy, Pytest, PyCharm Debugger & NoSuchColumnError

There is this error when I use the PyCharm debugger during pytest. NoSuchColumnError: "Could not locate column in row for column ''" This is a SQLAlchemy error. There is no useful help on Google. There is no error if the breakpoints are muted, so it is possibly related to lazy loading of relationships when displaying variables. This can be resolved by turning off the “load values asynchronously” option in the PyCharm debugger. As far as I can remember, this is a new feature in PyCharm 2017.

Python & MySQL Interrupted System Call & pyinstrument

Background My Python application cannot connect to MySQL. The error message looks like this: Can’t connect to MySQL server on ‘’ (4) Error 4 means Interrupted system call. I am using mysqlclient, the MySQL connector that wraps the C client library. The error happens on both MySQL 5.6 and 5.7 and can be reproduced consistently. It seems that PyMySQL doesn’t have this problem. I am also using gevent, but that is not closely related in this case.
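The post's actual fix is not visible in this excerpt, but the usual shape of handling EINTR is a retry loop around the interrupted call. A generic sketch in modern Python (connect_with_retry and the flaky connector are illustrative names, not part of mysqlclient):

```python
import errno

def connect_with_retry(connect, attempts=3):
    # errno 4 (EINTR) means a signal interrupted the blocking call;
    # simply retrying the call is usually the right response
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as exc:
            if exc.errno != errno.EINTR or attempt == attempts - 1:
                raise

# simulate a connector that is interrupted once, then succeeds
state = {"calls": 0}
def flaky_connect():
    state["calls"] += 1
    if state["calls"] == 1:
        raise OSError(errno.EINTR, "Interrupted system call")
    return "connection"

print(connect_with_retry(flaky_connect))  # prints "connection"
```

In Python 2.7 the interrupted call surfaces as IOError/socket.error rather than OSError, but the retry pattern is the same.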

Azure TCP Idle Timeout, TCP keepalive, and Python

Observation I have an HTTP request which returns after a specified time t. The request is made using Python requests from an Azure VM. When t is smaller than 4 minutes, it works fine. Otherwise, the requests library raises ReadTimeout. Explanation TCP connections from Azure have a “not-quite-well-documented” limit: they time out after 4 minutes of inactivity. The related documentation can be found here under Azure Load Balancer, although it apparently also affects Azure VMs with public IPs (ILPIP / PIP) without load balancing.
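One common workaround is TCP keepalive with probing that starts well under the 4-minute limit, so the OS keeps the connection looking active from Azure's side. A sketch (the threshold values are illustrative, and the TCP_KEEP* constants are Linux-specific):

```python
import socket

def enable_keepalive(sock, idle=120, interval=30, count=4):
    # start probing after 120 s of idle time, under Azure's 4-minute cutoff
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux only
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock

sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```

Wiring this into requests means customizing the socket options of the underlying urllib3 connections, since requests does not expose keepalive settings directly.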

Tutorial: Installing Linux alongside an existing Windows Installation (Dual Boot)

Background A friend of mine requested a tutorial on installing Linux alongside an existing Windows installation, so here you go. It is not hard but can be tricky at times. No one wants to spend a day troubleshooting a dual boot setup. The Big Picture Assume that you have a disk (HDD or SSD, doesn’t matter) 100% allocated to a single partition (NTFS or whatever) with Windows installed. We need to prepare the installation USB drive, shrink the partition, then boot from the USB drive and install your favorite Linux distro, in this case Linux Mint or Ubuntu.

Fixing html2text Quadratic Runtime

Background I came across a command line utility available on Linux called html2text, which was first written in 1999 and changed hands later. Obviously an old project, but a solid one. At least it handles a<div><br></div>b properly by outputting a\n\nb, instead of a\n\n\nb like most of the other converters out there. (I’m looking at you, Python html2text.) I downloaded the source code of v1.3.2a from here and played around with it.

gevent built-in function getaddrinfo failed with error

Problem Under heavy load with gevent, I see this: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/gevent/", line 207, in _worker value = func(*args, **kwargs) error: [Errno 11] Resource temporarily unavailable (<ThreadPool at 0x7fe468930dd0 0/5/10>, <built-in function getaddrinfo>) failed with error Solution At first glance there is no good solution out there. Actually, it is easier to solve than expected: you only have to change gevent’s DNS resolver. The docs don’t clearly state the difference between the resolvers.
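The resolver can be selected through the GEVENT_RESOLVER environment variable, as long as it is set before gevent is imported. A sketch (whether "ares" is the resolver the post ultimately recommends is not visible in this excerpt):

```python
import os

# Must run before "import gevent" executes anywhere in the process.
# "ares" resolves DNS via c-ares inside the event loop, instead of the
# default resolver that hands each lookup to a small native thread pool.
os.environ["GEVENT_RESOLVER"] = "ares"

print(os.environ["GEVENT_RESOLVER"])  # ares
```

The same can be done from the shell without touching the code: GEVENT_RESOLVER=ares python app.py.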

gevent.pywsgi's File Descriptor Leak and HTTP Keep-Alive

Background My server uses gevent.pywsgi. It works fine. However, every few days the server stops responding to requests, saying “Too many open files” in the logs. Investigation A simple lsof shows that there are many socket connections opened by pywsgi even after those sessions have completed. This FD (File Descriptor) leak probably causes the process to reach the ulimit -n per-process open file limit.
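A quick way to watch for such a leak on Linux is to count a process's open descriptors against its limit. Here $$ (the current shell) stands in for the server's PID:

```shell
# descriptors currently open by the process (replace $$ with the server PID)
ls /proc/$$/fd | wc -l
# the per-process ceiling the leak will eventually hit
ulimit -n
```

The post uses lsof for the same purpose, which additionally shows which sockets are stuck open and in what state.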

Debugging rsync and ssh

Background I have fallen in love with rsync lately. It is particularly useful when I sync my hadoop stuff (scripts and input, which add up to a few GBs) between my local machine and my hadoop cluster. After running the sync script a few times, I could not ssh to the machine anymore. This post is about how I debugged it and the lessons learned. This is what the problematic script looks like: (the IP is masked for obvious reasons) (warning: this script is faulty, do not use)