Hardware Acceleration on Linux Mint, Cinnamon, and Firefox

Background I can’t believe how much time I spent on debugging this. This story begins with a new AMD graphics card. After installing the AMD RX6000 series graphics card and the amdgpu driver, and setting all the required Firefox configs, I open a youtube video at 4k 60fps on Firefox and expect to enjoy av1 hardware accelerated decoding. Disappointingly, all I see is dropped frames. To understand what’s wrong, I head over to about:support and observe some FEATURE_FAILURE_SOFTWARE_GL and FEATURE_FAILURE_VIDEO_DECODING_TEST_FAILED.

Python mysqlclient Doesn't Work Well with gevent

mysqlclient (MySQLdb fork) is a Python MySQL driver which uses libmysqlclient (or libmariadbclient). This means speed and stability. Does it work well with gevent? Not really. In mysqlclient <= v1.3.13, there’s a sample called waiter_gevent.py which looks like this: from __future__ import print_function """Demo using Gevent with mysqlclient.""" import gevent.hub import MySQLdb def gevent_waiter(fd, hub=gevent.hub.get_hub()): hub.wait(hub.loop.io(fd, 1)) def f(n): conn = MySQLdb.connect(user='root', waiter=gevent_waiter) cur = conn.cursor() cur.execute("SELECT SLEEP(%s)", (n,)) cur.execute("SELECT 1+%s", (n,)) print(cur.

Scaling ProxySQL with --idle-threads option and epoll

Update 21 Mar 2019 There’s a new post Fixing ProxySQL Idle Threads Epoll Hang Heisenbug about fixing a bug of ProxySQL when --idle-threads is enabled. Problem ProxySQL does not scale with high numbers of connections by default. When there are tens of thousands of client connection to ProxySQL, the CPU usage of ProxySQL may go up to 300%+ because by default it uses 4 threads. Even with 3000 mostly idle connections, it hits ~70% CPU usage.

Use Let's Encrypt with Certbot and nginx inside Docker

Update 8 Jun 2019: Change crontab certbot renew command to use --deploy-hook instead of --renew-hook. Using certbot to install and auto-renew Let’s Encrypt SSL certs with nginx installed in system is almost fool-proof. How about nginx inside docker? Not so easy. Assume we use the official nginx docker imageand start the docker container with name my_nginx. docker run -d -p 80:80 -p 443:443 -v /var/www:/var/www -v /etc/letsencrypt:/etc/letsencrypt --name my_nginx nginx Assuming the domain name is www.

Load Balancing Outgoing Traffic with HAProxy

To work around some restrictions, I need to load balance outgoing traffic such that they appear to be coming from different IPs. Here is how you do it with HAProxy: backend servers server srv1 target.example.com:8080 check source usesrc server srv2 target.example.com:8080 check source usesrc server srv3 target.example.com:8080 check source usesrc Use mode tcp and this works for SSL connections too. The IPs like in the example are private IPs, but they will be translated to different public IPs thanks to Azure’s SNAT.

Notes on Importing mysqldump

Background: I have multiple 200GB+ MySQL dumps to import. It takes a lot of time. pv shows you the progress so you have the idea how long will the import take. Pipe the dump.sql.gz file to gunzip instead of unzipping it then import. Saves much disk read and write, not to mention the time and effort. e.g. pv dump.sql.gz | gunzip -c | mysql grep is slow for a large file.

Random Notes on Scaling

parallel-ssh has a -I option to feed in a shell script file such that quotes escaping is not needed. On Ubuntu 14.04, the HAProxy may have 2 pids after sudo service haproxy restart. I suspect one of them is a zombie since the new open file limit does not apply. The safer way is to killall haproxy then sudo service haproxy start. HAProxy has a maxconn globally and maxconn for each frontend.

Debugging Memory Usage in Python 2.7 with tracemalloc

Update 16 Mar 2019 Updated instructions. Update 16 Oct 2018 There was a bug in pytracemalloc that prevents the PYTHONTRACEMALLOC environment variable from working. I have submitted a pull request and it is now merged. The PR fixed the bug in Python patches, added a testing script for the patches and improved the documentation. Background My application is killed by the OOM killer. How do I find out why the application is taking up so much memory?

Python, SQLAlchemy, Pytest, PyCharm Debugger & NoSuchColumnError

There is this error when I use PyCharm debugger during pytest. NoSuchColumnError: "Could not locate column in row for column 'mytable.id'" This is a SQLAlchemy error. No useful help on Google. No error if the breakpoints are muted. Therefore, it is possibly related to lazy loading of relationships when displaying variables. This can be resolved by turning off the “load values asynchronously” option in PyCharm debugger. As far as I can remember, this is a new feature in PyCharm 2017.

Python & MySQL Interrupted System Call & pyinstrument

Background My Python application cannot connect to MySQL. The error message looks like this: Can’t connect to MySQL server on ‘’ (4) Error 4 means Interrupted system call. I am using mysqlclient, the C wrapper MySQL connector. The error happens on both MySQL 5.6 and 5.7. It can be reproduced consistently. It seems that PyMySQL doesn’t have this problem. Also I am using gevent but it is not much related in this case.

Azure TCP Idle Timeout, TCP keepalive, and Python

Observation I have a HTTP request which will return after a specified time t. The request is made using Python requests from an Azure VM. When time t is smaller than 4 minutes, it works fine. Otherwise, the requests library raises ReadTimeout. Explanation TCP connections from Azure has a “not-quite-well-documented” limit which will timeout after 4 minutes of idle activity. The related documentation can be found here under Azure Load Balancer, although it apparently affects Azure VMs with public IP (ILPIP / PIP) without load balancing.

Tutorial: Installing Linux alongside an existing Windows Installation (Dual Boot)

Background A friend of mine requests a tutorial of installing Linux alongside an existing Windows installation so here you go. It is not hard but can be tricky at times. No one wants to spend a day troubleshooting the dual boot setup. The Big Picture Assume that you have a disk (HDD or SSD, doesn’t matter) 100% allocated to a partition (NTFS or whatever) with Windows installed. We need to prepare the installation USB drive, shrink the partition, then boot from the USB drive and install your favorite Linux distro, in this case, Linux Mint or Ubuntu.

Fixing html2text Quadratic Runtime

Background I come across this command line utility available in linux called html2text which was first written in 1999 and changed hands later. Obviously an old project, but a solid one. At least it handles a<div><br></div>b properly by outputing a\n\nb, instead of a\n\n\nb like most of the other converters out there. (I’m looking at you, Python html2text.) I download the source code of v1.3.2a from here and play around it.

gevent built-in function getaddrinfo failed with error

Problem Under heavy load using gevent, I see this: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/gevent/threadpool.py", line 207, in _worker value = func(*args, **kwargs) error: [Errno 11] Resource temporarily unavailable (<ThreadPool at 0x7fe468930dd0 0/5/10>, <built-in function getaddrinfo>) failed with error Solution There’s no good solution out there. Actually, it is easier to solve than expected. You only have to change the gevent’s DNS resolver. In the doc, they didn’t clearly state the difference between the resolvers.

gevent.pywsgi's File Descriptor Leak and HTTP Keep-Alive

Background My server uses gevent.pywsgi. It works fine. However, every other few days the server will stop responding to requests. It says “Too many open files” in the logs. Investigation A simple lsof is showing that there are many socket connections opened by pywsgi even when those sessions are completed. This FD (File Descriptor) leak probably causes the process to reach the ulimit -n per-process number of open files limit.

Debugging rsync and ssh

Background I fall in love with rsync lately. It is particularly useful when I sync my hadoop stuff (scripts and input, which add up to a few GBs) between local and my hadoop cluster. After running the sync script for a few times, I cannot ssh to the machine anymore. This post is about how I debug it and the lessons learned. This is how the problematic script sync.sh looks like: (the IP is masked for obvious reasons) (warning: this script is faulty, do not use)