5/Feb 2021
3 min. read
Background
I have been experiencing headaches and fatigue since I moved into my new apartment. I suspect that they are caused by the preinstalled LED light bulbs.
Flickering
Although high-frequency flicker is imperceptible to the human eye, prolonged exposure to a flickering light source may cause headaches and fatigue.
Physics
LEDs can flicker for several reasons, e.g. PWM dimming and poor AC/DC converters (e.g. LED drivers).
PWM dimming is a way to adjust brightness on LED-backlit monitors and dimmable LED bulbs.
9/Feb 2020
1 min. read
Benefits of a serverless cron job
What’s great about going serverless with a cron job is that there is no need to set up the environment or configure crontab, logging, error detection, etc. You also don’t have to worry about whether the VM is down. It is also easier on your wallet.
Here I am using AWS Lambda for my serverless cron job.
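As a rough illustration, a cron-style Lambda function is just a handler that gets invoked on a schedule. The sketch below assumes the Python runtime and a scheduled trigger; `run_job` is a hypothetical placeholder for the actual task, and the sample event is hand-written for local testing.

```python
import json


def run_job(triggered_at):
    """Hypothetical placeholder for the real periodic task."""
    return "job ran for schedule time %s" % triggered_at


def lambda_handler(event, context):
    # Scheduled events carry the trigger time in the "time" field;
    # this is None if the function is invoked some other way.
    triggered_at = event.get("time")
    result = run_job(triggered_at)
    # Output printed here ends up in the function's CloudWatch logs.
    print(json.dumps({"result": result}))
    return {"ok": True, "result": result}


if __name__ == "__main__":
    # Local smoke test with an event shaped like a scheduled event.
    sample_event = {
        "source": "aws.events",
        "detail-type": "Scheduled Event",
        "time": "2020-02-09T00:00:00Z",
    }
    print(lambda_handler(sample_event, None))
```

Keeping the real work in a separate function makes it possible to test the job locally without deploying anything.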
Configuration and tips
An AWS Lambda function can be triggered periodically using CloudWatch events.
28/Dec 2019
2 min. read
1. OpenVPN & MFA
There are few tutorials about setting up OpenVPN with multi-factor authentication (MFA). Google Authenticator is used for MFA here.
To set up the OpenVPN server, I use the OpenVPN road warrior installer. For the MFA config, the script from egonbraun is forked and modified to work with the road warrior setup: https://gist.github.com/carsonip/b02eecf9f7d036555a53fea6f516ced8
Also, by default the OpenVPN server renegotiates every hour. The renegotiation fails MFA authentication, which effectively disconnects the VPN connection periodically.
23/Nov 2019
2 min. read
Background
Profiling shows that pymongo’s bson.json_util.loads is consuming an unusual amount of CPU time.
Benchmark
To confirm that the function is slow, let’s do some benchmarking.

    import pyperf


    def data():
        import json
        # Benchmark input: a JSON array of 20 short strings.
        return json.dumps(['asdfasdf%s' % i for i in range(20)])


    s = data()

    runner = pyperf.Runner()
    # Parse the same JSON string with three different libraries.
    runner.timeit(name="json", stmt="json.loads(s)", setup="from __main__ import s; import json;")
    runner.timeit(name="simplejson", stmt="simplejson.loads(s)", setup="from __main__ import s; import simplejson;")
    runner.timeit(name="bson json_util", stmt="json_util.loads(s)", setup="from __main__ import s; from bson import json_util;")

Result:
1/Jun 2019
3 min. read
In the last binary search article, we discussed how to write a correct binary search. But sometimes we are not asked to find any position of the target. If we need the first or the last occurrence, can we do it?
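For the first occurrence, one possible approach (a sketch, not necessarily the version this article arrives at) is to keep narrowing to the left even when the middle element matches. The function name `first_occurrence` is made up for this illustration.

```python
def first_occurrence(A, target):
    """Return the smallest index i with A[i] == target, or -1 if absent.

    A must be sorted in ascending order.
    """
    lo, hi = 0, len(A)
    # Invariant: every item left of lo is < target,
    # every item at or right of hi is >= target.
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(A) and A[lo] == target else -1


print(first_occurrence([1, 2, 2, 2, 3], 2))  # 1
print(first_occurrence([1, 2, 2, 2, 3], 4))  # -1
```

The loop never returns early on a match, so it always lands on the leftmost candidate position and only then checks whether the target is actually there.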
The classic binary search
In Python, the basic binary search looks like this:

    def binary_search(A, target):
        N = len(A)
        if N == 0:
            # Nothing to search; avoids indexing an empty list below.
            return -1
        lo = 0
        hi = N-1
        while (lo < hi):
            mid = (lo+hi)//2
            if A[mid] == target:
                return mid
            elif A[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo if A[lo] == target else -1

The invariant of the algorithm is that A[lo] <= target <= A[hi].
16/Mar 2019
2 min. read
mysqlclient (MySQLdb fork) is a Python MySQL driver which uses libmysqlclient (or libmariadbclient). This means speed and stability. Does it work well with gevent? Not really.
In mysqlclient <= v1.3.13, there’s a sample called waiter_gevent.py which looks like this:
    from __future__ import print_function
    """Demo using Gevent with mysqlclient."""

    import gevent.hub
    import MySQLdb


    def gevent_waiter(fd, hub=gevent.hub.get_hub()):
        hub.wait(hub.loop.io(fd, 1))


    def f(n):
        conn = MySQLdb.connect(user='root', waiter=gevent_waiter)
        cur = conn.cursor()
        cur.execute("SELECT SLEEP(%s)", (n,))
        cur.execute("SELECT 1+%s", (n,))
        print(cur.
9/Mar 2019
3 min. read
Background
I came across some Python code using boto, the Python interface to AWS. There’s boto3, which is a newer version of boto. If you are starting a new project, you should be looking at boto3 instead of boto.
I am told that boto pools connections. This is a good thing for performance: HTTPS connections are expensive to set up and should not be wasted. But does it limit the number of outgoing connections like SQLAlchemy’s QueuePool, or is the number of pooled connections configurable?
29/Oct 2018
1 min. read
Another exercise from Programming Pearls: Heap. I finished the code a week ago but didn’t have the time for the blog post until now.
Here’s the code:
18/Oct 2018
1 min. read
I am reading Programming Pearls and figured it would be fun to write Quicksort in Python.
Here are my code and benchmarks.
When I try to compare my code with others, I am surprised to see many faulty implementations of Quicksort with quadratic runtime on certain input.
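A list of equal items is a typical input of this kind. As a minimal sketch (not the benchmarked code; `quicksort` and its parameters are names chosen for this illustration), a two-pointer partition in which both pointers stop on items equal to the pivot splits such a list near the middle instead of peeling off one item at a time:

```python
import random


def quicksort(a, lo=0, hi=None):
    """Sort list a in place with two-pointer partitioning and return it."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            # Both scans stop on items equal to the pivot, so runs of
            # equal items are shared between the two halves.
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller half and loop on the larger one
        # to keep the recursion depth logarithmic.
        if j - lo < hi - i:
            quicksort(a, lo, j)
            lo = i
        else:
            quicksort(a, i, hi)
            hi = j
    return a


print(quicksort([3, 1, 2, 3, 1]))  # [1, 1, 2, 3, 3]
```

The random pivot is an extra assumption of this sketch; it guards against already-sorted input, which is a separate quadratic case from equal items.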
The key to writing a correct and fast Quicksort:
Watch out for lists of equal items. qsort2 uses 2 pointers (i, j) to make sure the problem is divided into subproblems of similar size.
8/Oct 2018
4 min. read
Update 16 Mar 2019
Updated instructions.
Update 16 Oct 2018
There was a bug in pytracemalloc that prevented the PYTHONTRACEMALLOC environment variable from working. I have submitted a pull request and it is now merged. The PR fixed the bug in the Python patches, added a testing script for the patches and improved the documentation.
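For reference, the kind of measurement tracemalloc enables looks roughly like the sketch below. It assumes Python 3, where tracemalloc is part of the standard library (pytracemalloc backports the module to older interpreters); the list allocation is only a stand-in for real application code.

```python
import tracemalloc

# Start tracing allocations. Setting PYTHONTRACEMALLOC=1 in the
# environment enables tracing from interpreter startup instead.
tracemalloc.start()

# Stand-in workload: allocate something noticeable.
payload = [str(i) * 10 for i in range(100000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

# Show the source lines responsible for the most allocated memory.
for stat in top_stats[:5]:
    print(stat)

tracemalloc.stop()
```

Grouping by "lineno" points at the exact allocating lines, which is usually the quickest way to see where memory goes.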
Background
My application is killed by the OOM killer. How do I find out why the application is taking up so much memory?
9/Aug 2018
1 min. read
I get this error when I use the PyCharm debugger with pytest:

    NoSuchColumnError: "Could not locate column in row for column 'mytable.id'"

This is a SQLAlchemy error. I found no useful help on Google. The error does not occur if the breakpoints are muted, so it is possibly related to lazy loading of relationships when the debugger displays variables.
This can be resolved by turning off the “load values asynchronously” option in the PyCharm debugger. As far as I can remember, this is a new feature in PyCharm 2017.
8/Jul 2018
3 min. read
Background
My Python application cannot connect to MySQL. The error message looks like this:

    Can't connect to MySQL server on '127.0.0.1' (4)

Error 4 means Interrupted system call.
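The number in parentheses is the operating system's errno value. A quick way to look it up from Python (a small sketch; the exact message text depends on the platform's C library):

```python
import errno
import os

# Map the numeric code to its symbolic name and to a readable message.
print(errno.errorcode[4])  # EINTR
print(os.strerror(4))
```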
I am using mysqlclient, the MySQL connector that wraps the C client library. The error happens on both MySQL 5.6 and 5.7 and can be reproduced consistently. PyMySQL doesn’t seem to have this problem. I am also using gevent, but it is not really related in this case.
8/Jun 2018
4 min. read
Background
Observation
I notice a lot of "No connection available." logs from BlockingConnectionPool in redis-py, the Redis Python client. Looking into the source of redis-py:

    try:
        connection = self.pool.get(block=True, timeout=self.timeout)
    except Empty:
        # Note that this is not caught by the redis client and will be
        # raised unless handled by application code. If you want never to
        raise ConnectionError("No connection available.")

This happens when no connection is available in the pool.
4/Jun 2018
3 min. read
Observation
I have an HTTP request which returns after a specified time t. The request is made using Python requests from an Azure VM. When t is smaller than 4 minutes, it works fine. Otherwise, the requests library raises ReadTimeout.
Explanation
TCP connections from Azure have a “not-quite-well-documented” limit: they time out after 4 minutes of inactivity. The related documentation can be found here under Azure Load Balancer, although the limit apparently also affects Azure VMs with a public IP (ILPIP / PIP) without load balancing.
26/May 2018
3 min. read
Background
I’ve fixed the html2text performance issue in the last post, so now I can use it. I need to use it from Python, which does not leave me many choices. Python by the C side, a blog post on the PayPal Engineering blog, lists the options. A C extension is hard to write and is not worth it. This post is about my experience of, and reflections on, using cffi for the first time.
22/Dec 2017
2 min. read
Problem
Under heavy load using gevent, I see this:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/gevent/threadpool.py", line 207, in _worker
        value = func(*args, **kwargs)
    error: [Errno 11] Resource temporarily unavailable
    (<ThreadPool at 0x7fe468930dd0 0/5/10>, <built-in function getaddrinfo>) failed with error

Solution
There’s no good solution out there. Actually, the problem is easier to solve than expected: you only have to change gevent’s DNS resolver.
The documentation does not clearly state the differences between the resolvers.
22/Dec 2017
2 min. read
Background
My server uses gevent.pywsgi. It works fine. However, every few days the server stops responding to requests, and the logs say “Too many open files”.
Investigation
A simple lsof shows that many socket connections opened by pywsgi remain open even after those sessions are completed. This FD (file descriptor) leak probably causes the process to reach the per-process limit on open files (ulimit -n).
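To watch for this kind of leak from inside the process, the open-descriptor count can be compared against the limit. This is a sketch that assumes Linux (it reads /proc/self/fd) and uses the standard resource module; `fd_usage` is a name made up for this example.

```python
import os
import resource


def fd_usage():
    """Return (open_fd_count, soft_limit) for the current process."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Each entry in /proc/self/fd is one open file descriptor (Linux only).
    # The count includes the descriptor used to list the directory itself.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print("open file descriptors: %d (soft limit: %d)" % (used, limit))
```

Logging these two numbers periodically would show the count creeping toward the limit if descriptors are leaking.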