Fighting Flickering LED Bulbs with Python and OpenCV

Background: I have been experiencing headaches and fatigue since I moved into my new apartment, and I suspect that the preinstalled LED light bulbs are the cause. Flickering: Although high-frequency flicker is imperceptible to the human eye, prolonged exposure to a flickering light source may cause headaches and fatigue. Physics: LEDs flicker for several reasons, e.g. PWM dimming and poor AC/DC converters (e.g. LED drivers). PWM dimming is a way to adjust brightness on LED-backlit monitors and dimmable LED bulbs.
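To make the PWM idea concrete, here is a minimal pure-Python sketch. It assumes an idealized LED that is either fully on or fully off, so the average brightness equals the duty cycle. The function names and the sample counts are illustrative only and are not taken from the post.

```python
def pwm_samples(duty_cycle, period_samples=100, periods=3):
    """Return brightness samples (1.0 = on, 0.0 = off) for an idealized PWM signal."""
    on = int(round(duty_cycle * period_samples))
    one_period = [1.0] * on + [0.0] * (period_samples - on)
    return one_period * periods


def mean_brightness(samples):
    """Average brightness over all samples; for ideal PWM this is the duty cycle."""
    return sum(samples) / len(samples)


# A 25% duty cycle: the LED is dark for three quarters of every period,
# which is the flicker a fast enough camera can pick up.
samples = pwm_samples(0.25)
print(mean_brightness(samples))
```

The lower the PWM frequency, the longer each dark interval lasts, which is why dimmed bulbs are the usual suspects.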

Writing a Serverless Cron Job

Benefits of a serverless cron job: What’s great about going serverless with a cron job is that there is no need to set up the environment, configure crontab, logging, error detection, etc. You also don’t have to worry about whether the VM is down. It is also easier on your wallet. Here I am using AWS Lambda for my serverless cron job. Configuration and tips: An AWS Lambda function can be triggered periodically using CloudWatch Events.
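As a rough illustration of the Lambda side, the sketch below shows a handler that only runs its job for scheduled invocations. It assumes the function is attached to a CloudWatch Events rule with a schedule expression such as rate(5 minutes); the job body and the returned dictionary are placeholders, not the post's actual code.

```python
import datetime


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda; `event` is the triggering payload."""
    # Scheduled invocations from CloudWatch Events carry source "aws.events".
    if event.get("source") != "aws.events":
        return {"ran": False, "reason": "not a scheduled event"}

    started = datetime.datetime.now(datetime.timezone.utc)
    # Placeholder: the real periodic job would go here.
    return {"ran": True, "started": started.isoformat()}
```

Locally, the handler can be exercised by calling it with a hand-written event dictionary and `None` as the context.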

Random Notes to Conclude 2019

1. OpenVPN & MFA: There are few tutorials about setting up OpenVPN with multi-factor authentication. Google Authenticator is used for MFA here. To set up the OpenVPN server, I use the OpenVPN road warrior installer. For the MFA config, the script from egonbraun is forked and modified to work with the road warrior setup: https://gist.github.com/carsonip/b02eecf9f7d036555a53fea6f516ced8 Also, by default the OpenVPN server renegotiates every hour, but the renegotiation fails MFA authentication, effectively disconnecting the OpenVPN connection periodically.

Debugging Python Slow json.loads

Background: Profiling shows that pymongo’s bson.json_util.loads is consuming an unusual amount of CPU time. Benchmark: To confirm that the function is slow, let’s do some benchmarking.

    import json

    import pyperf


    def data():
        # range() instead of xrange() so the script runs on Python 2 and 3.
        return json.dumps(['asdfasdf%s' % i for i in range(20)])


    # The setup strings below import s from __main__.
    s = data()

    runner = pyperf.Runner()
    runner.timeit(name="json", stmt="json.loads(s)", setup="from __main__ import s; import json;")
    runner.timeit(name="simplejson", stmt="simplejson.loads(s)", setup="from __main__ import s; import simplejson;")
    runner.timeit(name="bson json_util", stmt="json_util.loads(s)", setup="from __main__ import s; from bson import json_util;")

Result:

More on Binary Search: Variants

In the last binary search article, we discussed how to write a correct binary search. But sometimes finding just any position of the target is not enough. If we need the first or the last occurrence, can we do it? The classic binary search: In Python, the basic binary search looks like this:

    def binary_search(A, target):
        N = len(A)
        if N == 0:
            return -1  # nothing to search; avoids indexing an empty list
        lo = 0
        hi = N - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if A[mid] == target:
                return mid  # any matching position will do
            elif A[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo if A[lo] == target else -1

And the invariant of the algorithm is that A[lo] <= target <= A[hi].
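One common way to answer the first/last occurrence question is sketched below. This is not necessarily the variant the article goes on to present. It uses a half-open range [lo, hi) and never returns early, so the loop converges on a boundary instead of an arbitrary match. The function names are illustrative.

```python
def first_occurrence(A, target):
    """Return the smallest index i with A[i] == target, or -1 if absent."""
    lo, hi = 0, len(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < target:
            lo = mid + 1   # everything up to mid is too small
        else:
            hi = mid       # A[mid] >= target, so the answer is at mid or before
    return lo if lo < len(A) and A[lo] == target else -1


def last_occurrence(A, target):
    """Return the largest index i with A[i] == target, or -1 if absent."""
    lo, hi = 0, len(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] <= target:
            lo = mid + 1   # the last match, if any, is at mid or after
        else:
            hi = mid
    # lo is now the first index whose value is greater than target.
    return lo - 1 if lo > 0 and A[lo - 1] == target else -1


A = [1, 2, 2, 2, 3]
print(first_occurrence(A, 2), last_occurrence(A, 2))
```

The only difference between the two functions is `<` versus `<=` in the comparison, which decides on which side of a run of equal items the search settles.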

Python mysqlclient Doesn't Work Well with gevent

mysqlclient (a MySQLdb fork) is a Python MySQL driver which uses libmysqlclient (or libmariadbclient). This means speed and stability. Does it work well with gevent? Not really. In mysqlclient <= v1.3.13, there’s a sample called waiter_gevent.py which looks like this:

    from __future__ import print_function
    """Demo using Gevent with mysqlclient."""

    import gevent.hub
    import MySQLdb


    def gevent_waiter(fd, hub=gevent.hub.get_hub()):
        hub.wait(hub.loop.io(fd, 1))


    def f(n):
        conn = MySQLdb.connect(user='root', waiter=gevent_waiter)
        cur = conn.cursor()
        cur.execute("SELECT SLEEP(%s)", (n,))
        cur.execute("SELECT 1+%s", (n,))
        print(cur.

Fixing boto (AWS Python interface) Quadratic Runtime Connection Pool

Background: I came across some Python code using boto, the Python interface to AWS. There’s boto3, which is a newer version of boto. If you are starting a new project, you should be looking at boto3 instead of boto. I am told that boto pools connections. This is a good thing in terms of performance: never waste HTTPS connections, because they are expensive to set up. But does it limit the number of outgoing connections like SQLAlchemy’s QueuePool, and is the number of pooled connections configurable?

Heap and Heapsort in Python

Another exercise from Programming Pearls: Heap. I finished the code a week ago but didn’t have the time for the blog post until now. Here’s the code:

Quicksort in Python

I am reading Programming Pearls and figured it would be fun to write Quicksort in Python. Here are my code and benchmarks. When I compared my code with others’, I was surprised to see many faulty implementations of Quicksort with quadratic runtime on certain inputs. The key to writing a correct and fast Quicksort: Watch out for lists of equal items. qsort2 uses 2 pointers (i, j) to make sure the problem is divided into subproblems of similar size.
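The two-pointer idea can be sketched as follows. This is not the post's qsort2, whose code is not shown in this excerpt. It is a minimal in-place version modeled on the two-way partitioning described in Programming Pearls, with a random pivot added as an extra assumption to guard against already-sorted input.

```python
import random


def quicksort(a, lo=0, hi=None):
    """Sort list a in place between indexes lo and hi (inclusive)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Move a randomly chosen pivot to the front.
    k = random.randint(lo, hi)
    a[lo], a[k] = a[k], a[lo]
    pivot = a[lo]

    i, j = lo, hi + 1
    while True:
        # i scans right and stops at items >= pivot (including equal ones).
        i += 1
        while i <= hi and a[i] < pivot:
            i += 1
        # j scans left and stops at items <= pivot; a[lo] acts as a sentinel.
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i > j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]  # put the pivot in its final position

    quicksort(a, lo, j - 1)
    quicksort(a, j + 1, hi)


data = [5] * 2000 + list(range(100))
random.shuffle(data)
quicksort(data)
print(data[:3], data[-3:])
```

Because both pointers stop on items equal to the pivot, a list of identical items is split near the middle instead of degenerating into subproblems of size 0 and n-1.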

Debugging Memory Usage in Python 2.7 with tracemalloc

Update 16 Mar 2019: Updated instructions. Update 16 Oct 2018: There was a bug in pytracemalloc that prevented the PYTHONTRACEMALLOC environment variable from working. I have submitted a pull request and it is now merged. The PR fixed the bug in the Python patches, added a testing script for the patches and improved the documentation. Background: My application is killed by the OOM killer. How do I find out why the application is taking up so much memory?
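For orientation, here is a minimal sketch of the kind of question tracemalloc answers. It assumes Python 3, where tracemalloc is part of the standard library; the post itself targets Python 2.7 through pytracemalloc, whose patching and setup steps are not shown here. The helper and workload names are made up for illustration.

```python
import tracemalloc


def top_allocations(workload, limit=5):
    """Run workload() under tracemalloc and return the largest allocation sites."""
    tracemalloc.start()
    try:
        result = workload()  # keep a reference so the memory is still alive
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    # Statistics are grouped by source line and sorted from largest to smallest.
    return snapshot.statistics("lineno")[:limit]


def leaky_workload():
    # Roughly 1 MB of bytes objects, all allocated on this line.
    return [bytes(1000) for _ in range(1000)]


for stat in top_allocations(leaky_workload):
    print(stat)
```

Each printed entry names a file and line number together with the total size and count of the blocks allocated there, which is usually enough to point at the code that is holding on to memory.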

Python, SQLAlchemy, Pytest, PyCharm Debugger & NoSuchColumnError

I get this error when I use the PyCharm debugger with pytest: NoSuchColumnError: "Could not locate column in row for column 'mytable.id'" This is a SQLAlchemy error. I found nothing useful on Google. There is no error if the breakpoints are muted, so it is possibly related to lazy loading of relationships when the debugger displays variables. This can be resolved by turning off the “load values asynchronously” option in the PyCharm debugger. As far as I can remember, this is a new feature in PyCharm 2017.

Python & MySQL Interrupted System Call & pyinstrument

Background: My Python application cannot connect to MySQL. The error message looks like this: Can’t connect to MySQL server on ‘127.0.0.1’ (4) Error 4 means Interrupted system call. I am using mysqlclient, the C wrapper MySQL connector. The error happens on both MySQL 5.6 and 5.7, and it can be reproduced consistently. PyMySQL doesn’t seem to have this problem. I am also using gevent, but it is not really relevant in this case.
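The meaning of error number 4 can be checked from Python itself. This small sketch assumes a platform where EINTR has the value 4, which is the case on Linux and macOS.

```python
import errno
import os

code = 4
# errno.errorcode maps numeric codes to their symbolic names.
print(errno.errorcode[code])
# os.strerror gives the human-readable message for the code.
print(os.strerror(code))
```

EINTR is returned when a blocking system call is interrupted by a signal before it completes, so the next thing to look for is whatever is delivering signals to the process during connect.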

10x Faster Python gevent Redis Connection Pool

Background (Observation): I notice a lot of "No connection available." logs from the Redis Python client redis-py’s BlockingConnectionPool. Looking into the source of redis-py:

    try:
        connection = self.pool.get(block=True, timeout=self.timeout)
    except Empty:
        # Note that this is not caught by the redis client and will be
        # raised unless handled by application code. If you want never to
        raise ConnectionError("No connection available.")

This happens when no connection in the pool becomes available before the timeout.
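The behaviour can be reproduced without Redis. The sketch below is a hypothetical stand-in, not redis-py's class: it keeps placeholder "connections" in a standard-library queue and raises Python's built-in ConnectionError (redis-py has its own exception type) when the timeout expires.

```python
import queue


class TinyBlockingPool:
    """Hypothetical miniature of a blocking connection pool."""

    def __init__(self, max_connections, timeout):
        self.timeout = timeout
        self.pool = queue.LifoQueue(max_connections)
        for i in range(max_connections):
            self.pool.put("conn-%d" % i)  # strings stand in for real connections

    def get_connection(self):
        try:
            # Blocks until a connection is released or the timeout expires.
            return self.pool.get(block=True, timeout=self.timeout)
        except queue.Empty:
            raise ConnectionError("No connection available.")

    def release(self, connection):
        self.pool.put_nowait(connection)


pool = TinyBlockingPool(max_connections=1, timeout=0.05)
held = pool.get_connection()
try:
    pool.get_connection()  # the only connection is still checked out
except ConnectionError as exc:
    print(exc)
pool.release(held)
```

With more concurrent callers than connections, any caller that waits longer than the timeout sees this error, which matches the log message in question.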

Azure TCP Idle Timeout, TCP keepalive, and Python

Observation: I have an HTTP request which returns after a specified time t. The request is made using Python requests from an Azure VM. When t is less than 4 minutes, it works fine. Otherwise, the requests library raises ReadTimeout. Explanation: TCP connections from Azure have a “not-quite-well-documented” limit: they time out after being idle for 4 minutes. The related documentation can be found here under Azure Load Balancer, although it apparently also affects Azure VMs with a public IP (ILPIP / PIP) without load balancing.
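Since the title mentions TCP keepalive, here is a hedged sketch of what enabling it on a plain Python socket looks like. The helper name and the timing values are assumptions chosen so that probes start well before the 4-minute idle limit. How the post wires this into requests is not shown in this excerpt.

```python
import socket


def enable_keepalive(sock, idle=120, interval=30, count=4):
    """Turn on TCP keepalive so an idle connection still carries probe traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform-specific, so set them only if present.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Without the platform-specific options, the operating system defaults apply, and on Linux the default idle time before the first probe is two hours, far too long to help with a 4-minute limit.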

Writing a Python Wrapper for html2text using cffi

Background: I’ve fixed the html2text performance issue in the last post, so now I can use it. I need to use it from Python, which doesn’t leave me many choices. “Python by the C side”, a post on the PayPal Engineering blog, lists the options. A C extension is hard to write and is not worth it. This post is about my experience with, and reflections on, using cffi for the first time.

gevent built-in function getaddrinfo failed with error

Problem: Under heavy load using gevent, I see this:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/gevent/threadpool.py", line 207, in _worker
        value = func(*args, **kwargs)
    error: [Errno 11] Resource temporarily unavailable
    (<ThreadPool at 0x7fe468930dd0 0/5/10>, <built-in function getaddrinfo>) failed with error

Solution: There’s no good solution to be found online. Actually, it is easier to solve than expected: you only have to change gevent’s DNS resolver. The docs don’t clearly state the difference between the resolvers.

gevent.pywsgi's File Descriptor Leak and HTTP Keep-Alive

Background: My server uses gevent.pywsgi. It works fine. However, every few days the server stops responding to requests. The logs say “Too many open files”. Investigation: A simple lsof shows that many socket connections are kept open by pywsgi even after those sessions have completed. This FD (file descriptor) leak probably causes the process to reach the per-process limit on open files (ulimit -n).
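A process can also watch its own descriptor usage. The sketch below is not from the post; it assumes a Unix-like system that exposes /proc/self/fd (Linux) or /dev/fd, and the helper names are illustrative.

```python
import os
import resource


def open_fd_count():
    """Count the file descriptors currently open in this process."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory briefly opens one extra descriptor itself.
            return len(os.listdir(fd_dir))
    raise OSError("no per-process fd directory found on this platform")


def fd_soft_limit():
    """Return the soft limit on open files, i.e. what `ulimit -n` reports."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


print("open fds: %d, soft limit: %d" % (open_fd_count(), fd_soft_limit()))
```

Logging these two numbers periodically shows whether the count climbs steadily towards the limit, which is the signature of a leak rather than a burst of traffic.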