Heap and Heapsort in Python

Another exercise from Programming Pearls: Heap. I finished the code a week ago but didn’t have the time for the blog post until now. Here’s the code:

Quicksort in Python

I am reading Programming Pearls and figured it would be fun to write Quicksort in Python. Here are my code and benchmarks. When I compared my code with others', I was surprised to see many faulty implementations of Quicksort with quadratic runtime on certain inputs. The key to writing a correct and fast Quicksort: watch out for lists of equal items. qsort2 uses 2 pointers (i, j) to make sure the problem is divided into subproblems of similar size.
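The two-pointer idea can be sketched roughly as follows. This is not the post's qsort2, only a minimal illustration in the spirit of the two-way partitioning described in Programming Pearls; the function name `quicksort` and the random pivot choice are assumptions made for this example. Both scans stop on items equal to the pivot, so a list of identical items is split near the middle instead of degenerating into quadratic time.

```python
import random


def quicksort(items, lo=0, hi=None):
    """Sort items[lo..hi] in place using two-way partitioning."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return

    # Assumption for this sketch: pick a random pivot and move it to the front.
    p = random.randint(lo, hi)
    items[lo], items[p] = items[p], items[lo]
    pivot = items[lo]

    i, j = lo, hi + 1
    while True:
        # Move i right past items smaller than the pivot; stop on equal items.
        i += 1
        while i <= hi and items[i] < pivot:
            i += 1
        # Move j left past items larger than the pivot; stop on equal items.
        j -= 1
        while items[j] > pivot:
            j -= 1
        if i > j:
            break
        items[i], items[j] = items[j], items[i]

    # Put the pivot in its final position, then sort both sides.
    items[lo], items[j] = items[j], items[lo]
    quicksort(items, lo, j - 1)
    quicksort(items, j + 1, hi)


equal_items = [7] * 2000
quicksort(equal_items)  # stays well within the default recursion limit
```

Because equal items are swapped rather than skipped, the pointers meet near the middle of a run of duplicates, which keeps the recursion depth logarithmic for that input.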

Debugging Memory Usage in Python 2.7 with tracemalloc

Update (16 Oct 2018): There was a bug in pytracemalloc that prevented the PYTHONTRACEMALLOC environment variable from working. I submitted a pull request and it has now been merged. The PR fixed the bug in the Python patches, added a testing script for the patches, and improved the documentation. Background: My application is killed by the OOM killer. How do I find out why the application is taking up so much memory?
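For orientation, a snapshot-based check might look like the sketch below. It assumes Python 3, where tracemalloc ships in the standard library; the pytracemalloc backport for 2.7 is intended to offer the same module API but needs a patched interpreter. The names `top_allocations` and `build_big_list` are hypothetical and exist only for this example.

```python
import tracemalloc


def top_allocations(workload, limit=5):
    """Run workload() under tracemalloc and return the largest allocation sites."""
    tracemalloc.start()
    try:
        result = workload()  # keep a reference so the memory is still live
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    # Group live allocations by source file and line, largest first.
    return snapshot.statistics("lineno")[:limit]


def build_big_list():
    # Hypothetical stand-in for the code suspected of using too much memory.
    return [str(n) * 10 for n in range(50000)]


for stat in top_allocations(build_big_list):
    print(stat)
```

Each printed entry names a file and line number together with the total size and count of the blocks allocated there, which is usually enough to narrow down where memory is going.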

Python, SQLAlchemy, Pytest, PyCharm Debugger & NoSuchColumnError

I get this error when I use the PyCharm debugger while running pytest: NoSuchColumnError: "Could not locate column in row for column 'mytable.id'". This is a SQLAlchemy error. There is no useful help on Google. There is no error if the breakpoints are muted, so it is possibly related to lazy loading of relationships when the debugger displays variables. This can be resolved by turning off the “load values asynchronously” option in the PyCharm debugger. As far as I can remember, this is a new feature in PyCharm 2017.

Python & MySQL Interrupted System Call & pyinstrument

Background: My Python application cannot connect to MySQL. The error message looks like this: Can’t connect to MySQL server on ‘127.0.0.1’ (4). Error 4 means "Interrupted system call". I am using mysqlclient, the C wrapper MySQL connector. The error happens on both MySQL 5.6 and 5.7 and can be reproduced consistently. It seems that PyMySQL doesn’t have this problem. I am also using gevent, but it is largely unrelated in this case.

10x Faster Python gevent Redis Connection Pool

Background. Observation: I notice a lot of "No connection available." logs from the Redis Python client redis-py's BlockingConnectionPool. Looking into the source of redis-py:

    try:
        connection = self.pool.get(block=True, timeout=self.timeout)
    except Empty:
        # Note that this is not caught by the redis client and will be
        # raised unless handled by application code. If you want never to
        raise ConnectionError("No connection available.")

It happens when there is no connection in the pool available.
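The behaviour can be reproduced without Redis. The sketch below is not redis-py code: `TinyBlockingPool` and `PoolExhausted` are hypothetical names, and plain strings stand in for connections. It only illustrates the queue-with-timeout pattern from the snippet above, where a caller that waits longer than the timeout for a free connection gets an error.

```python
import queue


class PoolExhausted(Exception):
    """Hypothetical stand-in for redis-py's ConnectionError."""


class TinyBlockingPool:
    """Toy pool: a bounded queue of placeholder connection objects."""

    def __init__(self, max_connections, timeout):
        self.timeout = timeout
        self.pool = queue.LifoQueue(max_connections)
        for n in range(max_connections):
            self.pool.put("conn-%d" % n)

    def get_connection(self):
        try:
            # Block until a connection is returned or the timeout expires.
            return self.pool.get(block=True, timeout=self.timeout)
        except queue.Empty:
            raise PoolExhausted("No connection available.")

    def release(self, connection):
        self.pool.put_nowait(connection)


pool = TinyBlockingPool(max_connections=1, timeout=0.05)
conn = pool.get_connection()
try:
    pool.get_connection()  # the only connection is still checked out
except PoolExhausted as exc:
    print(exc)
pool.release(conn)
```

With one connection and a short timeout, the second request fails as soon as the wait expires, which matches the kind of log line described above.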

Azure TCP Idle Timeout, TCP keepalive, and Python

Observation: I have an HTTP request which returns after a specified time t. The request is made using Python requests from an Azure VM. When t is less than 4 minutes, it works fine. Otherwise, the requests library raises ReadTimeout. Explanation: TCP connections from Azure have a “not-quite-well-documented” idle timeout: a connection is dropped after it has been idle for 4 minutes. The related documentation can be found here under Azure Load Balancer, although it apparently also affects Azure VMs with a public IP (ILPIP / PIP) without load balancing.
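One way to keep such a connection from going idle is TCP keepalive. The sketch below shows only the socket-level options, using the standard socket module; the helper name `enable_keepalive` and the 60/30/4 values are assumptions chosen to stay well under the 4-minute limit, and the TCP_KEEP* constants are platform-dependent (they are available on Linux). Wiring these options into the requests library is not shown here.

```python
import socket


def enable_keepalive(sock, idle=60, interval=30, count=4):
    """Turn on TCP keepalive probes for an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs below are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
sock.close()
```

The probes count as traffic on the connection, so a request that waits longer than 4 minutes for its response no longer looks idle to the network in between.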

Writing a Python Wrapper for html2text using cffi

Background: I fixed the html2text performance issue in the last post, so now I can use it. I need to use it from Python, and that does not leave me many choices. Python by the C side, a blog post on the PayPal Engineering blog, lists the options. A C extension is hard to write and is not worth it. This post covers my experience with, and reflections on, using cffi for the first time.

gevent built-in function getaddrinfo failed with error

Problem: Under heavy load using gevent, I see this:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/gevent/threadpool.py", line 207, in _worker
        value = func(*args, **kwargs)
    error: [Errno 11] Resource temporarily unavailable
    (<ThreadPool at 0x7fe468930dd0 0/5/10>, <built-in function getaddrinfo>) failed with error

Solution: There’s no good solution out there. Actually, it is easier to solve than expected: you only have to change gevent’s DNS resolver. The docs don’t clearly state the difference between the resolvers.

gevent.pywsgi's File Descriptor Leak and HTTP Keep-Alive

Background: My server uses gevent.pywsgi. It works fine. However, every few days the server stops responding to requests. The logs say “Too many open files”. Investigation: A simple lsof shows that many socket connections opened by pywsgi remain open even after those sessions have completed. This FD (file descriptor) leak probably causes the process to reach the per-process limit on open files (ulimit -n).
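The same check can be made from inside the process. The sketch below uses the standard resource module to read the limit that ulimit -n reports and, on Linux, counts the entries in /proc/self/fd; the helper name `fd_usage` is hypothetical, and the count is approximate because listing the directory briefly opens a descriptor of its own.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fd_dir = "/proc/self/fd"  # Linux-specific; absent on other platforms
    open_fds = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print("open fds: %s, soft limit: %s, hard limit: %s" % (open_fds, soft, hard))
```

Logging these numbers periodically would show the open-descriptor count creeping toward the soft limit well before "Too many open files" appears.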