Debugging Memory Usage in Python 2.7 with tracemalloc
8 Oct 2018
Update 16 Mar 2019
Updated instructions.
Update 16 Oct 2018
There was a bug in pytracemalloc that prevented the PYTHONTRACEMALLOC environment variable from working. I submitted a pull request, and it has now been merged. The PR fixed the bug in the Python patches, added a testing script for the patches, and improved the documentation.
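With that fix, tracing can be switched on from the environment instead of by editing code. The following is a minimal sketch, assuming Python 3.4+ or a patched 2.7 build with pytracemalloc installed: it starts a child interpreter with PYTHONTRACEMALLOC set and reports whether tracing is active and how many frames are kept. tracing_state_with_env is a hypothetical helper name used only for illustration.

```python
# Minimal sketch: start tracing through PYTHONTRACEMALLOC instead of calling
# tracemalloc.start(). Assumes Python 3.4+ or a patched 2.7 build + pytracemalloc.
# tracing_state_with_env is a hypothetical helper for illustration.
from __future__ import print_function

import os
import subprocess
import sys

# Code executed by the child interpreter: report tracing state and frame limit.
# sys.stdout.write keeps the output identical on Python 2 and Python 3.
CHILD_CODE = (
    "import sys, tracemalloc; "
    "sys.stdout.write('%s %s' % (tracemalloc.is_tracing(), "
    "tracemalloc.get_traceback_limit()))"
)


def tracing_state_with_env(frames):
    """Run a child interpreter with PYTHONTRACEMALLOC=<frames>, return its report."""
    env = dict(os.environ)
    env["PYTHONTRACEMALLOC"] = str(frames)
    output = subprocess.check_output([sys.executable, "-c", CHILD_CODE], env=env)
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(tracing_state_with_env(25))
```

In day-to-day use you would simply launch the application with something like PYTHONTRACEMALLOC=25 python app.py and take snapshots from inside it.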
Background
My application is killed by the OOM killer. How do I find out why the application is taking up so much memory?
In the past, I used Pympler and Heapy. But what you get looks like this:
types | # objects | total size
======= | =========== | ============
dict | 1 | 280 B
list | 1 | 192 B
It does not tell you where in the code the memory is allocated. You could take a diff between snapshots, but that doesn't help much unless you place the diffing code close to the source of the problem. In a complex application with many concurrent threads or greenlets, this is almost impossible.
tracemalloc to the rescue
tracemalloc has been a built-in module since Python 3.4, and it records the traceback of each memory allocation. With tracemalloc, you can simply take a snapshot, print the traceback, and find the source of the problem.
Output of a snapshot:
[ Top 10 ]
<frozen importlib._bootstrap>:716: size=4855 KiB, count=39328, average=126 B
<frozen importlib._bootstrap>:284: size=521 KiB, count=3199, average=167 B
/usr/lib/python3.4/collections/__init__.py:368: size=244 KiB, count=2315, average=108 B
/usr/lib/python3.4/unittest/case.py:381: size=185 KiB, count=779, average=243 B
/usr/lib/python3.4/unittest/case.py:402: size=154 KiB, count=378, average=416 B
/usr/lib/python3.4/abc.py:133: size=88.7 KiB, count=347, average=262 B
<frozen importlib._bootstrap>:1446: size=70.4 KiB, count=911, average=79 B
<frozen importlib._bootstrap>:1454: size=52.0 KiB, count=25, average=2131 B
<string>:5: size=49.7 KiB, count=148, average=344 B
/usr/lib/python3.4/sysconfig.py:411: size=48.0 KiB, count=1, average=48.0 KiB
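Much of this top-10 list is import machinery. tracemalloc can drop such entries before computing statistics; the sketch below uses Snapshot.filter_traces with exclusive tracemalloc.Filter objects, following the filtering example in the Python 3 tracemalloc documentation. top_lines is a hypothetical helper name, and the excluded patterns are simply the noisy entries seen above.

```python
# Minimal sketch: drop import-machinery frames before ranking allocation sites.
# top_lines is a hypothetical helper; the Filter patterns mirror the entries above.
from __future__ import print_function

import tracemalloc


def top_lines(snapshot, limit=10):
    """Return the `limit` largest allocation sites, ignoring import noise."""
    filtered = snapshot.filter_traces((
        # inclusive=False: exclude traces whose most recent frame matches
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    return filtered.statistics("lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start()
    data = [str(i) * 10 for i in range(10000)]  # something worth reporting
    for stat in top_lines(tracemalloc.take_snapshot()):
        print(stat)
```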
Traceback of an entry in the snapshot:
903 memory blocks: 870.1 KiB
File "<frozen importlib._bootstrap>", line 716
File "<frozen importlib._bootstrap>", line 1036
File "<frozen importlib._bootstrap>", line 934
File "<frozen importlib._bootstrap>", line 1068
File "<frozen importlib._bootstrap>", line 619
File "<frozen importlib._bootstrap>", line 1581
File "<frozen importlib._bootstrap>", line 1614
File "/usr/lib/python3.4/doctest.py", line 101
import pdb
File "<frozen importlib._bootstrap>", line 284
File "<frozen importlib._bootstrap>", line 938
File "<frozen importlib._bootstrap>", line 1068
File "<frozen importlib._bootstrap>", line 619
File "<frozen importlib._bootstrap>", line 1581
File "<frozen importlib._bootstrap>", line 1614
File "/usr/lib/python3.4/test/support/__init__.py", line 1728
import doctest
File "/usr/lib/python3.4/test/test_pickletools.py", line 21
support.run_doctest(pickletools)
File "/usr/lib/python3.4/test/regrtest.py", line 1276
test_runner()
File "/usr/lib/python3.4/test/regrtest.py", line 976
display_failure=not verbose)
File "/usr/lib/python3.4/test/regrtest.py", line 761
match_tests=ns.match_tests)
File "/usr/lib/python3.4/test/regrtest.py", line 1563
main()
File "/usr/lib/python3.4/test/__main__.py", line 3
regrtest.main_in_temp_cwd()
File "/usr/lib/python3.4/runpy.py", line 73
exec(code, run_globals)
File "/usr/lib/python3.4/runpy.py", line 160
"__main__", fname, loader, pkg_name)
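Because every statistic is keyed by file and line, a diff between two snapshots also points at code, unlike the type-level diff from Pympler shown under Background. This is a minimal sketch using Snapshot.compare_to; growth_between is a hypothetical helper, and the list of bytearrays only stands in for a real leak.

```python
# Minimal sketch: diff two snapshots so growth is reported per source line.
# growth_between is a hypothetical helper for illustration.
from __future__ import print_function

import tracemalloc


def growth_between(before, after, limit=10):
    """Return up to `limit` source lines whose traced memory grew the most."""
    # compare_to sorts by absolute size_diff, biggest first
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    leak = [bytearray(1024) for _ in range(1000)]  # roughly 1 MiB kept alive
    after = tracemalloc.take_snapshot()
    for diff in growth_between(before, after):
        print(diff)
```

Taking the "before" snapshot at a quiet point and the "after" snapshot once memory has clearly grown keeps the diff focused on the code that allocated in between.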
What about Python 2.7?
tracemalloc is not part of the standard library in Python 2.7. To use the third-party pytracemalloc module, you have to patch and compile Python 2.7 yourself. It is easier than it sounds. I use Python 2.7.12 here, but other 2.7 releases should work as long as pytracemalloc ships a patch for that version in its patches directory.
sudo mkdir /opt/tracemalloc
sudo chown $USER: /opt/tracemalloc
cd /opt/tracemalloc
wget http://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz
git clone https://github.com/vstinner/pytracemalloc
tar -xf Python-2.7.12.tgz
cd Python-2.7.12
patch -p1 < ../pytracemalloc/patches/2.7.12/pep445.patch
./configure --enable-unicode=ucs4 --prefix=/opt/tracemalloc/py27
make install
Then we'll create a virtualenv in the project directory. I use the system virtualenv here; it installs pip for you, so you don't have to install pip manually.
cd ~/project
virtualenv -p /opt/tracemalloc/py27/bin/python venv
. venv/bin/activate
Don't forget to install pytracemalloc in the virtualenv:
cd /opt/tracemalloc/pytracemalloc
python setup.py install
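Before instrumenting the real application, it may be worth checking that the virtualenv's interpreter can actually trace allocations. This sketch assumes the patched build with pytracemalloc installed (it also runs on Python 3.4+); traced_peak_kib is a hypothetical helper that allocates a list and reads the peak from tracemalloc.get_traced_memory().

```python
# Minimal sketch: check that this interpreter can trace allocations.
# Assumes the patched 2.7 build with pytracemalloc installed (or Python 3.4+).
# traced_peak_kib is a hypothetical helper for illustration.
from __future__ import print_function

import tracemalloc


def traced_peak_kib(n_items=100000):
    """Allocate a list while tracing and return the peak traced memory in KiB."""
    tracemalloc.start()
    try:
        data = list(range(n_items))  # at least 8 bytes per slot on 64-bit builds
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        del data
        return peak / 1024.0
    finally:
        tracemalloc.stop()  # stop tracing and clear the collected traces


if __name__ == "__main__":
    print("peak traced memory: %.1f KiB" % traced_peak_kib())
```

If this fails, the first thing to check is whether the virtualenv was created with /opt/tracemalloc/py27/bin/python.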
The environment is now ready. To use tracemalloc, add something like this to your code:
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per traceback

# allocate some memory
x = range(10000)

snapshot = tracemalloc.take_snapshot()

# statistics grouped by source line
top_stats = snapshot.statistics('lineno')
print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)

# statistics grouped by full traceback
top_stats = snapshot.statistics('traceback')
# pick the traceback that allocated the most memory
stat = top_stats[0]
# divide by 1024.0 to avoid integer division in Python 2
print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024.0))
for line in stat.traceback.format():
    print(line)
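When the OOM killer is involved, the process may not live long enough for you to inspect it interactively. One option, sketched here under the assumption that writing a file is acceptable in your environment, is to dump snapshots with Snapshot.dump() and analyse them later, possibly in another process, with Snapshot.load(). The helper names and the app.snapshot file name are hypothetical.

```python
# Minimal sketch: save a snapshot to disk and analyse it later.
# dump_snapshot, top_from_file and "app.snapshot" are hypothetical names.
from __future__ import print_function

import os
import tempfile
import tracemalloc


def dump_snapshot(path):
    """Take a snapshot of the traced allocations and write it to `path`."""
    snapshot = tracemalloc.take_snapshot()
    snapshot.dump(path)
    return snapshot


def top_from_file(path, limit=10):
    """Load a dumped snapshot and return its largest allocation sites."""
    snapshot = tracemalloc.Snapshot.load(path)
    return snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start(25)
    words = [str(i) for i in range(10000)]  # some allocations to record
    path = os.path.join(tempfile.mkdtemp(), "app.snapshot")
    dump_snapshot(path)
    for stat in top_from_file(path):
        print(stat)
```

Dumping periodically, for example whenever memory crosses a threshold, leaves the latest snapshots on disk even after the process has been killed.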
Summary
tracemalloc is a really nice tool to pinpoint the source of memory issues in a Python application. In Python 2.7, we can do it with pytracemalloc and a patched version of Python. Although patching and compiling Python sounds tedious, it may eventually save you time by skipping the guesswork required with conventional Python 2.7 memory profiling tools.