Debugging Memory Usage in Python 2.7 with tracemalloc

Update 16 Oct 2018

There was a bug in pytracemalloc that prevented the PYTHONTRACEMALLOC environment variable from working. I submitted a pull request, and it has since been merged. The PR fixed the bug in the Python patches, added a testing script for the patches, and improved the documentation.
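For reference, setting PYTHONTRACEMALLOC to a number N starts tracing at interpreter startup with a traceback limit of N frames. This is like calling tracemalloc.start(N) before any of your code runs. The sketch below checks this from a child process. It assumes `sys.executable` has tracemalloc available: either Python 3.4+, or a patched 2.7 build with a pytracemalloc version that includes the fix above. `check_env_tracing` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def check_env_tracing(nframe=25):
    """Start a child interpreter with PYTHONTRACEMALLOC set and report
    whether it is tracing and how many frames it keeps per allocation."""
    env = dict(os.environ)
    env["PYTHONTRACEMALLOC"] = str(nframe)
    code = (
        "import tracemalloc; "
        "print(tracemalloc.is_tracing()); "
        "print(tracemalloc.get_traceback_limit())"
    )
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    is_tracing, limit = out.decode("ascii").split()
    return is_tracing == "True", int(limit)


if __name__ == "__main__":
    print(check_env_tracing(25))
```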

Background

My application is killed by the OOM killer. How do I find out why the application is taking up so much memory?

In the past, I used Pympler and Heapy, but their output looks like this:

  types |   # objects |   total size
======= | =========== | ============
   dict |           1 |    280     B
   list |           1 |    192     B

This does not tell you which code allocated the memory. You could diff two snapshots, but that doesn't help much unless you place the diffing code close to the source of the problem. In a complex application with many concurrent threads or greenlets, that is almost impossible.

tracemalloc to the rescue

tracemalloc has been part of the standard library since Python 3.4. It records the traceback of each memory allocation. You can take a snapshot, print the tracebacks of the biggest allocations, and find the source of the problem.

Output of a snapshot:

[ Top 10 ]
<frozen importlib._bootstrap>:716: size=4855 KiB, count=39328, average=126 B
<frozen importlib._bootstrap>:284: size=521 KiB, count=3199, average=167 B
/usr/lib/python3.4/collections/__init__.py:368: size=244 KiB, count=2315, average=108 B
/usr/lib/python3.4/unittest/case.py:381: size=185 KiB, count=779, average=243 B
/usr/lib/python3.4/unittest/case.py:402: size=154 KiB, count=378, average=416 B
/usr/lib/python3.4/abc.py:133: size=88.7 KiB, count=347, average=262 B
<frozen importlib._bootstrap>:1446: size=70.4 KiB, count=911, average=79 B
<frozen importlib._bootstrap>:1454: size=52.0 KiB, count=25, average=2131 B
<string>:5: size=49.7 KiB, count=148, average=344 B
/usr/lib/python3.4/sysconfig.py:411: size=48.0 KiB, count=1, average=48.0 KiB
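Much of the list above comes from the import system itself (`<frozen importlib._bootstrap>`). Snapshot.filter_traces() with tracemalloc.Filter can exclude such entries before you compute statistics. The Python 3 documentation's own example does the same thing. This is a minimal sketch. The `json` workload is only a stand-in. The `<frozen importlib._bootstrap_external>` pattern is an assumption that only matters on newer Python 3 releases. Adjust the patterns to match your own noise.

```python
import json
import tracemalloc

tracemalloc.start()

# Stand-in workload: keep some allocations alive so they appear in the snapshot.
data = [json.dumps({"i": i}) for i in range(1000)]

snapshot = tracemalloc.take_snapshot()

# Drop traces whose (most recent) frame is in the import machinery or unknown.
snapshot = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap_external>"),
    tracemalloc.Filter(False, "<unknown>"),
))

top_stats = snapshot.statistics("lineno")
for stat in top_stats[:10]:
    print(stat)
```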

Traceback of an entry in the snapshot:

903 memory blocks: 870.1 KiB
  File "<frozen importlib._bootstrap>", line 716
  File "<frozen importlib._bootstrap>", line 1036
  File "<frozen importlib._bootstrap>", line 934
  File "<frozen importlib._bootstrap>", line 1068
  File "<frozen importlib._bootstrap>", line 619
  File "<frozen importlib._bootstrap>", line 1581
  File "<frozen importlib._bootstrap>", line 1614
  File "/usr/lib/python3.4/doctest.py", line 101
    import pdb
  File "<frozen importlib._bootstrap>", line 284
  File "<frozen importlib._bootstrap>", line 938
  File "<frozen importlib._bootstrap>", line 1068
  File "<frozen importlib._bootstrap>", line 619
  File "<frozen importlib._bootstrap>", line 1581
  File "<frozen importlib._bootstrap>", line 1614
  File "/usr/lib/python3.4/test/support/__init__.py", line 1728
    import doctest
  File "/usr/lib/python3.4/test/test_pickletools.py", line 21
    support.run_doctest(pickletools)
  File "/usr/lib/python3.4/test/regrtest.py", line 1276
    test_runner()
  File "/usr/lib/python3.4/test/regrtest.py", line 976
    display_failure=not verbose)
  File "/usr/lib/python3.4/test/regrtest.py", line 761
    match_tests=ns.match_tests)
  File "/usr/lib/python3.4/test/regrtest.py", line 1563
    main()
  File "/usr/lib/python3.4/test/__main__.py", line 3
    regrtest.main_in_temp_cwd()
  File "/usr/lib/python3.4/runpy.py", line 73
    exec(code, run_globals)
  File "/usr/lib/python3.4/runpy.py", line 160
    "__main__", fname, loader, pkg_name)

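tracemalloc can also diff two snapshots, which addresses the snapshot-diff problem described earlier. Snapshot.compare_to() groups the differences by source line, so the result points at code rather than at object types. This is a minimal sketch. The `leak` list is a stand-in for whatever the real application keeps allocating between the two snapshots.

```python
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation

before = tracemalloc.take_snapshot()

# Stand-in for the code under investigation: allocate strings and keep
# them alive so they show up in the second snapshot.
leak = ["x" * 100 + str(i) for i in range(10000)]

after = tracemalloc.take_snapshot()

# Group by source line; entries are sorted by the largest absolute size change.
top_diffs = after.compare_to(before, "lineno")
for diff in top_diffs[:5]:
    print(diff)
```
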
What about Python 2.7?

tracemalloc is not in the standard library in Python 2.7. To use the third-party pytracemalloc module, you have to patch and compile Python 2.7 yourself. It is easier than it sounds. I use Python 2.7.8 here, but you can substitute another 2.7.x release such as 2.7.9, as long as the patch applies cleanly.

sudo mkdir /opt/tracemalloc
sudo chown $USER: /opt/tracemalloc
cd /opt/tracemalloc
wget http://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
wget https://pypi.python.org/packages/source/p/pytracemalloc/pytracemalloc-1.2.tar.gz
tar -xf Python-2.7.8.tgz
tar -xf pytracemalloc-1.2.tar.gz
cd Python-2.7.8
patch -p1 < ../pytracemalloc-1.2/patches/2.7/pep445.patch
./configure --enable-unicode=ucs4 --prefix=/opt/tracemalloc/py27
make install

You now have a patched Python build. pip is not installed along with it, so fetch it with get-pip.py:

mkdir /opt/tracemalloc/pip
cd /opt/tracemalloc/pip
wget https://bootstrap.pypa.io/get-pip.py
/opt/tracemalloc/py27/bin/python2.7 get-pip.py

Next, create a virtualenv in the project directory that uses the patched interpreter. (I use the system virtualenv here, but you can install a newer virtualenv with the new pip if you prefer.)

cd ~/project
virtualenv -p /opt/tracemalloc/py27/bin/python venv
. venv/bin/activate

Don’t forget to install pytracemalloc into the virtualenv (with the virtualenv still activated):

cd /opt/tracemalloc/pytracemalloc-1.2
python setup.py install
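Before instrumenting the application, it may be worth confirming that the virtualenv's interpreter can actually trace allocations. This short check uses only functions from the Python 3.4 tracemalloc API, which pytracemalloc aims to mirror. It assumes the `python` you run it with is the patched interpreter with pytracemalloc installed. On Python 3.4+, the built-in module is used instead. `tracemalloc_works` is a hypothetical helper name.

```python
import tracemalloc


def tracemalloc_works(nframe=25):
    """Return True if tracemalloc can start and record a traced allocation.

    Note: this stops tracing afterwards, even if it was already running."""
    tracemalloc.start(nframe)
    try:
        blob = [0] * 100000  # allocate something large enough to be traced
        current, _peak = tracemalloc.get_traced_memory()
        return tracemalloc.is_tracing() and current >= len(blob)
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    print("tracemalloc OK" if tracemalloc_works() else "tracemalloc NOT working")
```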

Now in your code:

import tracemalloc
tracemalloc.start(25)  # 25 is the number of frames in traceback

# allocate some memory
x = range(10000)

snapshot = tracemalloc.take_snapshot()

# for line numbers
top_stats = snapshot.statistics('lineno')

print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)

# for traceback
top_stats = snapshot.statistics('traceback')

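# pick the entry with the largest total size (it may group many blocks)
stat = top_stats[0]
# divide by 1024.0: in Python 2.7, "/" between two ints truncates
print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024.0))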
for line in stat.traceback.format():
    print(line)
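Because the OOM killer ends the process without warning, statistics printed at shutdown may never appear. One option is to write snapshots to disk while the application runs, then analyse them afterwards with Snapshot.dump() and Snapshot.load(). This is a minimal sketch. The `dump_snapshot` and `top_lines` helpers and the file naming are hypothetical. In a real application, you would call `dump_snapshot` periodically, for example from a timer.

```python
import os
import tempfile
import tracemalloc


def dump_snapshot(directory):
    """Write the current snapshot to a new file in `directory`; return its path."""
    fd, path = tempfile.mkstemp(prefix="tracemalloc-", suffix=".snapshot",
                                dir=directory)
    os.close(fd)  # Snapshot.dump() reopens the file by name
    tracemalloc.take_snapshot().dump(path)
    return path


def top_lines(path, limit=10):
    """Load a dumped snapshot and return its biggest allocation sites."""
    snapshot = tracemalloc.Snapshot.load(path)
    return snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    tracemalloc.start(25)
    data = [str(i) * 10 for i in range(10000)]  # stand-in workload
    saved = dump_snapshot(tempfile.mkdtemp())
    for stat in top_lines(saved):
        print(stat)
```

Analysis can then happen in a separate interpreter. That interpreter does not need to be tracing to load a dumped snapshot.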

Summary

tracemalloc is a very useful tool for pinpointing the source of memory issues in a Python application. On Python 2.7, you can use it through pytracemalloc and a patched build of Python. Patching and compiling Python sounds tedious, but it can save you time by removing the guesswork that conventional Python 2.7 memory profiling tools require.