Python Performance Profiling: The Guts And The Glory

Short Form


Your Python program is too slow, and you need to optimize it. Where do you start? With the right tools, you can optimize your code where it counts. We’ll explore the guts of the Python profiler “Yappi” to understand its features and limitations. We’ll learn how to find the maximum performance wins with minimum effort.


My boss alerted me to an article on a popular site, which claimed to show that my open-source Python client for MongoDB is three times slower than the JavaScript client. Anxiety set in: Was this true? Could I improve it? What should I tell my boss?

A typical program spends almost all its time in a small subset of its code. Optimizing those hotspots is all that matters. This is what a profiler is for: it leads us straight to the functions where we should spend our effort. So I decided to profile the code in the article to see why it was slow.

I’ll describe three open-source profilers for Python. cProfile is a fast single-thread profiler included in the Python standard library. GreenletProfiler is my package for profiling Gevent applications. Yappi is a third-party package that can profile multiple threads. I used Yappi for this investigation, since it has the most features.
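As a point of reference for the first of these, here is a minimal sketch of profiling with cProfile from the standard library. The functions `slow_sum` and `workload` are hypothetical stand-ins for real application code, not anything from the benchmark in the article.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hotspot: a plain Python loop doing arithmetic."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Hypothetical program: calls the hotspot 20 times and adds the results."""
    return sum([slow_sum(20000) for _ in range(20)])


def profile_workload():
    """Run workload() under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = workload()
    profiler.disable()

    # Send the report to a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_workload()
print(report)
```

In the printed report, the row for `slow_sum` should account for nearly all of the cumulative time, which is the kind of signal a profiler is meant to give.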

Yappi has configuration options for how it measures time and which functions it profiles. I’ll show you how I configured and ran Yappi, and how I visualized its output in KCacheGrind. I’ll also show how I narrowed the search for hotspots and calculated upper bounds for the performance gains I could achieve.
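One common way to calculate such an upper bound is Amdahl's law, sketched below. This is an assumption about the kind of arithmetic involved, not necessarily the exact calculation from the investigation. The function name `max_speedup` and the fractions in the usage lines are hypothetical.

```python
def max_speedup(hotspot_fraction, hotspot_speedup=float("inf")):
    """Overall speedup if a hotspot becomes `hotspot_speedup` times faster.

    hotspot_fraction: share of total runtime spent in the hotspot (0.0 to 1.0),
        as read from a profiler report.
    hotspot_speedup: how much faster the hotspot becomes. The default, infinity,
        models removing the hotspot entirely, which gives the upper bound.
    """
    if not 0.0 <= hotspot_fraction <= 1.0:
        raise ValueError("hotspot_fraction must be between 0 and 1")
    if hotspot_speedup <= 0:
        raise ValueError("hotspot_speedup must be positive")

    # New runtime as a fraction of the old runtime: the untouched part
    # plus the hotspot's share divided by its speedup.
    remaining = (1.0 - hotspot_fraction) + hotspot_fraction / hotspot_speedup
    if remaining == 0.0:
        return float("inf")
    return 1.0 / remaining


# Hypothetical numbers: a function that takes 25% of the runtime.
print(max_speedup(0.25))       # best case if it cost nothing at all
print(max_speedup(0.25, 2.0))  # if it merely became twice as fast
```

The bound makes the trade-off concrete: a function that takes a quarter of the runtime can never yield more than a 4/3 overall speedup, no matter how much effort goes into it.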

Optimization is like debugging: we form a hypothesis about which changes will yield the best speedups, then perform experiments. This forms a virtuous cycle of benchmarking and improving our code. I’ll relate the shocking conclusion of my investigation of the slow code.

If you’re like me, you can’t sleep if you don’t understand how something works. We’ll explore how Yappi hooks into the Python interpreter’s guts. Yappi employs a clever trick to efficiently profile all running threads.
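To give a feel for what hooking into the interpreter means, here is a toy profiler built on `sys.setprofile`, the pure-Python hook in the standard library. This is only an illustration of the general idea, not Yappi's implementation: Yappi works at the C level, and `sys.setprofile` installs a hook for the current thread only. The names `make_hook`, `profile_call`, and `fib` are hypothetical.

```python
import sys
import time
from collections import defaultdict


def make_hook():
    """Build a profile hook that records call counts and time per function name."""
    call_counts = defaultdict(int)
    total_time = defaultdict(float)
    stack = []  # (frame, start time) for Python calls currently in progress

    def hook(frame, event, arg):
        # The interpreter invokes this on every Python call and return.
        # Events for C functions (c_call, c_return, ...) are ignored here.
        if event == "call":
            stack.append((frame, time.perf_counter()))
        elif event == "return" and stack and stack[-1][0] is frame:
            _, started = stack.pop()
            name = frame.f_code.co_name
            call_counts[name] += 1
            # Inclusive time: recursive calls are counted at every level.
            total_time[name] += time.perf_counter() - started

    return hook, call_counts, total_time


def profile_call(func, *args):
    """Run func(*args) with the hook installed for the current thread."""
    hook, call_counts, total_time = make_hook()
    sys.setprofile(hook)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook
    return result, call_counts, total_time


def fib(n):
    """Hypothetical workload: naive recursive Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result, counts, times = profile_call(fib, 10)
print(result, counts["fib"])
```

A hook written in Python like this one slows the program considerably, which is one reason real profilers install their hooks from C.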


performance, python

Speaking experience

I presented “What Is Async, How Does It Work, and When Should I Use It?” at Open Source Bridge 2013 and PyCon 2014.

“Night of the Living Thread”, a talk about a Python standard library bug, at the Austin Python Meetup, November 2013.

I spoke on Python coroutines at the NY Python Meetup, April 2013.

Talks on MongoDB topics including schema design, replication, and the aggregation framework. MongoDB Atlanta in April 2012, MongoDB Chicago in November 2012, and WindyCityDB in January 2013.

Talks on Motor, my async driver for MongoDB and Python. MongoDB Chicago in November 2012, Python and MongoDB Meetups in Seattle and Philadelphia, and at PyGotham in June 2012.

MongoDB training sessions, with between 6 and 20 students, lasting 2 or 3 full days, given 4 times during 2012 and 2013.