After reading Bob Ippolito’s excellent Playing with PyPy I was inspired to try PyPy out myself. I heard a ton of buzz coming out of PyCon that PyPy is wicked fast and wicked awesome. I wanted to take a look, and Bob’s instructions were a perfectly made intro.
A lot of the work I do is with strings (as you can see in my PiCloud testing from last year). I built a little test of PyPy vs Python 2.6 vs Python 2.6 + Pyrex + C extension to see how things were going. After following the instructions I have PyPy 1.4.1 and OS X 10.6.6’s built-in Python 2.6. My test case is pretty simple: compute the DoubleMetaphone representations of 94,293 names from the Census. First gather the data:
curl -O http://www.census.gov/genealogy/names/dist.all.last
curl -O http://www.census.gov/genealogy/names/dist.female.first
curl -O http://www.census.gov/genealogy/names/dist.male.first
So, now we set up our test code. All it does is loop through the 3 files of names we just downloaded, grab the name from each line, compute the double metaphone values, and append them to a list.
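Grabbing the name is just a fixed-width slice, the same `row[:15].strip()` the test script uses. Here is a minimal sketch, assuming each line starts with the name padded out to 15 columns and followed by numeric columns. The sample row is made up for illustration, not copied from the download, and `extract_name` is a hypothetical helper name.

```python
# Illustrative row in the assumed layout: a name padded to 15 columns,
# then some numeric columns. The numbers here are placeholders.
sample_row = "SMITH".ljust(15) + "1.006  1.006      1\n"


def extract_name(row):
    """Return the name held in the first 15 columns of a row."""
    return row[:15].strip()


print(extract_name(sample_row))
```

Slicing before stripping means the numeric columns never have to be parsed at all.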
I’m using two implementations of the DoubleMetaphone algorithm. First is Fuzzy, a library Jamie developed at Polimetrix that uses Pyrex to wrap the C implementation by Maurice Aubrey. The other version is Andrew Collin’s pure python one. For simplicity we’re going to call that atomodo.py after his domain.
pip install Fuzzy
curl http://www.atomodo.com/code/double-metaphone/metaphone.py/at_download/file > atomodo.py
My test.py:
import sys

if sys.argv[1] == 'atomodo':
    import atomodo
    dmeta = atomodo.dm
elif sys.argv[1] == 'fuzzy':
    import fuzzy
    dmeta = fuzzy.DMetaphone()

files = ['dist.all.last', 'dist.male.first', 'dist.female.first']
output = []
for file in files:
    fh = open(file)
    for row in fh:
        name = row[:15].strip()
        x = dmeta(name)
        output.append(x)
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo
real 0m3.098s
user 0m3.034s
sys 0m0.055s

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo # CPython
real 0m2.425s
user 0m2.390s
sys 0m0.032s

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy
real 0m0.390s
user 0m0.357s
sys 0m0.032s
The results pretty well speak for themselves. C + Pyrex destroys the other two. Plain Jane CPython is somewhat faster than PyPy (2.4s vs 3.1s). As an aside, I ran all of this with PYPY_GC_NURSERY=716K to help PyPy out. On my system that seemed like a sane default after running Bob’s script. I also ran it with no PYPY_GC_NURSERY set and the results were a bit slower across the board. In this case PyPy took 3.180s without a GC_NURSERY value.
I decided to play around a little further at this point, to see if PyPy’s JIT would do better with more iterations. I tried two variations with different results for PyPy. In Variation A I loop the entire thing 10 times, inserting the loop above output = [], so the list is reset each time. In other words, this is a loose loop: it reopens the files 10 times, and so on. The results are pretty interesting!
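For reference, Variation A looks roughly like this when written as a function. It is a sketch rather than the exact script I ran: the function name and the `repeats` parameter are made up for illustration, and `dmeta` stands for whichever callable test.py selected.

```python
def run_variation_a(files, dmeta, repeats=10):
    """Repeat the whole test, file handling included, `repeats` times."""
    output = []
    for _ in range(repeats):
        output = []  # reset on every pass, as described above
        for path in files:
            with open(path) as fh:  # the files are reopened on every pass
                for row in fh:
                    name = row[:15].strip()
                    output.append(dmeta(name))
    return output  # only the final pass's results survive
```

Called as `run_variation_a(files, dmeta)`, it does everything the original script does, ten times over.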
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo
real 0m19.907s
user 0m19.734s
sys 0m0.145s

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo
real 0m24.615s
user 0m24.450s
sys 0m0.160s

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy
real 0m3.753s
user 0m3.608s
sys 0m0.143s
Variation B repeats just the double metaphone calculation 10 times, by wrapping x = dmeta(name) in a loop. This does less work overall, because it doesn’t reopen the files, iterate over them again, or redo the slice and strip. PyPy does even better, comparatively.
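A comparable sketch of Variation B, with the same caveats: the function name and `repeats` parameter are hypothetical, and I am assuming that only the `dmeta` call sits inside the inner loop, so each name is still appended once.

```python
def run_variation_b(files, dmeta, repeats=10):
    """Read each file once, but compute each name's result `repeats` times."""
    output = []
    for path in files:
        with open(path) as fh:  # each file is opened only once
            for row in fh:
                name = row[:15].strip()  # slice and strip happen once per row
                x = None
                for _ in range(repeats):  # only the dmeta call is repeated
                    x = dmeta(name)
                output.append(x)  # assumption: one append per name
    return output
```

The tight inner loop hammers the same function with the same argument, which is about as repetitive as a workload gets.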
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo
real 0m16.610s
user 0m16.511s
sys 0m0.083s

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo
real 0m23.929s
user 0m23.855s
sys 0m0.067s

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy
real 0m2.526s
user 0m2.484s
sys 0m0.041s
So where does that leave us? Well if things scale perfectly the original times * 10 should be about the same as Variation A, and Variation B should be a tiny bit smaller (because it’s doing less work). However reality is always more confusing than we’d hope.
CPython running atomodo is quite consistent. CPython+fuzzy is pretty darn fast and consistent too, seemingly getting more of an advantage from B than CPython+atomodo does. PyPy is crazy though. I would expect A and B to be faster than the original because the JIT can work its magic more. However, I was surprised by how much, and further surprised by how much faster B was than A. I guess the cache is very short lived or something?
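To put numbers on that, here is a small sketch that divides each measured time by the naive "original times 10" estimate. The timings are the "real" values copied from the runs above. The labels and the `ratio_to_expected` helper are just names I picked for illustration.

```python
# "real" times in seconds, copied from the three sets of runs above.
timings = {
    "pypy + atomodo": {"original": 3.098, "A": 19.907, "B": 16.610},
    "cpython + atomodo": {"original": 2.425, "A": 24.615, "B": 23.929},
    "cpython + fuzzy": {"original": 0.390, "A": 3.753, "B": 2.526},
}


def ratio_to_expected(original, measured, repeats=10):
    """Measured time as a fraction of the naive `original * repeats` estimate."""
    return measured / (original * repeats)


for label, t in sorted(timings.items()):
    a = ratio_to_expected(t["original"], t["A"])
    b = ratio_to_expected(t["original"], t["B"])
    print("%-18s A: %.2f  B: %.2f" % (label, a, b))
```

A ratio near 1.0 means the run scaled linearly. By this measure PyPy comes in at roughly 0.64 of the naive estimate for Variation A and roughly 0.54 for Variation B, while CPython with atomodo stays within about 2% of 1.0 in both.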
Admittedly this test is flawed in 200 different ways. However it’s interesting to see where PyPy might be faster (very, very, very repetitive code; one pass calls dmeta(name) 94,293 times). I also know I’ll keep looking for C extensions.