PyPy testing

22 Mar 2011 #cython #double metaphone #fuzzy #pypy #python #string

After reading Bob Ippolito’s excellent Playing with PyPy I was inspired to try PyPy out myself. I heard a ton of buzz coming out of PyCon that PyPy is wicked fast and wicked awesome. I wanted to take a look, and Bob’s instructions were a perfectly made intro.

A lot of the work I do is with strings (as you can see in my PiCloud testing from last year). I built a little test of PyPy vs. Python 2.6 vs. Python 2.6 + Pyrex + C extension to see how things were going. After following the instructions I have PyPy 1.4.1 and OS X 10.6.6’s built-in Python 2.6. My test case is pretty simple: compute the DoubleMetaphone representations of 94,293 names from the Census. First gather the data:

curl -O http://www.census.gov/genealogy/names/dist.all.last;
curl -O http://www.census.gov/genealogy/names/dist.female.first;
curl -O http://www.census.gov/genealogy/names/dist.male.first;

So, now we set up our test code. All it does is loop through the 3 files of names we just downloaded, grab the name from each line, compute the double metaphone values, and append them to a list.

I’m using two implementations of the DoubleMetaphone algorithm. The first is Fuzzy, a library Jamie developed at Polimetrix that uses Pyrex to wrap the C implementation by Maurice Aubrey. The other is Andrew Collin’s pure-Python version. For simplicity we’re going to call that atomodo.py, after his domain.

pip install Fuzzy
curl http://www.atomodo.com/code/double-metaphone/metaphone.py/at_download/file > atomodo.py

My test.py:

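import sys

# Pick the DoubleMetaphone implementation named on the command line.
impl = sys.argv[1] if len(sys.argv) > 1 else None
if impl == 'atomodo':
    import atomodo
    dmeta = atomodo.dm
elif impl == 'fuzzy':
    import fuzzy
    dmeta = fuzzy.DMetaphone()
else:
    sys.exit('usage: test.py atomodo|fuzzy')

files = ['dist.all.last', 'dist.male.first', 'dist.female.first']
output = []
for filename in files:
    fh = open(filename)
    for row in fh:
        # The name is in the first 15 characters of each row.
        name = row[:15].strip()
        x = dmeta(name)
        output.append(x)
    fh.close()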

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo

real	0m3.098s
user	0m3.034s
sys	0m0.055s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo # CPython

real	0m2.425s
user	0m2.390s
sys	0m0.032s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy

real	0m0.390s
user	0m0.357s
sys	0m0.032s

The results pretty well speak for themselves. C + Pyrex destroys the other two. Plain-Jane CPython is somewhat faster than PyPy (2.4s vs. 3.1s real). As an aside, I ran all of this with PYPY_GC_NURSERY=716K to help PyPy out; on my system that seemed like a sane default after running Bob’s script. Without PYPY_GC_NURSERY set, the results were a bit slower across the board; PyPy, for example, took 3.180s.

I decided to play around a little further at this point, to see if PyPy’s JIT would do better with more iterations. I tried two variations, with different results for PyPy. In Variation A I loop the entire thing 10 times, inserting the loop above output = [], so the list is reset each time. In other words this is a loose loop: it reopens the files on each of the 10 passes, and so on. The results are pretty interesting!
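For concreteness, here is a rough sketch of what Variation A amounts to, adapted from the loop in test.py. The function name run_variation_a and its repeats parameter are made up for illustration; the original script simply wrapped the existing code in a loop rather than defining a function.

```python
def run_variation_a(dmeta, files, repeats=10):
    """Repeat the whole benchmark: every pass reopens and rescans each file.

    `dmeta` is any callable that takes a name and returns its encoding
    (hypothetical parameter; test.py uses a module-level `dmeta`).
    """
    output = []
    for _ in range(repeats):
        output = []  # the list is reset on every pass
        for path in files:
            fh = open(path)
            try:
                for row in fh:
                    name = row[:15].strip()
                    x = dmeta(name)
                    output.append(x)
            finally:
                fh.close()
    return output  # holds the results of the final pass only
```

Called as run_variation_a(atomodo.dm, files), this does ten full passes over the three Census files, so the file I/O and the slicing are repeated along with the metaphone calls.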

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo

real	0m19.907s
user	0m19.734s
sys	0m0.145s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo

real	0m24.615s
user	0m24.450s
sys	0m0.160s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy

real	0m3.753s
user	0m3.608s
sys	0m0.143s

[Chart: Total User Seconds (smaller is better), Variation A]

Variation B repeats just the double metaphone calculation 10 times, by wrapping x = dmeta(name). This does less work overall, because it doesn’t reopen the files, iterate over them again, or redo the substring + strip. PyPy does even better, comparatively.
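A rough sketch of Variation B, again adapted from test.py, looks like the following. The name run_variation_b and the repeats parameter are illustrative, and this assumes only the x = dmeta(name) line sits inside the inner loop, so each name is still appended once.

```python
def run_variation_b(dmeta, files, repeats=10):
    """Read each file once, but compute every name's encoding `repeats` times.

    `dmeta` is any callable that takes a name and returns its encoding
    (hypothetical parameter; test.py uses a module-level `dmeta`).
    """
    output = []
    for path in files:
        fh = open(path)
        try:
            for row in fh:
                name = row[:15].strip()
                x = None
                for _ in range(repeats):
                    x = dmeta(name)  # only this call is repeated
                output.append(x)     # assumed: appended once per name
        finally:
            fh.close()
    return output
```

Compared with looping over the whole script, the hot path here is a tight inner loop that calls the same function on the same string ten times in a row.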

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo

real	0m16.610s
user	0m16.511s
sys	0m0.083s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo

real	0m23.929s
user	0m23.855s
sys	0m0.067s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy

real	0m2.526s
user	0m2.484s
sys	0m0.041s

[Chart: Total User Seconds (smaller is better), Variation B]

So where does that leave us? Well, if things scale perfectly, the original times * 10 should be about the same as Variation A, and Variation B should be a tiny bit smaller (because it’s doing less work). However, reality is always more confusing than we’d hope.
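To put numbers on that expectation, here is a small sketch that divides each variation’s user time by ten times the single-pass user time. The figures are copied from the time output in this post; the dictionary layout and the scaling_ratios helper are just for illustration.

```python
# User-time seconds copied from the `time` runs in this post.
user_seconds = {
    "PyPy + atomodo":    {"original": 3.034, "A": 19.734, "B": 16.511},
    "CPython + atomodo": {"original": 2.390, "A": 24.450, "B": 23.855},
    "CPython + fuzzy":   {"original": 0.357, "A": 3.608, "B": 2.484},
}


def scaling_ratios(times, repeats=10):
    """Return measured time / (original * repeats) for Variations A and B.

    A ratio of 1.0 would mean the variation scaled perfectly linearly.
    """
    expected = times["original"] * repeats
    return {"A": times["A"] / expected, "B": times["B"] / expected}


for label, times in sorted(user_seconds.items()):
    ratios = scaling_ratios(times)
    print("%-18s A: %.2f  B: %.2f" % (label, ratios["A"], ratios["B"]))
```

By this arithmetic PyPy lands at roughly 0.65 for A and 0.54 for B, while CPython + atomodo stays within a few percent of 1.0 in both variations.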

[Chart: Comparison, User Seconds (smaller is better)]

CPython running atomodo is quite consistent. CPython + fuzzy is pretty darn fast and consistent too, seemingly getting more of an advantage from B than CPython + atomodo. PyPy is crazy, though. I would expect A and B to be faster per pass than the original because the JIT can work its magic more. However, I was surprised by how much, and further surprised by how much faster B was than A. I guess the cache is very short-lived or something?

Admittedly, this test is flawed in 200 different ways. However, it’s interesting to see where PyPy might be faster (very, very, very repetitive code; one pass calls dmeta(name) 94,293 times). I also know I’ll keep looking for C extensions.