Apple Dividend

Posted: March 25th, 2012 | Filed under: Nerdery | No Comments »
Dear.
I am T Cook,Director of the Apple inc, california. A man by the name of
Steven Jobs amassed a deposit of Ninety seven billion six Hundred million
United State Dollar $97,600,000,00 and he died leaving behind no next of
kin, am ready to share 60/40 with you if you choose to stand as my deceased
client next of kin.if you are interested please notify to send to me this
information via this mail tcook@hotmail.com

1. YOUR NAME:
2. YOUR RESIDENT ADDRESS:
3. YOUR OCCUPATION:
4. YOUR PHONE NUMBER:
5. DATE OF BIRTH:
6. COUNTRY OF RESIDENT:

Mind you your names and address will be used by my Attorney to prepare the
needed documents that will back you up as the beneficiary of my deceased
client funds.I wait to hear from you anyway,I have
spoken my honest mind to you this day.

Best Regards,
Mr.Tim Cook

This is Your Brain on Facebook

Posted: January 18th, 2012 | Filed under: Nerdery, Philosophising, School | No Comments »

I wrote this as an Op-Ed for the last progression of University Writing. Posted here to share.

A recent study found that 87% of US undergraduates are on Facebook for an average of 93 minutes daily. At nearly 11 hours a week, that’s almost as long as many of us spend in class. If 12 hours of classroom time is supposed to not just teach us facts but also train us to become more complex thinkers, why do we pretend that 11 hours of Facebook won’t have an effect, too? Facebook asks us to constantly sift through posts, skim, evaluate, and make microscopic comments. By using Facebook we are training our minds to condense all issues into easily “like”-able one-liners, rather than complex essays.

During finals last month many of us turned to Facebook to relieve stress. Many students, including myself, found that Facebook became not a limited relief valve but a means of procrastination. By the night before an exam we thought our only recourse was to block Facebook. Then, finally, our true academic selves would shine in blissful focus and productivity.

If only it were that easy. The distraction and inability to focus that led us to block Facebook didn’t arise because we were using Facebook that night. They were the result of our brains adapting to excel at the Facebook-friendly tasks we demanded of them, at the expense of less frequent tasks, such as deep reading. After so many hours on Facebook over so many months, the Facebook way of shallow thinking was dominant. The focused, contemplative mindset became a difficult-to-achieve anomaly. The night before a paper was due was simply too late to change anything. Even though we were offline, we carried Facebook’s way of thinking with us.

Any new intellectual technology, including Facebook, encourages certain ways of thinking and discourages others. The invention of writing allowed humanity to easily store and retrieve information, a laborious process in oral cultures, and in turn led to an explosion of knowledge. However, Socrates, in Plato’s Phaedrus, warns of the cognitive downside to writing by retelling the legend of King Thamus, who, upon receiving the gift of writing from the god Theuth, immediately questions the tradeoff it requires. Readers, Socrates says, will “be thought very knowledgeable when they are for the most part quite ignorant” because there was no oral instruction, and therefore, in his opinion, no deep learning. Writing fundamentally changed how we think. Today Facebook is changing it again. It encourages us to think in terms of connections, which may be advantageous in an increasingly interdisciplinary world, but it discourages deep reading, analysis, and debate. We must acknowledge these changes, and then adapt to them.

Facebook has many advantages, such as staying in touch with friends & family. However, research and common sense suggest that large amounts of unfocused Facebook browsing damage our ability to concentrate, to understand complex ideas, and to develop our own ideas. Must we choose to either concede our thought patterns to Mark Zuckerberg, or abandon Facebook entirely? Neither is a great choice.

Instead of simply embracing or abandoning Facebook, take the opportunity presented by the new semester to assess your use over the long run. What benefits does it provide you? How can you maximize those benefits while reducing the costs? If you become a conscientious user today, by finals at the end of the semester your brain will be better trained to focus and think richer thoughts.

Being deliberate about Facebook isn’t easy, but hopefully a few of these techniques, which helped me, will help you. Schedule a concentrated block of Facebook time rather than browsing whenever the urge strikes; this shifts Facebook into a hobby rather than a shameful timewasting habit. When you’re off Facebook, be off Facebook; avoid the siren call of a quick status post, “Studying sooooo hard at Butler!” Adjust your Facebook settings to reduce notification emails; it’s much harder to resist temptation when it thrusts itself into your inbox. Don’t use Facebook as a study break; it forces you into the skim-evaluate-quip mindset rather than read-analyze-write. Plus, just like potato chips, it’s awfully hard to limit it to “just 2 minutes.” Experiment with different ways to control your use, and see what works for you.

Facebook will one day be passé, but whatever replaces it will affect our cognition, just like speech, writing, email, and Facebook itself already have. By first understanding the medium, and then deliberately engaging with it, we can attempt to capture the benefits and avoid the harmful effects.


Sort with sleep

Posted: June 16th, 2011 | Filed under: Nerdery | No Comments »

This is an interesting idea for sorting integers, inspired by an Ars thread that was itself inspired by a 4chan thread found on reddit.

Basically, you sort a list of integers by spawning a new thread or process for each element; each one sleeps for the value of its element and then prints it, so smaller values print first. Here’s the original bash example, but I’d love to see other crazy languages.

#!/bin/bash
function f() {
    sleep "$1"
    echo "$1"
}
while [ -n "$1" ]
do
    f "$1" &
    shift
done
wait
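
In the spirit of seeing it in other languages, here is a rough Python sketch of the same idea using threads. It is not from the original thread; the `sleep_sort` name and the `scale` parameter (which shrinks the sleeps so you don't wait whole seconds) are my own assumptions. Like the bash version, it only works for non-negative numbers and relies on the scheduler waking the threads in order.

```python
import threading
import time


def sleep_sort(values, scale=0.05):
    """Return values in ascending order by letting each one sleep for its own value.

    scale converts a value into seconds of sleep (hypothetical knob, not in the
    bash original, which sleeps for the raw value).
    """
    result = []
    lock = threading.Lock()

    def worker(n):
        time.sleep(n * scale)   # smaller values wake up first
        with lock:              # guard the shared list
            result.append(n)

    threads = [threading.Thread(target=worker, args=(n,)) for n in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # the equivalent of bash's `wait`
    return result


if __name__ == "__main__":
    print(sleep_sort([3, 1, 2]))
```

Values that are very close together (relative to `scale`) can come out in the wrong order, which is half the fun.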

PyPy testing

Posted: March 22nd, 2011 | Filed under: Nerdery | 4 Comments »

After reading Bob Ippolito’s excellent Playing with PyPy I was inspired to try PyPy out myself. I heard a ton of buzz coming out of PyCon that PyPy is wicked fast and wicked awesome. I wanted to take a look, and Bob’s instructions were a perfectly made intro.

A lot of the work I do is with strings (as you can see in my picloud testing from last year). I built a little test of PyPy vs Python 2.6 vs Python 2.6 + Pyrex + C-Extension to see how things were going. After following the instructions I have PyPy 1.4.1, and OSX 10.6.6’s built-in Python 2.6. My test case is pretty simple – compute the DoubleMetaphone representations of 94,293 names from the Census. First gather the data:

curl -O http://www.census.gov/genealogy/names/dist.all.last;
curl -O http://www.census.gov/genealogy/names/dist.female.first;
curl -O http://www.census.gov/genealogy/names/dist.male.first;

So, now we set up our test code. All it does is loop through the 3 files of names we just downloaded, grab the name from each line, compute the double metaphone values, and append them to a list.

I’m using two implementations of the DoubleMetaphone algorithm. The first is Fuzzy, a library Jamie developed at Polimetrix that uses Pyrex to wrap the C implementation by Maurice Aubrey. The other is Andrew Collins’s pure Python one. For simplicity we’re going to call that atomodo.py after his domain.

pip install Fuzzy
curl http://www.atomodo.com/code/double-metaphone/metaphone.py/at_download/file > atomodo.py

My test.py:

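import sys

# Pick the Double Metaphone implementation named on the command line.
if sys.argv[1] == 'atomodo':
	import atomodo
	dmeta = atomodo.dm
elif sys.argv[1] == 'fuzzy':
	import fuzzy
	dmeta = fuzzy.DMetaphone()
else:
	sys.exit("usage: test.py atomodo|fuzzy")

files = ['dist.all.last', 'dist.male.first', 'dist.female.first']
output = []
for filename in files:
	with open(filename) as fh:
		for row in fh:
			# The name is taken from the first 15 columns of each line.
			name = row[:15].strip()
			x = dmeta(name)
			output.append(x)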

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo

real	0m3.098s
user	0m3.034s
sys	0m0.055s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo # CPython

real	0m2.425s
user	0m2.390s
sys	0m0.032s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy

real	0m0.390s
user	0m0.357s
sys	0m0.032s

The results pretty well speak for themselves. C + Pyrex destroys the other two. Plain Jane CPython is somewhat faster than PyPy (2.390 vs. 3.034 user seconds). As an aside, I ran all this with PYPY_GC_NURSERY=716K to help PyPy out. On my system that seemed like a sane default after running Bob’s script. I also ran it with no PYPY_GC_NURSERY, and the results were a bit slower across the board; in this case PyPy took 3.180s without a GC_NURSERY value.

Total User Seconds (smaller is better)

I decided to play around a little further at this point, to see if PyPy’s JIT would do better with more iterations. I tried two variations with different results for PyPy. In Variation A I loop the entire thing 10 times, inserting the loop above output = [], so the list is reset each time. In other words this is a loose loop, it opens the files 10 times, etc. The results are pretty interesting!
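
For clarity, Variation A looks roughly like the sketch below. The function wrapper, the `repeats` argument, and the `filename` variable name are my additions for illustration; the actual test was just the script above with an extra loop in it.

```python
def variation_a(files, dmeta, repeats=10):
    """Repeat the whole benchmark: every pass reopens and rereads each file."""
    for _ in range(repeats):
        output = []  # reset on every pass, so only the last pass's list survives
        for filename in files:
            with open(filename) as fh:
                for row in fh:
                    name = row[:15].strip()
                    x = dmeta(name)
                    output.append(x)
    return output
```

Passing `atomodo.dm` or `fuzzy.DMetaphone()` as `dmeta` along with the three Census files reproduces the loose loop described above.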

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo

real	0m19.907s
user	0m19.734s
sys	0m0.145s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo

real	0m24.615s
user	0m24.450s
sys	0m0.160s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy

real	0m3.753s
user	0m3.608s
sys	0m0.143s

Total User Seconds (smaller is better) Variation A

Variation B repeats just the double metaphone calculation 10 times, by wrapping x = dmeta(name). This does less work overall, because it doesn’t reopen the files, doesn’t have to iterate over them or substring + strip. PyPy does even better, comparatively.
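
Variation B, sketched the same way. Again the function wrapper and `repeats` argument are only for illustration, and I'm assuming the inner loop wraps just the `x = dmeta(name)` line, so each name is still appended once.

```python
def variation_b(files, dmeta, repeats=10):
    """Read each file once, but repeat only the double metaphone call."""
    output = []
    for filename in files:
        with open(filename) as fh:
            for row in fh:
                name = row[:15].strip()
                for _ in range(repeats):
                    x = dmeta(name)  # the only work that gets repeated
                output.append(x)
    return output
```

Compared with Variation A, the file opens, line iteration, and substring + strip happen once instead of ten times.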

(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time pypy test.py atomodo

real	0m16.610s
user	0m16.511s
sys	0m0.083s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py atomodo

real	0m23.929s
user	0m23.855s
sys	0m0.067s
(pypy-1.4.1-osx64)kotai:perftesting chmullig$ time python2.6 test.py fuzzy

real	0m2.526s
user	0m2.484s
sys	0m0.041s

Total User Seconds (smaller is better) Variation B

So where does that leave us? Well if things scale perfectly the original times * 10 should be about the same as Variation A, and Variation B should be a tiny bit smaller (because it’s doing less work). However reality is always more confusing than we’d hope.

Comparison: User Seconds (smaller is better)

CPython running atomodo is quite consistent. CPython+fuzzy is pretty darn fast and consistent too, seemingly getting more of an advantage from B than CPython+atomodo. PyPy is crazy though. I would expect A and B to be faster than the original because the JIT can work its magic more. However, I was surprised by how much, and further surprised by how much faster B was than A. I guess the cache is very short lived or something?
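
To put numbers on that, here is a small sketch that divides each variation's user seconds by ten times the original run. The timings are copied from the runs above; the dictionary labels and the `scaling` helper are just names I made up for this example.

```python
# User seconds copied from the timing output in this post.
original = {"pypy atomodo": 3.034, "cpython atomodo": 2.390, "cpython fuzzy": 0.357}
times_a = {"pypy atomodo": 19.734, "cpython atomodo": 24.450, "cpython fuzzy": 3.608}
times_b = {"pypy atomodo": 16.511, "cpython atomodo": 23.855, "cpython fuzzy": 2.484}


def scaling(observed, baseline, repeats=10):
    """Ratio of observed time to baseline * repeats; 1.0 means perfect scaling."""
    return dict((name, observed[name] / (baseline[name] * repeats)) for name in baseline)


if __name__ == "__main__":
    for label, timings in (("A", times_a), ("B", times_b)):
        for name, ratio in sorted(scaling(timings, original).items()):
            print("Variation %s, %s: %.2f" % (label, name, ratio))
```

A ratio near 1.0 means the run scaled as expected; below 1.0 means each pass got cheaper. CPython+atomodo stays within a few percent of 1.0 in both variations, while PyPy comes in at roughly 0.65 for A and 0.54 for B.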

Admittedly this test is flawed in 200 different ways. However it’s interesting to see where PyPy might be faster (very, very, very repetitive code; one pass calls dmeta(name) 94,293 times). I also know I’ll keep looking for C extensions.


Voter File Documentation Project?

Posted: March 12th, 2011 | Filed under: Nerdery, politics | No Comments »

Political Data Nerds,

I’ve spent far, far too many hours of my life working with voter files. Every voter file sucks in its own unique way, and figuring out exactly how Montana sucks differently from Kansas is a constant battle. Well, I’m tired of it! I don’t want to have to re-learn these challenges the next time I work on a file, and I don’t want to dig for the raw documentation (only to realize that it’s not always accurate).

I’m thinking of starting/contributing to a resource that consolidates documentation on all voter files out there. It wouldn’t be the data, it would just be freely available documentation to help anyone who’s already working with the data work with it more easily. What do you think?

I imagine it would have a list of vendors who provide these services as well, but the focus would be on helping anyone who’s trying to do it themselves. Probably should also have some more general tech recommendations, like how to concatenate files together, standardize addresses, geocode, etc.

Questions it would most definitely answer for every voter file (at least for states; counties aren’t that important to me right now):

  • Where can I request this, and how much does it cost?
  • What format is it? CSV? Tab delimited text? Does it have a header? One file per county, or one per state?
  • How is vote history stored?
  • How do I translate from their geopolitical districts to something “standard?”
  • How do I translate from their counties to county FIPS codes?
  • What fields does it contain? Name? Address? Date of Birth? Phone? Party?
  • How do I map their vote history to some more global standard (for the common elections)?
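
One way to picture it: each state's entry could be a small structured record with a slot per question, so it's obvious what is still undocumented. This is only a sketch of the idea. The `VoterFileDoc` class and every field name in it are hypothetical, not an existing project or standard.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class VoterFileDoc:
    """Documentation for one state's voter file (hypothetical schema)."""
    state: str
    where_to_request: Optional[str] = None
    cost: Optional[str] = None
    file_format: Optional[str] = None          # e.g. "CSV" or "tab delimited"
    has_header: Optional[bool] = None
    one_file_per: Optional[str] = None         # "county" or "state"
    vote_history_storage: Optional[str] = None
    district_mapping_notes: Optional[str] = None
    county_fips_mapping_notes: Optional[str] = None
    vote_history_mapping_notes: Optional[str] = None
    fields_present: List[str] = field(default_factory=list)

    def unanswered(self):
        """Return the names of the questions nobody has documented yet."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == []:
                missing.append(f.name)
        return missing
```

For example, `VoterFileDoc(state="Example State", file_format="CSV").unanswered()` lists everything except the state and the format, which makes a handy to-do list for contributors.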

This wouldn’t be doing anything for you, but hopefully it would ease the pain of anyone having to work with raw voter files.

Pew’s Data for Democracy report is an excellent start. However it’s a static and higher level document. I’d like a living document that contains more concrete information, and can be easily updated.

Does anyone know of an existing project doing this that I could contribute to? If not, are there any platforms better than MediaWiki to use? I’d rather not spend a lot of time writing original code for this, but that might be inevitable if I don’t want to do a ton of copy/pasting in MediaWiki (boy does MediaWiki suck, too)…


MITRE Challenge Graph

Posted: February 23rd, 2011 | Filed under: Nerdery | 10 Comments »

For my own curiosity I created a python + R script to grab the MITRE leaderboard and graph it. It’s a bit of python to grab the leaderboard and write out some CSVs. Then a bit of R code (updated link: http://a.libpa.st/4KFGq) generates the graph. It’s running automatically with launchd on my laptop, and it should be regularly uploading a png to the address below. Launchd is pretty awesome, but a royal pain in the ass to get set up. It doesn’t feel very deterministic.

I still need to figure out how to jitter the names so they don’t overlap (like YouGov & Agent Smith), but other than that I thought it was a nifty little exercise.

Each line is a team, with their best MAP scores as datapoints


MITRE Name Matching Challenge

Posted: February 17th, 2011 | Filed under: Nerdery | No Comments »

My illustrious former colleague Ryan is now over at MITRE doing operations research and who knows what. He pointed me toward the MITRE Challenge.

The MITRE Challenge™ is an ongoing, open competition to encourage innovation in technologies of interest to the federal government. The current competition involves multicultural person name matching, a technology whose uses include vetting persons against a watchlist (for screening, credentialing, and other purposes) and merging or deduplication of records in databases. Person name matching can also be used to improve document searches, social network analysis, and other tasks in which the same person might be referred to by multiple versions or spellings of a name.

Basically they give you a small list of target names, and a ginormous list of candidate names, and for each target name you return up to 500 possible matches from the candidate name list. Currently the matching software we built at Polimetrix back in 2005-2007 is doing pretty well. It was designed for full voter records, but I broke out the name component by itself. The result is pretty awesome. Currently we’re ranked #1 at 72.038. Below us are a few teams, including Intaka at 68.801 and Beethoven at 58.501.
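
To make the shape of the task concrete, here is a toy sketch of "return up to 500 candidates per target." It is emphatically not the Polimetrix matcher; it just ranks candidates by plain string similarity from Python's standard `difflib`, and the `top_matches` name and `limit` parameter are mine.

```python
import difflib
import heapq


def top_matches(target, candidates, limit=500):
    """Return up to `limit` candidate names, most similar to target first.

    Uses difflib's character-level similarity as a stand-in scorer; a real
    name matcher would need to be much smarter about nicknames and spellings.
    """
    def score(candidate):
        matcher = difflib.SequenceMatcher(None, target.lower(), candidate.lower())
        return matcher.ratio()

    return heapq.nlargest(limit, candidates, key=score)


if __name__ == "__main__":
    names = ["John Smith", "Jane Doe", "Jon Smyth"]
    print(top_matches("Jon Smith", names, limit=2))
```

Scoring every candidate against every target like this is far too slow for a ginormous candidate list, which is part of what makes the challenge interesting.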


Stackoverflow overflow

Posted: February 9th, 2011 | Filed under: Nerdery | 3 Comments »

Recently I’ve gotten a bit obsessed with stackoverflow.com. It’s a programming Q&A site. You can ask questions, you can answer and comment on them. However they have a sick twist – people vote on everything. They vote on your questions, answers, comments. You earn reputation points when your content is voted up, and you lose points when it’s voted down. You also earn badges, like gaming achievements.

They’ve recently started a whole bunch of related sites under the stackexchange brand. Same model and software, but with different subjects. So far there are already more than I care to count with only very spurious differentiation, but a few highlights include gaming, cooking, English, programming (as a profession), power users, sysadmin, Linux, Ubuntu, and a lot more.

Here’s my badge of honor. Right now I have 674 rep and 10 badges on Stackoverflow, and 261/4 on gaming (plus ~100 on a bunch of the other sites, just for signing up). That’s my profile image, which should update automatically!

Stack Overflow profile for chmullig at Stack Overflow, Q&A for professional and enthusiast programmers

It’s amazing how satisfying and competitive the Q&A system ends up. I find myself less and less interested in any other medium for asking or answering questions like the kind on Stackoverflow. Everything else is slow and there’s no rep, so what’s the point?


Programming Challenges

Posted: November 11th, 2010 | Filed under: Nerdery | No Comments »

I’m a fan of puzzles, programming and learning, so I’ve always enjoyed The Python Challenges. Recently my coworkers Delia, Chris and I came up with the idea of doing some of those within the company to help ourselves and our coworkers become more familiar with Python and R (and to a lesser extent SQL and other languages).

The end result is the YG Challenge, where we’ll be posting a few problems a week in at least R & Python, then solving them. Week 1 is up, and we have some great ideas for the future. Intended for our coworkers, it’s public because why not! Feel free to take a stab at solving them, especially if you haven’t used either of those languages before.


pmxbot command: lunch!

Posted: May 14th, 2010 | Filed under: Nerdery | No Comments »

As everyone is well aware, lunch is the most important part of the work day. However it’s often hard to find inspiration when deciding on a delectable dining destination. The ideal solution is to have someone propose options, and everyone reject them until consensus is reached. However nobody enjoys that. Computers can propose options, but that’s less social.

pmxbot has an existing !lunch command that’s supposed to help. Unfortunately you have to fill out the dining list yourself, and frankly, that’s a pain. The result is lots of old, bad definitions in a few limited areas. It does let you sneak in some comedy options (What to have in Canton? PB&J? Leftovers?), but for the main purpose it kinda sucks.

The solution is to use someone else’s database. I cooked one up yesterday pretty quickly using Yahoo Local’s API and the pYsearch convenience module. The result is quite easy, really. The only wrinkle is you need that module (someone could rewrite it to use just urllib and simplejson, if they cared) and a Yahoo API key.

The code is below, also available at http://libpa.st/2K0kh.

import random

# yahooid (the Yahoo API key) and the command decorator come from the surrounding pmxbot module.
@command("lunch", doc="Find a random nearby restaurant for lunch using Yahoo Local. Defaults to 1 mile radius, but append Xmi to the end to change the radius.")
def lunch(client, event, channel, nick, rest):
        from yahoo.search.local import LocalSearch
        location = rest.strip()
        if location.endswith('mi'):
                # A trailing "Xmi" token overrides the default radius.
                location, radius = location.rsplit(' ', 1)
                radius = float(radius.replace('mi', ''))
        else:
                radius = 1
        srch = LocalSearch(app_id=yahooid, category=96926236, results=20, query="lunch", location=location, radius=radius)
        res = srch.parse_results()
        # Only pick from the first 250 results.
        total = min(res.totalResultsAvailable, 250)
        num = random.randint(1, total) - 1
        if num < 20:
                # The pick is on the page of 20 results we already have.
                choice = res.results[num]
        else:
                # Otherwise search again, starting at the pick and keeping the same radius.
                srch = LocalSearch(app_id=yahooid, category=96926236, results=20, query="lunch", location=location, radius=radius, start=num)
                res = srch.parse_results()
                choice = res.results[0]
        return '%s @ %s - %s' % (choice['Title'], choice['Address'], choice['Url'])