Docstring of the day: a mystery

I like to think I’m pretty good about writing docstrings that will be comprehensible to my successor when I leave a job or get hit by a bus, but sometimes I get a bit silly, and write riddles like…

	"""Return resumes and Bjorks and pina coladas.
	"""

Here’s the code to go along with it:

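	import unicodedata

	def mystery(s):
	    """Return resumes and Bjorks and pina coladas.
	    """
	    try:
	        # Pure ASCII bytes need no further work.
	        return s.decode('ascii')
	    except UnicodeDecodeError:
	        # Otherwise treat the bytes as Latin-1, split each accented character
	        # into a base character plus combining marks, and keep the ASCII part.
	        s = s.decode('latin1')
	        normalized = unicodedata.normalize('NFKD', s)
	        return ''.join(char for char in normalized if ord(char) < 128)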

If you guessed that this code strips diacritics from the given string, giving the closest ASCII representation of any accented characters, then congratulations, you clever so-and-so. I was inspired by a similar routine in Google Refine. They use a sort of home-baked lookup table rather than the pithier approach of Unicode equivalence. Schmucks! It turns out that normalizing foreign characters is a helpful step in clustering swathes of messy free text (something I’m doing a lot of right now at Sciencescape).
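
For the curious, here is a minimal Python 3 sketch of the same trick. It assumes the input is already a decoded `str`, so the Latin-1 guesswork goes away. The name `strip_diacritics` is my own invention for illustration, not something from Google Refine or any library.

```python
import unicodedata

def strip_diacritics(text):
    """Return an ASCII-only approximation of a Unicode string."""
    # NFKD splits each accented character into a base character
    # followed by combining marks, e.g. 'é' becomes 'e' + U+0301.
    decomposed = unicodedata.normalize('NFKD', text)
    # Keep only ASCII code points, which discards the combining marks.
    return ''.join(ch for ch in decomposed if ord(ch) < 128)

print(strip_diacritics('résumé'))       # resume
print(strip_diacritics('Björk'))        # Bjork
print(strip_diacritics('piña colada'))  # pina colada
print(strip_diacritics('Søren'))        # Sren
```

Note the last line: characters like 'ø' have no Unicode decomposition, so they are dropped outright rather than mapped to 'o'. That is the one place where a lookup table can do better than normalization alone.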