[TriLUG] Email: Character Encoding

Shane O. shaneodonnell at gmail.com
Tue Jul 31 18:07:23 EDT 2012


Cristóbal -

Unrelated to encoding, but related to mangled names due to poor technology
decisions...

I *used to* subscribe to a trade magazine (which shall remain nameless)
that gave me the option of receiving my email in a text-only format.  Good
on them for that.

However, on about a quarterly cycle, you could watch my name grow in the
salutation.  The first email would open with:

   Dear Shane O\'Donnell:

then the next would read:

   Dear Shane O\\'Donnell:

until a few weeks later...

   Dear Shane O\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\'Donnell:

That, my friends, is some serious escape artistry.
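
One plausible way to get that kind of growth (a guess at the mechanism, not
something I know about the magazine's system) is a stored name being
re-escaped on every save and never unescaped. A minimal Python sketch, where
add_slashes and resave are hypothetical helpers standing in for whatever
escaping routine was really involved:

```python
def add_slashes(text):
    """Backslash-escape backslashes and single quotes (hypothetical helper)."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def resave(name, cycles):
    """Re-escape an already-escaped name `cycles` times, never unescaping."""
    for _ in range(cycles):
        name = add_slashes(name)
    return name


# Print the cycle number, the backslash count, and the escaped name.
for cycles in range(4):
    escaped = resave("O'Donnell", cycles)
    print(cycles, escaped.count("\\"), escaped)
```

In this sketch the backslash count runs 0, 1, 3, 7, so it will not match the
counts in the emails exactly, but the runaway growth is the same idea.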

As you were -

Shane O.

On Mon, Jul 30, 2012 at 6:54 PM, Cristóbal Palmer <cristobalpalmer at gmail.com> wrote:

> Alan (and other list members),
>
> Jack has already covered the most sane (if oversimplified) answer for
> most cases (i.e. use utf8 everywhere), but I thought I'd chime in with
> a little game I like to call Figure Out How My Name Got Abused. If
> stuff looks profoundly weird because you're reading this in the digest
> and our digest system is idiotic and execrable, I suggest you look in
> the web archive (link in the footer), since that seems to do the nice
> thing. But back to the game. The easiest/quickest way for me to play
> the game is with a python shell. We'll start by creating a unicode
> object that's just the accented o (aka. latin small letter o with
> acute, aka. U+00F3, aka. unicode codepoint 00F3). We'll then try to
> reproduce some of the ways I've actually seen my name show up in
> correspondence (usually from automated systems, but sometimes from
> human interlocutors using software that didn't do the
> nice/ideal/correct thing).
>
> $ python
> Python 2.7.2 (default, Feb 6 2012, 21:42:35)
> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> S = u'ó'
> >>> type(S)
> <type 'unicode'>
> >>> print S # bad idea generally, but here gives the correct small o with acute
> ó
> >>> print S.encode('iso-8859-1') # question mark
> ?
> >>> print S.encode('utf8').decode('iso-8859-1') # capital A with a tilde to the third power
> Ã³
> >>> print S.encode('utf8').decode('windows-1252') # capital A with a tilde to the third power
> Ã³
> >>> print S.encode('utf8').decode('cp500') # capital C followed by an interpunct, only because EBCDIC is LOL
> C·
> >>> print S.encode('utf8').decode('mac_cyrillic') # square root of greater than or equal to
> √≥
> >>> print S.encode('utf8').decode('koi8_r') # Tse followed by Yo
> цЁ
> >>> print S.encode('utf8').decode('shift_jis') # halfwidth katakana letter te followed by halfwidth katakana letter u
> ﾃｳ
> >>> S.encode('utf8').decode('quoted-printable')
> '\xc3\xb3'
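
For anyone playing along on Python 3, where str is already Unicode and print
is a function, a rough equivalent of the session above might look like the
sketch below. The names utf8_bytes, codecs_to_try, mangled, and repaired are
just illustrative, and the last step assumes the damage was a single wrong
decode that can be reversed:

```python
S = "\u00f3"  # LATIN SMALL LETTER O WITH ACUTE
utf8_bytes = S.encode("utf-8")  # b'\xc3\xb3'

# The same codecs used in the Python 2 session above.
codecs_to_try = [
    "iso-8859-1",
    "windows-1252",
    "cp500",
    "mac_cyrillic",
    "koi8_r",
    "shift_jis",
]

# Decode the UTF-8 bytes with the wrong codec to reproduce each mangling.
mangled = {codec: utf8_bytes.decode(codec) for codec in codecs_to_try}

for codec, text in mangled.items():
    # ascii() shows code points as escapes, so this is safe on any terminal.
    print(codec, ascii(text))

# Undo one wrong decode: re-encode with the wrong codec, then decode as UTF-8.
repaired = mangled["windows-1252"].encode("windows-1252").decode("utf-8")
print(ascii(repaired))
```

Swap ascii(text) for text if your terminal is UTF-8 and you want to see the
actual glyphs.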
>
> My personal favorite is the mac_cyrillic one. Other ways I've seen my
> name barfed up include:
>
> Crist??bal
> Cristbal
> Crist__bal
> Crist�_bal
> Crist��bal
> CristÌ_bal
> CristÃ_bal
>
> See if you can produce some of these with your own version of my game.
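
A few of the entries in that list can be reproduced with the codec error
handlers alone. This is a Python 3 sketch of one possible route to each, not
a claim about what any particular system actually did; the variable names
are made up for the example:

```python
name = "Crist\u00f3bal"
utf8_bytes = name.encode("utf-8")  # the accented o becomes two bytes

# Decoding UTF-8 bytes as ASCII with errors="replace" turns each bad byte
# into U+FFFD, the replacement character.
two_replacement_chars = utf8_bytes.decode("ascii", errors="replace")

# Pushing that back out as ASCII turns each U+FFFD into a question mark.
two_question_marks = two_replacement_chars.encode("ascii", errors="replace")

# errors="ignore" silently drops the bytes it cannot decode.
dropped = utf8_bytes.decode("ascii", errors="ignore")

print(ascii(two_replacement_chars))
print(two_question_marks)
print(dropped)
```

The underscore variants are left as an exercise, since they presumably
involve some extra sanitizing step that I can only guess at.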
>
> Let it be known that if I interact with you or your business, and it's
> clear to me that your webapp/business/whatever has been modified since
> 2010, and my name comes back from your webapp/business/whatever as,
> for example, capital A with a tilde to the third power, I will
> consider you and/or your business idiotic and execrable. There's just
> no excuse for this anymore. I could have made excuses in 2009. They
> would have been bad excuses, but I could have made them. I just can't
> make them today. And in case you think I'm being mean, Joel Spolsky
> was angry back in 2003.
>
> Further Reading:
>
> * http://www.joelonsoftware.com/articles/Unicode.html
> * http://farmdev.com/talks/unicode/
> * http://docs.python.org/library/codecs.html
> * http://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings
> * http://www.columbia.edu/~fdc/utf8/ (with special thanks to Kevin Otte)
>
> Thanks,
> --
> Cristóbal Palmer
> cmpalmer.org
> --
> This message was sent to: Shane O. <shaneodonnell at gmail.com>
> To unsubscribe, send a blank message to trilug-leave at trilug.org from that
> address.
> TriLUG mailing list : http://www.trilug.org/mailman/listinfo/trilug
> Unsubscribe or edit options on the web  :
> http://www.trilug.org/mailman/options/trilug/shaneodonnell%40gmail.com
> TriLUG FAQ          :
> http://www.trilug.org/wiki/Frequently_Asked_Questions
>


