Friday, March 13, 2009

I can eat glass but it hurts me.

Turns out there's more hype than reality to claims that Unicode is fully supported in current operating systems. We looked at OS X and Windows 2000, and the support was frustratingly close but still had issues. Some of it looks like it comes down to default settings, but a few of our experiments convinced me it's not that simple. We were looking into it because we got nervous when our DBAs mentioned going 100% Unicode (it might have just been a trial balloon on their part).

I can't blame any software company for not going fully Unicode. After working with it for a few days and running some experiments, I'm convinced by a Perl article by Guido Flohr that said (my emphasis), "UTF-8 is fast to write but hard to read for applications. It is therefore not the worst for internal string representation but not far from that."
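Flohr's point is easy to see with a small sketch. This is my own illustration in Python 3, not code from his article, and the helper names (`utf8_char_offsets`, `nth_char`) are hypothetical. Encoding is one cheap pass, but because a UTF-8 character takes anywhere from 1 to 4 bytes, finding the n-th character means scanning every byte before it.

```python
def utf8_char_offsets(data: bytes) -> list[int]:
    """Return the byte offset where each code point starts.

    UTF-8 continuation bytes always look like 10xxxxxx, so any byte
    that does not match that pattern begins a new code point.
    """
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def nth_char(data: bytes, n: int) -> str:
    """Decode the n-th code point (0-based) by scanning from the start.

    You can't jump straight to character n: its byte offset depends on
    the width of every character before it, so this is O(len(data)).
    """
    offsets = utf8_char_offsets(data)
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else len(data)
    return data[start:end].decode("utf-8")


# "naïve café 日本" written with escapes so the source stays unambiguous
text = "na\u00efve caf\u00e9 \u65e5\u672c"
encoded = text.encode("utf-8")  # writing: one cheap pass

print(len(text))              # 13 characters (code points)
print(len(encoded))           # 19 bytes
print(nth_char(encoded, 12))  # 本
```

Thirteen characters turn into nineteen bytes, and nothing in the byte string tells you where character 12 starts without walking it from the front. That's the "hard to read" part.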

Oh, and if you're not familiar with eating glass (and hence the title of this post), "I can eat glass" is a phrase commonly used to test Unicode compatibility, sort of like "lorem ipsum dolor sit amet" as placeholder text. If you ever try to support Unicode, you'll understand why it's like trying to eat glass without having it hurt you.
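A test string only helps if you check that it survives the trip through your code. Here's a minimal round-trip sketch, assuming Python 3. The sample strings and the helpers `round_trips` and `mojibake` are my own hypothetical picks, not the actual "I can eat glass" translations.

```python
# Hypothetical samples; anything mixing scripts and widths will do.
SAMPLES = [
    "I can eat glass but it hurts me.",  # plain ASCII
    "caf\u00e9",                          # é, fits in Latin-1
    "\u65e5\u672c\u8a9e",                 # 日本語, 3 bytes each in UTF-8
    "\U0001F600",                         # emoji outside the BMP, 4 bytes in UTF-8
]


def round_trips(s: str, encoding: str) -> bool:
    """True if s can be encoded and decoded back unchanged."""
    try:
        return s.encode(encoding).decode(encoding) == s
    except UnicodeError:  # covers encode and decode failures
        return False


def mojibake(s: str) -> str:
    """Simulate a classic bug: UTF-8 bytes read back as Latin-1."""
    return s.encode("utf-8").decode("latin-1")


for s in SAMPLES:
    print(repr(s), round_trips(s, "utf-8"), round_trips(s, "latin-1"))

print(mojibake("caf\u00e9"))  # cafÃ©
```

Every sample survives UTF-8, only the first two survive Latin-1, and reading UTF-8 with the wrong decoder silently gives you garbage instead of an error.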
