Well, since I’ve been cited (not in the police sense…), I should provide links, but more importantly I have to strongly back Greg’s lesson here. The lesson is simple: there are characters outside the ASCII range that you need to support right now, the so-called “Funny Characters”, and they include currency symbols (£, €, etc.).

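I have lost count of the times I have seen that annoying stray character, usually sitting in front of a character from the upper half of the 8-bit range (the so-called extended ASCII of code pages such as Latin-1, since ASCII proper stops at 127).
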
Better still is the little square that says the character isn’t known or can’t be rendered, often because of an unsupported accent. Both suggest the lesson isn’t being learned.
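A common source of that kind of stray character is text that was encoded as UTF-8 and then decoded with a single-byte code page. The Python sketch below reproduces it. The helper name `mis_decode` is made up for illustration, and Latin-1 and Windows-1252 (`cp1252`) are just two likely culprits, not the only ones. A pound sign comes out as `Â£` because its two UTF-8 bytes are read as two separate characters.

```python
def mis_decode(text: str, wrong_codec: str) -> str:
    """Hypothetical helper: encode text correctly as UTF-8, then decode it
    with the wrong codec, the way a misconfigured reader would."""
    return text.encode("utf-8").decode(wrong_codec)


# '£' (U+00A3) is the two bytes C2 A3 in UTF-8; Latin-1 reads them as 'Â' + '£'.
pound = mis_decode("£5", "latin-1")    # 'Â£5'

# '€' (U+20AC) is three bytes, E2 82 AC; Windows-1252 turns them into 'â‚¬'.
euro = mis_decode("€5", "cp1252")      # 'â‚¬5'

# The opposite mistake: the lone Latin-1 byte A3 is not valid UTF-8,
# so a lenient UTF-8 decoder substitutes U+FFFD REPLACEMENT CHARACTER.
replaced = "£5".encode("latin-1").decode("utf-8", errors="replace")  # '\ufffd5'

# Decoding with the codec that was actually used round-trips cleanly.
correct = "£5".encode("utf-8").decode("utf-8")  # '£5'
```

U+FFFD is usually drawn as a question mark in a diamond; an empty box, by contrast, typically means the text was decoded fine but the font has no glyph for that character.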

People, it’s simple: it isn’t 1980 anymore, and not everyone speaks, or even writes, in English or another language that uses the Latin alphabet. In fact, ASCII can’t even cope with most other Latin-based or Germanic languages: it has no é for French and no ß or ü for German. Let it go; it’s over. A few things are still 7-bit safe, and few of those convey meaningful information that a human can read without decoding it first.
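To make the ASCII point concrete, the sketch below checks a few words against the 7-bit range, then shows that a 7-bit-safe transport encoding such as Base64 only gets there by making the text unreadable until it is decoded. The sample words and the helper name `outside_ascii` are illustrative choices, not from the original post; `bytes.isascii()` needs Python 3.7 or later.

```python
import base64
from typing import List, Optional


def outside_ascii(text: str) -> List[str]:
    """Hypothetical helper: list the characters of text above U+007F,
    i.e. the ones plain 7-bit ASCII cannot represent."""
    return [ch for ch in text if ord(ch) > 0x7F]


samples = ["resume", "résumé", "Straße", "Öl", "£10"]
report = {word: outside_ascii(word) for word in samples}
# report["resume"] == [], report["résumé"] == ["é", "é"],
# report["Straße"] == ["ß"], report["Öl"] == ["Ö"], report["£10"] == ["£"]

# Forcing one of them into ASCII fails outright.
ascii_error: Optional[UnicodeEncodeError] = None
try:
    "Straße".encode("ascii")
except UnicodeEncodeError as err:
    ascii_error = err  # err.start == 4, the index of 'ß'

# Base64 output is pure ASCII, so it survives a 7-bit channel, but nobody
# can read it until it is Base64-decoded and then decoded from UTF-8.
wire = base64.b64encode("Straße".encode("utf-8"))
wire_is_ascii = wire.isascii()  # True
readable_again = base64.b64decode(wire).decode("utf-8")  # 'Straße'
```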

Most modern programming languages support Unicode natively, so your decisions come down to which UTF encoding (UTF-8, UTF-16 or UTF-32) you follow and how you determine the lexicographic ordering (collation) for your local languages.
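Both decisions can be seen in a few lines of Python. The first half shows how many bytes the same characters take in UTF-8, UTF-16 and UTF-32, using the little-endian codecs so that no byte-order mark is added. The second half contrasts a naive sort, which orders by code point, with a deliberately crude accent- and case-insensitive key. That key (`rough_key` is a made-up name) is only an approximation for illustration: real collation follows the Unicode Collation Algorithm plus locale rules, typically via ICU, and locales disagree (Swedish, for example, sorts Ä after Z, whereas German dictionaries file it with A).

```python
import unicodedata
from typing import Dict, Tuple


def encoded_sizes(ch: str) -> Dict[str, int]:
    """Bytes needed for one character in each UTF encoding. The "-le" codecs
    are used so Python doesn't prepend a byte-order mark (plain "utf-16" would)."""
    return {codec: len(ch.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")}


# 'A': 1, 2, 4 bytes   '£': 2, 2, 4   '€': 3, 2, 4
# U+1F600 (grinning face emoji): 4, 4 (a surrogate pair in UTF-16), 4
sizes = {ch: encoded_sizes(ch) for ch in "A£€\U0001F600"}

words = ["zebra", "Äpfel", "apple", "éclair", "Eagle"]

# Naive sort compares code points: uppercase ASCII < lowercase ASCII < accented.
naive = sorted(words)
# ['Eagle', 'apple', 'zebra', 'Äpfel', 'éclair']


def rough_key(word: str) -> Tuple[str, str]:
    """Crude, hypothetical sort key: strip accents via NFD decomposition,
    ignore case, and fall back to the original word to break ties.
    NOT a substitute for locale-aware collation."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


friendlier = sorted(words, key=rough_key)
# ['Äpfel', 'apple', 'Eagle', 'éclair', 'zebra']
```

For real sorting, a locale-aware library (PyICU, or `locale.strxfrm` with the right locale installed) is a safer bet than a hand-rolled key like this one.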

But please, for the love of all sanity, if there is one point Greg and I are trying to make, it is this. Make this one decision right now, before writing any code: