All Your Character Encoding Belong to Us, Western Europe

Listen, Western Europe…North America has a great deal to thank you for. You are, of course, our big brother…and we’re really glad that you led the way. However, we’re a little ahead of you on certain things, and we need to get something straight: older doesn’t necessarily mean better. And when it comes to character sets for languages, you’re really starting to show your age. Much like the people on the show Hoarders, you have a hard time letting go of your “treasures”…but, really, it’s for the best. Now, I’m not suggesting that certain garbage characters be dropped entirely; I’m sure that your ministers of culture are already seething at the mere mention of it. Of course, you can use them without reservation in your daily lives for useless fun. (Like calligraphy!) For the sake of computing, however, it’d be better for all software developers (and everyone in general) if you just converted all electronic text to ASCII…and I’m willing to bet some of my Western European counterparts would agree with me.

Imagine a world where we wouldn’t have to ask, “Is this character in Windows-1252 but not in ISO-8859-1?” That place could be real! Just think about that for a moment…it would be beautiful, wouldn’t it? Together, we can make that happen. So, let’s take that initial step of purging your electronic banks of filthy, superfluous text. Much like on Hoarders, we need to step through each “treasure” and mull over its inherent worthlessness…I mean, “value”…to your respective languages:
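For anyone who hasn’t had the pleasure, that question can at least be answered mechanically. Here is a minimal sketch, assuming Python 3 and its built-in `cp1252` and `latin-1` codecs (the function name `in_cp1252_not_latin1` is made up for illustration): try encoding the character in each and see which one complains. Characters like € and Œ sit in Windows-1252’s 0x80 to 0x9F range, where ISO-8859-1 has no printable characters at all.

```python
def in_cp1252_not_latin1(ch: str) -> bool:
    """Hypothetical helper: True if ch encodes in Windows-1252 but not ISO-8859-1."""
    try:
        ch.encode("cp1252")      # does Windows-1252 have it?
    except UnicodeEncodeError:
        return False             # not in Windows-1252 at all
    try:
        ch.encode("latin-1")     # does ISO-8859-1 have it too?
    except UnicodeEncodeError:
        return True              # only Windows-1252 has it
    return False                 # both encodings have it

# Check a few suspects one character at a time.
print([ch for ch in "€Œé" if in_cp1252_not_latin1(ch)])
```

Leaning on the codecs themselves, rather than hard-coding a table of byte values, keeps the sketch short and lets Python’s own mapping tables do the bookkeeping.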

Ligatures – Come on…we both know that they’re stupid. Æ and Œ were created by drunk Trappist monks who were sloppily copying books with cramped hands. Afterwards, when someone pointed out their mistakes, they said, “Uh, no, I meant to do that. Those are real letters.” And thus ligatures were born. More and more of your population has stopped using them anyway, splitting them into “AE” and “OE”. You know that they’re dying. Drop them.

Umlauts – We’re not even going to address diacritics for now, or how everyone should just use normal tittles. I said tittles! Stop laughing! In any case, we’ll address that diacritic bullshit some other day…First, I have most encodings on my side: they don’t distinguish between umlauts and diaereses. (Even Unicode is like, “I don’t give a shit.”) Second, each country has its own version of them! You formed the EU for a reason…start standardizing that shit! Third, I understand the purpose of umlauts: to help a nascent reader pronounce vowels that sit next to each other…but in North America, we eventually get it after some direction and practice. We don’t need diagrams and pie charts in order to learn how to pronounce “cooperate” properly. Trust me…you can, too.
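The “Unicode doesn’t care” jab holds up, at least at the code-point level. The precomposed ü is officially named LATIN SMALL LETTER U WITH DIAERESIS. Under canonical decomposition (NFD), a German umlaut (Müller) and a French diaeresis (Noël) both come apart into a base letter plus the same U+0308 COMBINING DIAERESIS. Here is a quick check using Python’s standard `unicodedata` module (the helper name `combining_mark_names` is hypothetical):

```python
import unicodedata

def combining_mark_names(ch: str) -> list[str]:
    """Hypothetical helper: names of the combining marks in ch after NFD."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [unicodedata.name(m) for m in decomposed if unicodedata.combining(m)]

# The umlaut in "Müller" and the diaeresis in "Noël" use one and the same mark.
for ch in ("ü", "ë", "ï"):
    print(ch, unicodedata.name(ch), combining_mark_names(ch))
```

Any linguistic difference between the two marks lives in the language, not in the character data.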

Cedilla – Even though I’m against diaereses, I understand that they’re useful, especially to novice readers. But a cedilla? You can’t just replace “ç” with an “sc”, “ts”, or even a “z”? Because it’s so special, of course. Did an ancient Spanish or French monarch let their child doodle while writing…and then decree that the doodle was official henceforth? (Personally, since I love the movie Aliens, I’ve always liked to doodle little snapping jaws shooting out of everything. If it hadn’t been invented in the Middle Ages, I probably would have invented the cedilla myself…but it would have looked different.) It doesn’t matter. We all know how useless monarchs are…that, and how they generally look better without heads. (That’s a tip of the hat to you, Frenchies.) In any case, like monarchs, the cedilla is a complete waste. Into the trashcan.
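If you really wanted to banish the cedilla, the usual developer shortcut isn’t a careful “sc”/“ts”/“z” substitution. It’s a blunt one: decompose with NFD and throw away every combining mark, so “ç” becomes plain “c”. Here is a sketch with Python’s `unicodedata` (`strip_marks` is a hypothetical helper, not a standard function). Note that it does nothing for Æ, Œ, or the South Slavic Đ, because those letters have no canonical decomposition to strip.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Hypothetical naive ASCII-folding: NFD, then drop combining marks.

    'ç' -> 'c' + U+0327 COMBINING CEDILLA -> 'c'.
    Letters without a canonical decomposition (Æ, Œ, Đ) pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks("Français"))   # cedilla removed
print(strip_marks("Œuvre Ære"))  # ligatures survive untouched
```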

You see how easy that was, my brethren in Western Europe? We got rid of three different types of characters…in one shot! That’s at least dozens (if not hundreds) of characters that we can free from our collective banks of data! By using just ASCII, we can use half of the data needed, which means almost half of the processing power. Think about how many baby seals that could save…and you don’t want to kill baby seals, do you? I’m sure that John Lennon would have written an extra verse about this very subject in “Imagine” if he had just lived long enough. It might take some time, but eventually we can live free of Encoding Hell. Leave Unicode to the rest of the world and its craziness…and join hands with us, so we can finally blow up that bridge together. ASCII and freedom for all! You just have to believe, Western Europe. Just believe.
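For the record, the “half of the data” math depends heavily on the encoding. In UTF-8, an ASCII letter takes one byte and an accented Latin letter like ç takes two, so stripping the accent halves the cost of that one character. In a single-byte encoding like Windows-1252, every character already costs one byte. Here is a small Python sketch for comparing the two (`byte_counts` is an illustrative name):

```python
def byte_counts(ch: str) -> tuple[int, int]:
    """Illustrative helper: (UTF-8 bytes, Windows-1252 bytes) needed to store ch."""
    return len(ch.encode("utf-8")), len(ch.encode("cp1252"))

# Plain ASCII, a cedilla, a ligature, and the euro sign.
for ch in ("c", "ç", "Æ", "€"):
    print(ch, byte_counts(ch))
```

Plain ASCII letters cost one byte in both encodings, so any savings show up only on the accented characters themselves.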

2 thoughts on “All Your Character Encoding Belong to Us, Western Europe”

Nikola on September 6, 2014 at 12:02 pm said:

That’s very ignorant and frustrating actually. I understand the usefulness of what he’s suggesting, but he’s basically asking us to reject parts of our languages. In German the umlauts create new letters. Therefore, to a non-speaker a U and a Ü might look and sound similar, but to anybody who speaks the language they are clearly different letters and especially sounds.
In South Slavic languages you have letters such as Ć and Đ, which are clearly just accented Cs and Ds, right? No, they’re completely different letters and meanings of certain words change completely when you drop the glyphs.
Most people online will swap a Đ for a Dj because that’s the linguistic origin, but what happens with words that actually do have a D and a J next to each other? How do you tell the difference?
Now, in some cases it’s just an accent, and dropping it is fine. In most cases, native speakers know to change a certain letter in their head because otherwise the word makes no sense. However, try reading an English text with all the Vs replaced by Ws. That’s how it feels.
These letters do exist for a reason, and I think it’s very “English-centric” to drop certain letters because “they are redundant”. Why not drop the X? I’m sure we’d have no problem replacing Xs with Ks. Arguably, Y is far more replaceable in English than Š is in the South Slavic languages.
In the end, I’m sure this whole thing isn’t half as much of a problem for you guys in the US, where you can just say “we’re using the English alphabet on this site, please stick to it”, whereas that’s impossible for those of us who have to develop things for European markets. I hope you understand why that article is so poorly thought out.

- Peter Bolton on September 11, 2014 at 11:51 am said:
  
  First, I do appreciate your detailed and well-thought-out answer. I appreciate anyone who presents an informative argument.
  Second, even though the piece may seem completely serious, I can assure you that it isn’t. This whole blog is my avenue for writing sarcasm and parody, and this piece is written with a certain amount of jest.
  
  True, like many other software developers, I feel a certain ennui when dealing with Unicode and various other encodings, and I’d be lying if I said the sentiments of this article didn’t blaze in my mind at the height of frustration. In the end, though, I understand that languages are tied to cultures, and one simply cannot (and should not) expect a culture to disappear…though, in the midst of resolving character issues with data and tearing your hair out, you might wish for it. 🙂
  

I Hate the Sounds around Me

Somebody please make them stop
