
@why unicode sucks and why sanskritists should not touch it even with a ten meter long pole

According to ancient Indian grammarians, the word azvaH अश्वः is made up of five small pieces, a z श् v व् a H . In certain circumstances the last two pieces must be replaced with A . To do that you remove the last H , remove the last a , and add A .
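The three steps above can be sketched in a few lines of Python working directly on the romanized spelling. This is only a minimal illustration; the function name replace_final_aH_with_A is made up for this page and is not taken from any existing tool.

```python
def replace_final_aH_with_A(word: str) -> str:
    """Replace a final 'aH' with 'A' in a romanized word (hypothetical helper)."""
    if not word.endswith("aH"):
        return word          # the rule does not apply, leave the word alone
    word = word[:-1]         # remove the last H
    word = word[:-1]         # remove the last a
    return word + "A"        # add A


print(replace_final_aH_with_A("azvaH"))  # azvA
```

Each piece of the grammarians' analysis is one character of the string, so the rule needs no case distinctions.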

Modern computer experts have not yet reached an analysis that advanced.

According to the UNICODE consortium, I am supposed to encode into a computer the word azvaH अश्वः as a sequence of five UNICODE characters, which are:

0905 DEVANAGARI LETTER A

0936 DEVANAGARI LETTER SHA

094D DEVANAGARI SIGN VIRAMA

0935 DEVANAGARI LETTER VA

0903 DEVANAGARI SIGN VISARGA
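The list above can be checked with Python's standard unicodedata module. The sketch below only assumes that the word is stored as the five code points just listed; the variable names are chosen for this example.

```python
import unicodedata

# azvaH written as the five code points listed above
word = "\u0905\u0936\u094D\u0935\u0903"

# Look up the official character name of each code point
names = [unicodedata.name(ch) for ch in word]

for ch, name in zip(word, names):
    print(f"{ord(ch):04X} {name}")
```

Note that neither of the two short a sounds of the grammarians' analysis is a code point of its own: the first is the independent letter A, the second is implied by the letter VA.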

And if I want to turn azvaH अश्वः into azvA अश्वा, I have to remove the final DEVANAGARI SIGN VISARGA; then, if what comes before it is a consonant carrying the short a, like VA, I add 093E DEVANAGARI VOWEL SIGN AA, but if it is anything else, I add 0906 DEVANAGARI LETTER AA. To top it off, the five UNICODE characters of azvaH अश्वः will usually travel over the WWW encoded as UTF-8, namely as

e0 a4 85 e0 a4 b6 e0 a5 8d e0 a4 b5 e0 a4 83
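Going back to the replacement rule described above (drop the visarga, then add either a vowel sign or a letter depending on what came before), here is a rough Python sketch of it at the code point level. The function names are made up for this page, the consonant test assumes the basic range 0915 (KA) to 0939 (HA), and the fallback branch follows the description literally without trying to cover every possible case.

```python
VISARGA = "\u0903"        # DEVANAGARI SIGN VISARGA
VOWEL_SIGN_AA = "\u093E"  # DEVANAGARI VOWEL SIGN AA
LETTER_AA = "\u0906"      # DEVANAGARI LETTER AA


def is_consonant(ch: str) -> bool:
    """Assumption: the basic consonants KA..HA are all we need to recognise."""
    return "\u0915" <= ch <= "\u0939"


def visarga_to_long_a(word: str) -> str:
    """Hypothetical helper: turn a final short a + visarga into long A."""
    if not word.endswith(VISARGA):
        return word
    stem = word[:-1]                  # remove the last VISARGA
    if stem and is_consonant(stem[-1]):
        return stem + VOWEL_SIGN_AA   # consonant with inherent short a
    return stem + LETTER_AA           # anything else, as described above


result = visarga_to_long_a("\u0905\u0936\u094D\u0935\u0903")
print(result)  # अश्वा
```

The case distinction exists only because the short a after VA is never stored anywhere; it has to be inferred from the kind of character that precedes the visarga.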

The UTF-8 encoding of the roman letters "azvaH", by contrast, is

61 7a 76 61 48

This means that the UNICODE version of the /mahAbhArata is roughly three times heavier than the Roman version.
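The byte counts are easy to reproduce. The sketch below encodes both spellings of the one word as UTF-8 and compares their sizes; the variable names are chosen for this example, and the ratio it prints is for this single word, not a measurement of any whole text.

```python
roman = "azvaH"
devanagari = "\u0905\u0936\u094D\u0935\u0903"  # the same word in Devanagari

roman_bytes = roman.encode("utf-8")
deva_bytes = devanagari.encode("utf-8")

# Hex dumps of both encodings
roman_hex = " ".join(f"{b:02x}" for b in roman_bytes)
deva_hex = " ".join(f"{b:02x}" for b in deva_bytes)
print(roman_hex)  # 61 7a 76 61 48
print(deva_hex)   # e0 a4 85 e0 a4 b6 e0 a5 8d e0 a4 b5 e0 a4 83

# Size ratio for this one word
ratio = len(deva_bytes) / len(roman_bytes)
print(len(roman_bytes), len(deva_bytes), ratio)  # 5 15 3.0
```

Every Devanagari code point takes three bytes in UTF-8, while every roman letter used here takes one.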

So, UNICODE sucks.

If you aren't persuaded yet, read this:

Linguistic Issues in Encoding Sanskrit.