Japanese characters on OO.o presentation--> i18n
Robert La Ferla
robertlaferla at comcast.net
Fri Mar 17 01:01:09 EST 2006
Nicholas Bodley wrote:
> Imho, read and heed! I didn't know that. I'm extremely unlikely to
> send e-mail in Japanese, but it's one of those essentials (like
> knowledge of BCC) one really has to keep in mind when sending e-mail.
>
> As I understand it, (and I might well be wrong! Corrections welcome!)
> there are at least two basically-different ways to encode Japanese
> text; iirc, one (Shift-JIS? Apologies if I'm wrong) is something like
> the old {ltrs}/{figs} shift in 5-bit teleprinters -- one can be in the
> wrong mode. The consequence is that if a "mode-change" character is
> omitted, or wrongly sent when it should not be, (or munged...), all
> subsequent text (at least up to a redefining of "mode") is scrambled
> badly. If you think seeing English text in {figs} shift is bad, when
> you have a practical set of something like 2,300 or so
> basically-Chinese characters, and are receiving nonsense, as I
> understand it, that's mojibake.
There are actually several different encodings. Shift_JIS (Microsoft
SJIS) is primarily used for web pages and other documents on Windows
systems. ISO-2022-JP is used for e-mail. There's also EUC-JP (Extended
Unix Code), which is used on Unix systems. Universal encodings like UTF-8
and UTF-16 are also used. I am a big fan of UTF-8 because it supports
multiple languages (East Asian, Arabic, Hebrew, Thai, English, etc.)
and efficiently handles ASCII (as single bytes).
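To make the differences concrete, here is a minimal sketch in Python that encodes one short string under each of the encodings named above and compares the byte counts. The sample string and the helper names (encoded_sizes, drop_mode_switch) are my own assumptions for illustration. The last step relates to the "mode-change" question: of these encodings, ISO-2022-JP is the one that switches modes with escape sequences, so the sketch removes the switch into two-byte mode to show what a receiver would then see.

```python
# Illustrative only: the sample string and helper names are assumptions,
# not something from the original thread.
SAMPLE = "日本語"  # "Japanese language", three kanji

CODECS = ["shift_jis", "iso2022_jp", "euc_jp", "utf-8", "utf-16-le"]


def encoded_sizes(text):
    """Return the encoded length in bytes of `text` under each codec."""
    return {name: len(text.encode(name)) for name in CODECS}


def drop_mode_switch(data):
    """Strip ESC $ B, the escape that switches ISO-2022-JP to two-byte mode."""
    return data.replace(b"\x1b$B", b"")


if __name__ == "__main__":
    for name, size in encoded_sizes(SAMPLE).items():
        print(f"{name:10s} {size} bytes")

    # ASCII text is unchanged, one byte per character, in UTF-8.
    print("abc".encode("utf-8"))

    # Losing the mode switch leaves the kanji bytes to be read as ASCII.
    raw = SAMPLE.encode("iso2022_jp")
    print(drop_mode_switch(raw).decode("iso2022_jp"))
```

The ISO-2022-JP form comes out longer than the Shift_JIS and EUC-JP forms because of the escape sequences wrapped around the kanji. With the first escape removed, the same bytes decode as plain ASCII characters, which is one way mojibake can arise.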
A great resource on this subject is the book, "CJKV Information
Processing" by Ken Lunde.
> [Katakana]
>
> One can read more Japanese than one might, at first, expect. Japan has
> imported English words "wholesale", sometimes adapting them to their
> own language (I'm typing on a Compaq "pasokon" -- pasonaru
> konpyuutaa). Perhaps 35,000 words have been imported. These words are
> rendered/written with a simple syllabary called katakana, which
> (except for arbitrary-seeming, never-complicated character shapes) is
> about as easy to learn* as an alphabet, and can be a *lot* of fun.
Both Katakana (foreign words) and Hiragana (native Japanese words) are
phonetic, so they are easy to learn. Kanji is also interesting, but to be
literate you need to learn a few thousand characters, which is quite a task.