i18n
Jerry Feldman
gaf at blu.org
Fri Mar 17 13:25:25 EST 2006
On Friday 17 March 2006 12:52 pm, David Hummel wrote:
> Jerry's statement above is misleading. It's not really a problem, since
> the user interacts not with the kernel, but with applications. glibc
> has had support for utf-8 and multi-byte locales for years now (since
> 2.2 I believe).
I specifically noted that Linux is still 8-bit and that it was based on the
C language, where the standard char data type is 8 bits. You are absolutely
correct about applications. As I mentioned WRT lint(1), back in the early
days of OSF1, we did not use printf; instead we used a message catalog that
had its own printing functions set up for the wider character sets.
The fact is that Unix and Linux are still based on character strings that
are composed of 8-bit characters. Certainly, most of the standard C
functions, such as printf(3), use locales. The locale not only contains the
character sets, but also contains information such as how to format numbers
and dates. And even the character sets themselves contain information on
how they are sorted. In the original C language, there were functions in
ctype.h, such as isupper(), islower(), et al. Before C89, these were
generally all simple macros. For instance, upper case A-Z is in the range
of 65 (0x41) through 90 (0x5A), and lower case a-z is in the range of
97 (0x61) through 122 (0x7A). To convert from upper to lower, all you
needed to do was to OR in 0x20, and to go the other way you'd mask out that
bit. But once locales were supported, ctype.h had to resort to a set of
tables. This is the type of thing that had to be done in glibc to implement
locales. (Note that I am talking about locales and not specifically about
UTF-8 or UTF-16 in this case.)
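
To make the 0x20 trick concrete, here is a rough sketch of the two
approaches side by side. The helper names (ascii_to_lower, ascii_to_upper,
locale_to_lower) and the ASCII_CASE_BIT macro are made up for illustration
and are not taken from any real ctype.h; the only library call used is the
standard tolower(). The bit-twiddling versions are only valid for plain
ASCII letters, which is exactly why they stopped being sufficient once
locales came along.

```c
#include <ctype.h>

/* Bit 0x20 is the only difference between 'A' (0x41) and 'a' (0x61). */
#define ASCII_CASE_BIT 0x20   /* hypothetical name, for illustration */

/* Old-style conversion: OR in the case bit. ASCII letters only. */
static int ascii_to_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? (c | ASCII_CASE_BIT) : c;
}

/* Going the other way: mask the case bit out. ASCII letters only. */
static int ascii_to_upper(int c)
{
    return (c >= 'a' && c <= 'z') ? (c & ~ASCII_CASE_BIT) : c;
}

/* Locale-aware conversion: defer to the C library, which consults the
 * current locale (typically via lookup tables) rather than doing
 * arithmetic on the character code. tolower() expects a value
 * representable as unsigned char (or EOF), hence the cast. */
static int locale_to_lower(int c)
{
    return tolower((unsigned char)c);
}
```

In the default "C" locale both versions agree on A-Z, so
ascii_to_lower('A') and locale_to_lower('A') both give 'a'. They can
differ only after setlocale() has selected a locale whose 8-bit character
set has letters outside the ASCII range.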
But it is always up to the application developer. A well-written application
should be able to utilize
--
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9