Re: [m17n] The dead '~' is evil!

21 Sep 2002

      "Steven T. Hatton" <hattons@speakeasy.net> writes:
...
Is ∑ the same as Σ? I used `ucm` to pick these characters out.
This is an unusably slow process for any kind of work which involves
extensive use of characters not available in the current key
mapping.
You probably need only a rather small subset of Unicode frequently.
Put the characters you need often into a file, then display that file
(e.g. with 'less' in a UTF-8 capable terminal) and cut & paste from
there. That's faster than 'ucm' for that purpose, because you have all
your frequently used characters close together.
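
Such a file could be generated rather than typed by hand. The
following is only a rough sketch: the selection in FAVORITES and the
helper names palette_lines and write_palette are made up for
illustration, and the output path is whatever you choose.

```python
import unicodedata

# Hypothetical selection; replace with the characters you actually need.
FAVORITES = "∑Σöλ∀"

def palette_lines(chars):
    """Return one line per character: the glyph, its code point, its name."""
    return [
        f"{ch}\tU+{ord(ch):04X}\t{unicodedata.name(ch, '<unnamed>')}"
        for ch in chars
    ]

def write_palette(path, chars=FAVORITES):
    """Write the palette as UTF-8 so it can be viewed with 'less'."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(palette_lines(chars)) + "\n")
```

Having the name next to each glyph also helps to tell look-alike
characters apart when cutting and pasting.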
...
The first '∑' is from 'U+2200' and the second 'Σ' is from
U+0300. These appear identical on my SuSE 8.0 box in kmail.
That depends on the font you have set up in KMail.  When using the GNU
Unicode font, for example, there is a small but visible difference
between the glyphs for these two characters. In the efont-unicode
fonts and Markus Kuhn's 18-pixel Unicode font (which comes with
XFree86) the difference is obvious.
...
I copied these from kmail to an Emacs buffer and found the first of
these characters is not rendered, and the second is rendered as
expected.
Both XEmacs and Emacs display both characters correctly for me, even
when I don't load my ~/.emacs and use the system default ('xemacs -q'
or 'emacs -q').
...
Hexlifying the buffer holding `∑ Σ' resulted in the following
character codes: e288 9120 cea3.  This could be annoying when it
comes to human-to-human communication.
Yes, of course the two characters are different.
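
The same byte sequence can be reproduced outside Emacs. A minimal
sketch, assuming Python 3 (the helper name utf8_hex is just for
illustration):

```python
def utf8_hex(text):
    """Return the UTF-8 encoding of text as a lowercase hex string."""
    return text.encode("utf-8").hex()

# U+2211 encodes as e2 88 91, the space as 20, U+03A3 as ce a3.
print(utf8_hex("∑ Σ"))   # prints e2889120cea3
```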
...
It is potentially devastating when it comes to human-to-computer
communication.  For example, imagine a database of words from
different languages and data entered by different users who don't
fully understand the idiosyncrasies of UTF encoding.
You must understand which character they want to input and use the
correct one:

    Character `∑' UNIDATA information.
    ---------------------------------
     This is converted to U+2211
    under the current environment.

        name			N-ARY SUMMATION
        category			(symbol math)
        combining-class		0 => Spacing
        bidirectional-category	ON => Other Neutrals
        mirrored			mirrored
        titlecase-mapping		-1

    Character `Σ' UNIDATA information.
    ---------------------------------
     This is converted to U+03A3
    under the current environment.

        name			GREEK CAPITAL LETTER SIGMA
        category			(letter uppercase)
        combining-class		0 => Spacing
        bidirectional-category	L => Left-to-Right
        mirrored			not-mirrored
        lowercase-mapping		-1
        titlecase-mapping		-1

You see, one is a mathematical symbol, the other is a Greek
letter. Just use the correct one. It is the same as with 'O' and
'0': they may look similar in some fonts, but that doesn't mean you
are allowed to mix them up. That can't be helped.
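
If no UNIDATA viewer is at hand, roughly the same lookup can be done
with Python's standard unicodedata module. This is only a sketch; the
function name describe is made up, and the two-letter categories it
returns ("Sm", "Lu") are the raw Unicode general categories rather
than the spelled-out form shown above.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

print(describe("∑"))   # ('U+2211', 'N-ARY SUMMATION', 'Sm')
print(describe("Σ"))   # ('U+03A3', 'GREEK CAPITAL LETTER SIGMA', 'Lu')
```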
...
To the users, everything may appear correct, but what were intended
to be equivalent strings entered by different users may actually be
two distinct representations of the same human readable
representation.
Looks like you have not yet discovered combining characters:

For example, you can write a ö in different ways as well:

   U+00F6  LATIN SMALL LETTER O WITH DIAERESIS

or

   U+006F  LATIN SMALL LETTER O

followed by

   U+0308  COMBINING DIAERESIS

Try pasting those characters, for example from 'ucm', into an xterm in
UTF-8 mode. You will see that the result looks identical in both
cases. Nevertheless, the UTF-8 sequence on the command line in the
xterm is different.
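
For the database scenario, the usual remedy is to normalize strings
before storing or comparing them. A minimal sketch, assuming Python 3
and the NFC form (the helper name same_text is made up; whether NFC
or NFD suits a given application is a separate decision). The
decomposed string here is the base letter with the combining mark
placed after it.

```python
import unicodedata

precomposed = "\u00f6"    # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"    # LATIN SMALL LETTER O + COMBINING DIAERESIS

def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed.encode("utf-8").hex())   # c3b6
print(decomposed.encode("utf-8").hex())    # 6fcc88
print(precomposed == decomposed)           # False
print(same_text(precomposed, decomposed))  # True
```

Note that normalization does not make '∑' and 'Σ' equal; those remain
two different characters.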

-- 
Mike Fabian   <mfabian@suse.de>   http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。