The dead '~' is evil!

Steven T. Hatton

18 Sep 2002

22:44

I just learned what dead keys really are. The file /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose defines the key combinations necessary to type 'special' characters using the en_US.UTF-8 locale.

'Why', you may ask, 'is the dead '~' especially evil?' In a word: `rm -r ~/.junk'. If you are in the habit of typing <dead_tilde> <space> and switch to a mapping without dead keys, `rm -r ~/.junk' becomes `rm -r ~ /.junk', which has a whole different meaning. In preparing this message I did a `grep "~" /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose' and found there is also a "<dead_tilde> <dead_tilde> : "~" asciitilde" mapping. This is much better. It could still make working with '~' escapes in ssh rather confusing, but I see that as a far less likely problem.

In general, now that I understand what 'dead key' means, I don't like them. There's got to be a better way!

I have many uses for special characters. They range from transcribing the Nibelungenlied into electronic text, to building databases representing, for example, Old Norse words and their inflections, or proto-Germanic and its relationship to its daughter languages. I also would like to be able to express mathematical concepts such as ∃, ∄, ⋂, ⋃, ⋀, ∩, ∪, ∬ in a semi-natural way. And then we hit the obvious problem of which of ⋂ and ∩ is the correct symbol for the instance in question.

Is ∑ the same as Σ? I used `ucm' to pick these characters out. This is an unusably slow process for any kind of work which involves extensive use of characters not available in the current key mapping. The first '∑' is from 'U+2200' and the second 'Σ' is from U+0300. These appear identical on my SuSE 8.0 box in kmail. I copied these from kmail to an Emacs buffer and found the first of these characters is not rendered, and the second is rendered as expected. Hexlifying the buffer holding `∑ Σ' resulted in the following character codes: e288 9120 cea3.

This could be annoying when it comes to human to human communication. It is potentially devastating when it comes to human to computer communication. For example, imagine a database of words from different languages and data entered by different users who don't fully understand the idiosyncrasies of UTF encoding. To the users, everything may appear correct, but what were intended to be equivalent strings entered by different users may actually be two distinct representations of the same human readable representation.

I'm sure much of what I'm saying is not news to people on this list. I do have one proposal for a tool which may help alleviate some of these problems. It would be nice to have a special key sequence to switch quickly between key mappings (I believe the KDE actually does support this), and have a visual representation of the current mapping displayed to the user at will. It might even be helpful to be able to browse the available mappings with the corresponding visual representations displayed to the user before selecting that particular option.

As it stands, I don't even know if I have a 104 or a 105 keyboard. I don't know the difference. Things have gotten better regarding selecting keyboards and such with the KDE. I still find them confusing. A user should not have to learn the underlying details of xmodmap mappings and the ~/.kde/share/config/kxkbrc.

Any thoughts on this?

-- STH
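The effect of the stray space can be sketched in a few lines. This is only an illustration, assuming Python's shlex module as a stand-in for the shell's word splitting; it does not expand '~', whereas a real shell would turn the lone '~' argument into the home directory.

```python
import shlex

# Split both command lines into arguments the way a POSIX shell would.
intended = shlex.split("rm -r ~/.junk")
mistyped = shlex.split("rm -r ~ /.junk")

print(intended)  # ['rm', '-r', '~/.junk']
print(mistyped)  # ['rm', '-r', '~', '/.junk']

# The mistyped form hands rm two separate targets: '~' and '/.junk'.
```

The first command names one path below the home directory; the second names the home directory itself plus an unrelated path.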


Mike Fabian

20 Sep

23:54

New subject: [m17n] The dead '~' is evil!

"Steven T. Hatton" <hattons@speakeasy.net> writes:

...

'Why', you may ask, 'is the dead '~' especially evil?' In a word: `rm -r ~/.junk'. If you are in the habit of typing <dead_tilde> <space> and switch to a mapping without dead keys, `rm -r ~/.junk' becomes `rm -r ~ /.junk', which has a whole different meaning.

Of course you get problems if you are used to one keyboard layout and switch to a different one you are not used to. Decide which keyboard layout you like best, then stick to that and stop switching. -- Mike Fabian <mfabian@suse.de> http://www.suse.de/~mfabian 睡眠不足はいい仕事の敵だ。

Mike Fabian

21 Sep

00:47


"Steven T. Hatton" <hattons@speakeasy.net> writes:

...

Is ∑ the same as Σ? I used `ucm` to pick these characters out. This is an unusably slow process for any kind of work which involves extensive use of characters not available in the current key mapping.

Probably you only need a rather small subset of Unicode frequently. Put the characters you need often into a file, then display that file (e.g. with 'less' in a UTF-8 capable terminal) and cut & paste from there. That's faster than 'ucm' for that purpose, because you have all your frequently used characters close together.

...

The first '∑' is from 'U+2200' and the second 'Σ' is from U+0300. These appear identical on my SuSE 8.0 box in kmail.

Depends on the font you have set up in KMail. When using the GNU Unicode font for example, there is a small but visible difference between the glyphs for these two characters. In the efont-unicode fonts and Markus Kuhn's 18 pixel Unicode font (which comes with XFree86) the difference is obvious.

...

I copied these from kmail to an Emacs buffer and found the first of these characters is not rendered, and the second is rendered as expected.

Both XEmacs and Emacs display both characters correctly for me. Even when I don't load my ~/.emacs and use the system default ('xemacs -q' or 'emacs -q').

...

Hexlifying the buffer holding `∑ Σ' resulted in the following character codes: e288 9120 cea3. This could be annoying when it comes to human to human communication.

Yes, of course the two characters are different.
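The hex dump quoted above can be reproduced as a quick check. A minimal sketch, assuming the buffer held exactly the two characters separated by one space and was saved as UTF-8:

```python
# N-ARY SUMMATION, a space, GREEK CAPITAL LETTER SIGMA
text = "\u2211 \u03a3"

utf8_hex = text.encode("utf-8").hex()
print(utf8_hex)  # e2889120cea3

# Per-character view: code point and its UTF-8 bytes.
for ch in text:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex())
```

Grouped in pairs of bytes, e2889120cea3 is the "e288 9120 cea3" from the dump: three bytes for the first character, one for the space, two for the second character.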

...

It is potentially devastating when it comes to human to computer communication. For example, imagine a database of words from different languages and data entered by different users who don't fully understand the idiosyncrasies of UTF encoding.

You must understand which character they want to input and use the correct one:

Character `∑' UNIDATA information.
---------------------------------
This is converted to U+2211 under the current environment.
name                    N-ARY SUMMATION
category                (symbol math)
combining-class         0 => Spacing
bidirectional-category  ON => Other Neutrals
mirrored                mirrored
titlecase-mapping       -1

Character `Σ' UNIDATA information.
---------------------------------
This is converted to U+03A3 under the current environment.
name                    GREEK CAPITAL LETTER SIGMA
category                (letter uppercase)
combining-class         0 => Spacing
bidirectional-category  L => Left-to-Right
mirrored                not-mirrored
lowercase-mapping       -1
titlecase-mapping       -1

You see, one is a mathematical symbol, the other is a Greek character. Just use the correct one. That's the same as with 'O' and '0'. They may look similar in some fonts, that doesn't mean you are allowed to mix them up. That can't be helped.
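The same properties can be looked up without Emacs. A small sketch using Python's standard unicodedata module; the helper name describe is only for illustration:

```python
import unicodedata

def describe(ch):
    """Return (name, general category, bidi class, mirrored flag) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.mirrored(ch),
    )

for ch in ("\u2211", "\u03a3"):
    print(f"U+{ord(ch):04X}", describe(ch))
```

The category codes are the abbreviated forms of what is listed above: Sm is "symbol, math" and Lu is "letter, uppercase".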

...

To the users, everything may appear correct, but what were intended to be equivalent strings entered by different users may actually be two distinct representations of the same human readable representation.

Looks like you have not yet discovered combining characters: For example, you can write a ö in different ways as well:

U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

or

U+006F LATIN SMALL LETTER O followed by U+0308 COMBINING DIAERESIS

(the combining character comes after the base letter). Try to paste those characters for example from 'ucm' into an xterm in UTF-8 mode. You see that the result looks identical in both cases. Nevertheless the UTF-8 sequence in the command line in the xterm is different.

-- Mike Fabian <mfabian@suse.de> http://www.suse.de/~mfabian 睡眠不足はいい仕事の敵だ。
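For the database scenario, the usual remedy is to normalize strings before storing or comparing them. A minimal sketch with Python's unicodedata.normalize; in Unicode the base letter comes first and the combining mark follows it. Note that normalization only unifies canonically equivalent sequences like these two; it does not make ∑ and Σ equal, since those are distinct characters.

```python
import unicodedata

precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS

# They render alike but are different strings with different UTF-8 bytes.
print(precomposed == decomposed)            # False
print(precomposed.encode("utf-8").hex())    # c3b6
print(decomposed.encode("utf-8").hex())     # 6fcc88

# After normalizing both to the same form they compare equal.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))
print(nfc_equal)                            # True
```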

Mike Fabian

01:14


"Steven T. Hatton" <hattons@speakeasy.net> writes:

...

It would be nice to have a special key sequence to switch quickly between key mappings (I believe the KDE actually does support this.),

Yes. But I don't think that switching keyboard layouts all the time is a sensible solution if you need to input many different characters. You will never be able to type fast, because you can't get used to all keyboard layouts at the same time.

I believe it is better for typing speed to learn touch typing for *one* keyboard layout and enter the other characters with some input method.

For example, I always use the US keyboard layout and usually type German in XEmacs in iso-accents-mode, i.e. I type "o and XEmacs converts that to ö. Similar to dead keys. You can also use compose for that. For example, if the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose contains

<Multi_key> <o> <colon> : "ö" odiaeresis

and you have mapped some key to Multi_key, you can use that to input unusual characters.

-- Mike Fabian <mfabian@suse.de> http://www.suse.de/~mfabian 睡眠不足はいい仕事の敵だ。

Steven T. Hatton

10:04


On Friday 20 September 2002 21:14, Mike Fabian wrote:

...

"Steven T. Hatton" <hattons@speakeasy.net> writes:

...
It would be nice to have a special key sequence to switch quickly between key mappings (I believe the KDE actually does support this.),

Yes.

But I don't think that switching keyboard layouts all the time is a sensible solution if you need to input many different characters. You will never be able to type fast, because you can't get used to all keyboard layouts at the same time.

I believe it is better for typing speed to learn touch typing for *one* keyboard layout and enter the other characters with some input method.

But I will never knowingly allow a person who uses dead keys to administer any of my systems. My point WRT the evil dead '~' comment was that a person accustomed to that sequence could be very dangerous on a system which has no dead keys. What happens when, for some reason, an install of new patches changes his keyboard to 'no dead keys' without his realizing it?

...

For example, I always use the US keyboard layout and usually type German in XEmacs in iso-accents-mode, i.e. I type "o and XEmacs converts that to ö. Similar to dead keys. You can also use compose for that. For example, if the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose contains

<Multi_key> <o> <colon> : "ö" odiaeresis

and you have mapped some key to Multi_key, you can use that to input unusual characters.

Here's an example of the kinds of things I would like to be able to do. I often want to explore the meaning of words with others on mailing lists. One excellent resource is here: http://www.bartleby.com/61/roots/IE422.html Notice the several non-standard characters which are represented using <IMG/> tags. I would like to be able to enter such characters without having to browse through picklists, then copy and paste. Having a 'cheat sheet' available would lessen the burden. It may be an interim solution until a 'correct' solution comes along.

I do use the compose for Á, Ó, Ú, Đ, Þ, Ö, etc. When I looked through the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose file I saw that many of the characters require dead keys. That's what got me talking about the dead '~'. I'm certain it is an omen that shortly thereafter, I learned that eshell thinks "~" == ~. I had put ~/something in a ./configure option and it misunderstood it to think I meant '~' and created $PWD/~ . Well, I did rm -r "~". I'm still in shock. :-o

I guess I don't need the dead key combos for much, right now. What would be nice is some kind of escape sequence which would allow me to enter <super-escape-key-combo>[unicode hex representation] and out comes the exact character I'm looking for.

Another option might be to hijack the number pad for custom key mappings. Have some way of putting the keyboard input mechanism into a special mode where the number pad keystrokes result in entering characters from your customized mapping. I envision this having the following features:

* a configuration mode.
** Some special key sequence would put the keyboard into palette configuration mode (or clicking an icon, etc.).
** The user would have a display of all the UTF characters with the associated UTF hex codes. UCM seems to work for this (but it's ugly as raw X).
** The user would have a palette representing the number pad in which to copy the chosen character. Alternatively, the user could enter the exact hex representation of the character into a 4-character field.
** This should probably have a means of constructing multiple palettes, and a means of switching between them.
* a use mode.
** Some special key sequence could be used to put the keyboard into palette use mode.
** Some kind of obnoxious indication that it is in palette mode, at least for newbies, with a message telling the user how to turn it off.
** A means of switching between palettes.
** A visual representation of the active palette. (A diagram which looks like the number pad.)
** This should have a way of shifting case where the default would be 'natural'. A brief scan of the Unicode character sets indicates to me lower case is distinguished from upper case by setting the LSB. Of course, the original ASCII breaks this.

In general a more user friendly means of learning the compose keystrokes would be nice. My current KDE installation has a 'help' button in the 'Keyboard - KDE Control Module' which does absolutely nothing. This may be a result of my having installed the latest KDE rpms off the SuSE ftp server. Things don't always work on the bleeding edge. Nonetheless, I have found learning how to use the various keyboard configurations to be quite difficult. I'm finally catching on. I believe the combination which is right for me is:

hattons@ljosalfr:~> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=POSIX
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Geeko Cogwheel -> Preferences -> Peripherals -> Keyboard ->
Enable keyboard layouts=t
Keyboard Model = Generic 104-key PC
Primary Layout = [en-US Flag] U.S. English w/ ISO9995-3
Primary Variant = basic

But I have no idea what these settings really mean. It's probably not that hard to understand, but without any documentation, it is not very easy to learn.

One thing I don't really understand is why not simply have some kind of escape key for dead keys. Or is that what [M ~] really is?

STH
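The remark about case and the least significant bit can be checked against a few characters. A rough sketch; the sample characters are chosen here for illustration and are not from the message. It suggests the pattern holds for many adjacent pairs in the extended Latin ranges but is not a general rule, which is why a case-shifting palette would be safer relying on the Unicode case mappings than on bit arithmetic.

```python
def case_offset(upper):
    """Distance in code points from an uppercase letter to its lowercase form."""
    return ord(upper.lower()) - ord(upper)

samples = ["A", "\u0100", "\u0139", "\u03a3"]  # A, A-macron, L-acute, Sigma
for upper in samples:
    parity = "odd" if ord(upper) % 2 else "even"
    print(f"U+{ord(upper):04X} -> U+{ord(upper.lower()):04X}",
          f"offset={case_offset(upper)}", f"uppercase is {parity}")
```

A-macron (U+0100) is an even/odd pair as described, L-acute (U+0139) is an adjacent pair with the uppercase letter on the odd code point, and ASCII and Greek use an offset of 32 instead.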

Mike Fabian

23 Sep

16:16


"Steven T. Hatton" <hattons@speakeasy.net> writes:

...

On Friday 20 September 2002 21:14, Mike Fabian wrote:

[...]

...

But I will never knowingly allow a person who uses dead keys to administer any of my systems. My point WRT the evil dead '~' comment was that a person accustomed to that sequence could be very dangerous on a system which has no dead keys. What happens when, for some reason, an install of new patches changes his keyboard to 'no dead keys' without his realizing it?

New patches won't change /etc/X11/XF86Config.

And the problem you describe doesn't happen only when switching between "no dead keys" <-> "dead keys". It will also happen when switching from US keyboard layout to German, French or Japanese keyboard layout. Suddenly many characters, including ones interpreted specially by the shells like '~' and '*', will be on different keys. Of course this can have surprising effects when one doesn't notice.

Sometimes I switch to Japanese layout for testing. Then I may get interrupted by somebody and forget to switch back to US keyboard layout when I'm back at my computer. And then, when I start typing at full speed, lots of surprising effects happen ... until I notice that the keyboard layout is wrong. It's especially bad in programs where each single keystroke is already bound to some action. That's why changing keyboard layouts all the time drives me nuts.

I don't see any significant difference between this and your dead keys <-> no dead keys. Typing in haste on an unknown keyboard layout will cause problems. You may delete important files or e-mail accidentally. I.e. you *must* take care when switching keyboard layouts. I believe nothing can be done against that.

-- Mike Fabian <mfabian@suse.de> http://www.suse.de/~mfabian 睡眠不足はいい仕事の敵だ。

Mike Fabian

18:57


"Steven T. Hatton" <hattons@speakeasy.net> writes:

...

On Friday 20 September 2002 21:14, Mike Fabian wrote:

[...]

...

I would like to be able to enter such characters without having to browse through picklists, then copy and paste. Having a 'cheat sheet' available would lessen the burden. It may be an interim solution until a 'correct' solution comes along.

I do use the compose for Á, Ó, Ú, Đ, Þ, Ö, etc. When I looked through the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose file I saw that many of the characters require dead keys. That's what got me talking about the dead '~'.

You can edit the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose file to make the characters you need available using <Multi_key> rather than dead keys. For example, if

<dead_diaeresis> <o> : "ö" odiaeresis

is contained in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose, it does not mean that you can't add other entries for other key combinations to write the same character. For example,

<Multi_key> <o> <colon> : "ö" odiaeresis

coexists without problems in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose.

I.e. if characters you need frequently have only entries which require dead keys in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose, you can add other key combinations using <Multi_key> to type the same characters if you like.

If you do that, keep a backup copy of your edited Compose file in order not to lose it when you update your system.
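To find out which characters have only dead-key entries, the Compose file can be scanned mechanically. A rough sketch, not a full parser: the function names are hypothetical, the regular expression assumes the simple `<key> <key> : "result" name` line shape shown above (no modifier prefixes, no escaped quotes), and the ñ line in the sample is an illustrative assumption rather than a quotation from the real file.

```python
import re

# One Compose entry: a run of <keysym> tokens, a colon, a quoted result, a name.
ENTRY = re.compile(r'^\s*((?:<\w+>\s*)+):\s*"([^"]*)"\s*(\w*)')

def parse_compose(text):
    """Return a list of (key sequence, result string, keysym name) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            keys = tuple(re.findall(r"<(\w+)>", match.group(1)))
            entries.append((keys, match.group(2), match.group(3)))
    return entries

def needs_dead_key(entries):
    """Results for which every known sequence contains a dead_* key."""
    sequences = {}
    for keys, result, _name in entries:
        sequences.setdefault(result, []).append(keys)
    return {
        result
        for result, seqs in sequences.items()
        if all(any(k.startswith("dead_") for k in seq) for seq in seqs)
    }

SAMPLE = """
<dead_diaeresis> <o> : "ö" odiaeresis
<dead_tilde> <n> : "ñ" ntilde
<Multi_key> <o> <colon> : "ö" odiaeresis
"""

print(needs_dead_key(parse_compose(SAMPLE)))  # {'ñ'}
```

In the sample, ö already has a <Multi_key> sequence, so only ñ is reported as a candidate for an added entry.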

...

I guess I don't need the dead key combos for much, right now. What would be nice is some kind of escape sequence which would allow me to enter <super-escape-key-combo>[unicode hex representation] and out comes the exact character I'm looking for.

That is possible with IIIMF. IIIMF is a new input mechanism, developed by Hideki Hiura and Miyashita Hisashi.

I could not yet make it work on SuSE Linux, but I hope that it will work soon.

IIIMF uses an extended Compose file which has several sections and one can switch between the sections using special key combinations. There are key combinations which switch to a Unicode hex input mode:

<Multi_key> <u> <h> : SWITCH_STATE_TO "[ Unicode Hex ]"
Ctrl<T> <u> <h> : SWITCH_STATE_TO "[ Unicode Hex ]"
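What a hex input mode computes is simple, whatever mechanism collects the digits. A minimal sketch of the conversion only; it is not IIIMF code, and the helper name from_hex is made up for illustration:

```python
def from_hex(code):
    """Turn a hex code point such as '6f22' or 'U+2211' into the character."""
    code = code.strip().upper()
    if code.startswith("U+"):
        code = code[2:]
    return chr(int(code, 16))

print(from_hex("6f22"))    # 漢
print(from_hex("U+2211"))  # ∑
print(from_hex("03A3"))    # Σ
```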

...

Another option might be to hijack the number pad for custom key mappings. Have some way of putting the keyboard input mechanism into a special mode where the number pad keystrokes result in

You can also use xmodmap to map the number pad keys to anything you like.

For example, if you create a ~/.Xmodmap file containing

keysym KP_Enter = EuroSign
keysym KP_Add = 0x01006f22

and load it with

xmodmap ~/.Xmodmap

you can enter the EuroSign by typing KP_Enter and the Unicode character U+6f22 (漢) by typing KP_Add.

You can have several such files, ~/.Xmodmap-1 ~/.Xmodmap-2, ... and switch between them using 'xmodmap ~/.Xmodmap-1' etc. ...
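The value 0x01006f22 above follows the X11 convention that a Unicode character without a named keysym is written as 0x01000000 plus its code point. A small sketch that generates such lines for a number pad palette; the function names and the choice of keys are assumptions for illustration, and the output would still have to be saved to a file and loaded with xmodmap by hand.

```python
def unicode_keysym(ch):
    """X11 keysym value for a Unicode character: 0x01000000 + code point."""
    return f"0x{0x01000000 | ord(ch):08x}"

def xmodmap_lines(mapping):
    """Build 'keysym KEY = VALUE' lines from a {key name: character} dict."""
    return [f"keysym {key} = {unicode_keysym(ch)}" for key, ch in mapping.items()]

palette = {"KP_Add": "\u6f22", "KP_Subtract": "\u2211"}  # 漢 and ∑
for line in xmodmap_lines(palette):
    print(line)
# keysym KP_Add = 0x01006f22
# keysym KP_Subtract = 0x01002211
```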

...

In general a more user friendly means of learning the compose keystrokes would be nice.

Currently you can only read /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose.

...

One thing I don't really understand is why not simply have some kind of escape key for dead keys. Or is that what [M ~] really is?

You can achieve the same result with other keys instead of dead keys in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose, if that is what you mean. You can use e.g. <Multi_key> or Ctrl<T> like in the above examples. -- Mike Fabian <mfabian@suse.de> http://www.suse.de/~mfabian 睡眠不足はいい仕事の敵だ。

Steven T. Hatton

24 Sep

04:20


On Monday 23 September 2002 14:57, Mike Fabian wrote:

...

"Steven T. Hatton" <hattons@speakeasy.net> writes:

...
On Friday 20 September 2002 21:14, Mike Fabian wrote:

I do use the compose for Á, Ó, Ú, Đ, Þ, Ö, etc. When I looked through the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose file I saw that many of the characters require dead keys. That's what got me talking about the dead '~'.

You can edit the /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose file to make the characters you need available using <Multi_key> rather than dead keys. For example, if

<dead_diaeresis> <o> : "ö" odiaeresis

is contained in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose, it does not mean that you can't add other entries for other key combinations to write the same character. For example,

<Multi_key> <o> <colon> : "ö" odiaeresis

coexists without problems in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose.

I.e. if characters you need frequently have only entries which require dead keys in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose, you can add other key combinations using <Multi_key> to type the same characters if you like.

If you do that, keep a backup copy of your edited Compose file in order not to lose it when you update your system.

I understand that you are giving me the best 'reasonable' solution available. I appreciate the help, so please don't misunderstand what I'm about to say. [NB: all of the following can be summarized by saying: 'there are too many variables, and there is not enough orthogonality in the design.' I'm not trying to say I am unwilling to attempt the modifications suggested. I'm merely giving feedback so the process might be improved. I'm basing this on my own experience.]

For a person who lives and breathes special character keyboard mappings, solutions such as "just go hack the /somewhere/over/the/rainbow/X11/keyboardprimaryalterativemodepapping.cfg file and add '<toto>=<stupid little dog>', but remember the activation order is important..." might seem like an obvious solution. For a person whose primary interest is the morphology of reconstructed, hypothetical proto languages, such instructions can usually be followed. A problem comes about when Dr. Esoteric applies a patch a year later and his keyboard stops producing a Ç when he types <Multi_key> <C> <comma>. Or he tries to explain how to do this to a person on a mailing list, and his instructions result in that person messing up the configuration his graduate student provided for him before she switched to computer science.

...

...
I guess I don't need the dead key combos for much, right now. What would be nice is some kind of escape sequence which would allow me to enter <super-escape-key-combo>[unicode hex representation] and out comes the exact character I'm looking for.

That is possible with IIIMF. IIIMF is a new input mechanism, developed by Hideki Hiura and Miyashita Hisashi.

I could not yet make it work on SuSE Linux, but I hope that it will work soon.

IIIMF uses an extended Compose file which has several sections and one can switch between the sections using special key combinations. There are key combinations which switch to a Unicode hex input mode:

<Multi_key> <u> <h> : SWITCH_STATE_TO "[ Unicode Hex ]"
Ctrl<T> <u> <h> : SWITCH_STATE_TO "[ Unicode Hex ]"

I would prefer a ~/.compose/custom-1.conf, ~/.compose/custom-2.conf, etc. type of functionality. The idea of making modifications to system files is, as a general rule, bad. Even system wide modifications should be handled by things such as /etc/profile.local, /etc/sysconfig/some-component.conf, etc. I've spent too many hours trying to figure out why my keyboard stopped working after applying a patch. It becomes even more of a problem when some guy on the XEmacs beta list, whose display name is "If you knew my name, you'd probably hunt me down and kill me", tells you to fix it by modifying your ~/.Xmodmap, when the problem is really in the ~/.kde/share/config/kxkbrc.

...

...
Another option might be to hijack the number pad for custom key mappings. Have some way of putting the keyboard input mechanism into a special mode where the number pad keystrokes result in

You can also use xmodmap to map the number pad keys to anything you like.

For example, if you create a ~/.Xmodmap file containing

keysym KP_Enter = EuroSign
keysym KP_Add = 0x01006f22

and load it with

xmodmap ~/.Xmodmap

you can enter the EuroSign by typing KP_Enter and the Unicode character U+6f22 (漢) by typing KP_Add.

You can have several such files, ~/.Xmodmap-1 ~/.Xmodmap-2, ... and switch between them using 'xmodmap ~/.Xmodmap-1' etc. ...

These kinds of solutions can be confusing because it's hard to determine if they are effecting a change in the specific shell, or in the current X session. And you also have to remember how you had configured this, and how to make it work. If you do it every day, it's not hard. If you spend a few hours setting this up the way you want it, and don't use it for several months, you may end up repeating all the learning you did in the first place.

...

...
In general a more user friendly means of learning the compose keystrokes would be nice.

Currently you can only read /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose.

...
One thing I don't really understand is why not simply have some kind of escape key for dead keys. Or is that what [M ~] really is?

You can achieve the same result with other keys instead of dead keys in /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose, if that is what you mean. You can use e.g. <Multi_key> or Ctrl<T> like in the above examples.

IMHO a system file such as /usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose should be standard across all installations. When someone says she is using:

Geeko Cogwheel -> Preferences -> Peripherals -> Keyboard ->
Enable keyboard layouts=t
Keyboard Model = Generic 104-key PC
Primary Layout = [en-US Flag] U.S. English w/ ISO9995-3
Primary Variant = basic

a person on the other side of the world should be able to set his system to the same values, and achieve the same results.
