* Mike FABIAN <mfabian@suse.de> [031209 18:12]:
Philip Amadeo Saeli <psaeli@zorodyne.com> wrote:
* Mike FABIAN <mfabian@suse.de> [031208 15:45]:
Philip Amadeo Saeli <psaeli@zorodyne.com> wrote:
I am having a little problem with the Chinese character input. The problem is including the tone marks required by the pinyin transcription. I've tried weird Latin vowels, but have not been able to find the complete set I need. I've been using "Insert -> Special Character" in OpenOffice.
Which characters do you need for PinYin?
Specifically, I needed the vowels [aeiou] with the first and third tone marks above them, which are basically a dash and an upside-down caret respectively. The vowels with second and fourth tone marks could be represented by existing Latin-1 chars.
I.e. you need:
U+01CD: LATIN CAPITAL LETTER A WITH CARON
U+01CE: LATIN SMALL LETTER A WITH CARON
U+01CF: LATIN CAPITAL LETTER I WITH CARON
U+01D0: LATIN SMALL LETTER I WITH CARON
U+01D1: LATIN CAPITAL LETTER O WITH CARON
U+01D2: LATIN SMALL LETTER O WITH CARON
U+01D3: LATIN CAPITAL LETTER U WITH CARON
U+01D4: LATIN SMALL LETTER U WITH CARON
U+011A: LATIN CAPITAL LETTER E WITH CARON
U+011B: LATIN SMALL LETTER E WITH CARON

U+0100: LATIN CAPITAL LETTER A WITH MACRON
U+0101: LATIN SMALL LETTER A WITH MACRON
U+012A: LATIN CAPITAL LETTER I WITH MACRON
U+012B: LATIN SMALL LETTER I WITH MACRON
U+014C: LATIN CAPITAL LETTER O WITH MACRON
U+014D: LATIN SMALL LETTER O WITH MACRON
U+016A: LATIN CAPITAL LETTER U WITH MACRON
U+016B: LATIN SMALL LETTER U WITH MACRON
U+0112: LATIN CAPITAL LETTER E WITH MACRON
U+0113: LATIN SMALL LETTER E WITH MACRON
is that right?
I forgot to mention the versions for the letter 'u' with diaeresis:

U+01D5: LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON
U+01D6: LATIN SMALL LETTER U WITH DIAERESIS AND MACRON
U+01D9: LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON
U+01DA: LATIN SMALL LETTER U WITH DIAERESIS AND CARON
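For anyone who wants to double-check the code points listed above, here is a minimal Python sketch. It builds each vowel from a base letter plus a combining mark, lets NFC normalization pick the precomposed character, and prints the code point with its Unicode name. It uses only the standard `unicodedata` module; the helper name `tone_vowels` is made up for this illustration.

```python
import unicodedata

MACRON = "\u0304"     # COMBINING MACRON (first tone)
CARON = "\u030c"      # COMBINING CARON (third tone)
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (for u-diaeresis)


def tone_vowels(mark):
    """Return the precomposed vowels (capital and small) carrying `mark`."""
    bases = ["A", "a", "E", "e", "I", "i", "O", "o", "U", "u",
             "U" + DIAERESIS, "u" + DIAERESIS]
    # NFC composes base letter + mark(s) into a single code point
    # whenever Unicode defines one, which it does for all of these.
    return [unicodedata.normalize("NFC", base + mark) for base in bases]


if __name__ == "__main__":
    for mark in (MACRON, CARON):
        for ch in tone_vowels(mark):
            print("U+%04X: %s" % (ord(ch), unicodedata.name(ch)))
```

The output lines have the same "U+XXXX: NAME" shape as the lists in this thread, so they can be compared directly.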
I think it is quite easy to add a new input method to SCIM which enables you to input these characters. That would probably be the most efficient way for you to input these characters as you are using SCIM anyway. I have to update the SuSE SCIM packages anyway, I'll have a look whether I can add something like that.
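To illustrate what such an input method has to do, here is a rough Python sketch that turns numbered pinyin ("ma3") into marked pinyin. This is not SCIM's table format or API, just the mapping idea. The function name `numbered_to_marked`, the use of "v" for u-diaeresis, and "5" for the neutral tone are assumptions for this example, and the placement rule is the usual simplified one (mark goes on a or e, else on the o of "ou", else on the last vowel).

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def numbered_to_marked(syllable):
    """Convert one numbered syllable, e.g. 'ma3', to its marked form."""
    if not syllable or syllable[-1] not in "12345":
        return syllable  # no tone digit, leave untouched
    tone = int(syllable[-1])
    # Assumption: 'v' is typed in place of u-diaeresis.
    letters = syllable[:-1].replace("v", "\u00fc").replace("V", "\u00dc")
    if tone == 5:
        return letters  # neutral tone carries no mark
    lower = letters.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        vowels = [i for i, c in enumerate(lower) if c in "iou\u00fc"]
        if not vowels:
            return letters
        pos = vowels[-1]
    # Insert the combining mark after the chosen vowel, then compose.
    marked = letters[:pos + 1] + TONE_MARKS[tone] + letters[pos + 1:]
    return unicodedata.normalize("NFC", marked)
```

A real input method would apply the same mapping as you type, so that Latin and Han input can share one method.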
WRT entering the pinyin phonetics with tone marks, I finally decided that the most efficient way for me to finish my paper would be to make two passes: one to enter the Latin chars and another to enter the Han chars. I could use the compose key for the former and then switch locales to activate SCIM for the latter. It -would- be nice, esp for editing a complete document, to be able to use the same input method for -all- input, though.

BTW, I looked up the compose sequences in the file:

/usr/X11R6/lib/X11/locale/en_US.UTF-8/Compose

and all -except- the "U+00F3: LATIN SMALL LETTER O WITH ACUTE" work fine. Interestingly, "U+00D3: LATIN CAPITAL LETTER O WITH ACUTE" works just fine! Note that I'm using a US keyboard and so am using the "Multi_key" (compose) sequences exclusively, having no dead keys.
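When one compose sequence misbehaves like this, it can help to list every sequence the Compose file defines for that character. The following Python sketch does that, assuming lines of the shape `<Key> <Key> ... : "result" keysym # comment`. The function name `find_compose_sequences` is hypothetical, results written as escapes (e.g. octal) are not decoded, and `include` directives are not followed.

```python
import re

# Assumed line shape:  <Multi_key> <apostrophe> <o> : "ó" oacute # comment
LINE_RE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')


def find_compose_sequences(compose_text, target):
    """Return the key sequences (lists of keysym names) producing `target`."""
    sequences = []
    for line in compose_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = LINE_RE.match(line)
        if match and match.group(2) == target:
            # Pull the keysym names out of the <...> brackets.
            sequences.append(re.findall(r"<([^>]+)>", match.group(1)))
    return sequences


SAMPLE = (
    '<Multi_key> <apostrophe> <o> : "\u00f3" oacute # LATIN SMALL LETTER O WITH ACUTE\n'
    '<Multi_key> <apostrophe> <O> : "\u00d3" Oacute # LATIN CAPITAL LETTER O WITH ACUTE\n'
    '# a comment line\n'
    '<dead_acute> <o> : "\u00f3" oacute\n'
)
```

To run it against a real file, read the Compose file as UTF-8 text and pass its contents as `compose_text`. Only the `Multi_key` entries matter on a keyboard without dead keys.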
I finally found several fonts (in addition to the Arphic GB TTF fonts) that included the needed chars, which solved part of my problem. The Arphic font I was using ("AR PL KaitiM GB", in OpenOffice) had the needed chars, but they were double wide and hence unsuitable, the rest of the base Latin chars being standard width.
For anyone who is interested, the fonts which included the needed chars were (names from the OpenOffice font selection menu):
"AR PL KaitiM GB" (full char cell width)
"AR PL SungtiL GB" (full char cell width)
"Caslon"
"Caslon RomanSmallcaps"
"Courier New"
"Gentium"
"Gentium Alt"
"New Century Schoolbook"
"Times New Roman"
"Courier New" and "Times New Roman" are commercial fonts (from the Microsoft "Webfonts").
"New Century Schoolbook" is one of the classic X11 bitmap fonts; you probably don't want to use a bitmap font in OpenOffice.
Yup! Agree.
"Gentium" is not distributable without asking the author. I wrote an e-mail to the author asking whether it is OK to distribute "Gentium" with SuSE Linux but never received a reply, therefore this font isn't included with SuSE Linux.
This one has been my favorite, with the MS "Times New Roman" coming in second.
"Caslon" is free and included with SuSE Linux.
But "FreeSans" might be better; a serif version ("FreeSerif") and a monospaced version ("FreeMono") are available as well. All of them come in regular, bold, oblique, and bold-oblique. And apparently these fonts are actively developed; if you miss some characters, I suggest asking the author to add them. The home page of the freefont project is:
Thanks for the font info. I always have trouble mapping font names as they appear in applications to the actual files in the X11 fonts dirs and then to their respective RPM files and being able to find out their sources, licensing, etc.

BTW, I decided to install the freefont RPM from SuSE 9.0 and give it a try. Indeed it does have all the necessary chars. I don't particularly like the tall vertical line spacing, though. It makes things look double spaced (or at least 1.5 spaced) though they are single spaced. It -is- a very sharp looking font, though.

BTW, how is m17n support in SuSE 9.0 as compared to 8.2? I added quite a number of RPMs from the SuSE ftp site (/pub/people/mfabian/8.2/) to my 8.2 system to get it up to snuff and I notice that there are no packages for 9.0 there. Does that indicate that they are not needed or simply that none have been prepared yet?

Thanks again for all the useful info! I've just recently switched over to a UTF-8 locale and have been coming up to speed with Unicode as well. I'm encouraged by the way things seem to be coming together now for multilingual support under Linux, esp support for multiple languages in a single document. Some Chinese people who saw a sample of my paper were impressed that it had been done without expensive payware!

Phil

--
Philip Amadeo Saeli
SuSE Linux 8.2
psaeli@zorodyne.com