Re: [m17n] Japanese, CJK and LaTeX
Ludger Sicking
I think it's "only" an encoding problem. I want to use the "Wadalab-test.tex" file as a base.
So I opened it in my Emacs. My Emacs couldn't fontify the kanji well. Instead of the kanji it displayed:
[... mojibake ...]
That's EUC-JP encoded Japanese, although your e-mail header doesn't say so. I wonder why your Emacs doesn't display it correctly. Both GNU Emacs and XEmacs display Wadalab-test.tex correctly for me by default. You use GNU Emacs, don't you? Does the Japanese in the hello page look correct to you in GNU Emacs? Try
M-x view-hello-file
If that doesn't look right, you probably lack the most basic Japanese fonts. Is xfntjp.rpm installed?
You can switch fontsets in Emacs with "Shift+Left-mouse-button", then select "Fontset" and select one of the offered fontsets. The "standard: 16-dot medium" fontset should always work, even if your only Japanese fonts are those already included in the xf86.rpm.
What locale are you using?
You can force Emacs to load a file in a specific encoding, e.g. EUC-JP, with the following key combination:
C-x RET c euc-jp RET C-x C-f Wadalab-test.tex RET
I input my kanji by SKK. (I didn't find an SKK package on SuSE 8.1, so I installed it by the SKK web page.... Is there such a package provided by SuSE???)
SKK is included in the XEmacs packages for SuSE Linux. I haven't made an SKK package for GNU Emacs on SuSE Linux yet (you are the first one to ask ...).
Personally I use XEmacs with the native Canna interface to input Japanese; for GNU Emacs there is the tamago.rpm package, which offers a nice, direct interface to Canna. I think using Canna is easier than SKK, but the choice of input method really is a personal preference.
And I saved the file in ISO-2022-JP-2.
Don't do that. Save it as euc-jp. If you want to use the Wadalab PostScript fonts, you *must* save it as EUC-JP. In the Wadalab-test.tex file you have for example:
\begin{CJK*}[dnp]{JIS}{min}
And the documentation
/usr/share/doc/packages/cjk-latex/doc/CJK.doc
clearly says:
CJK.doc> \begin{CJK*}[<fontencoding>]{<encoding>}{<family>}
CJK.doc> ...
CJK.doc> \end{CJK*}
CJK.doc>
CJK.doc> are defined. The parameters have the following meaning:
CJK.doc>
CJK.doc> <encoding> These character sets resp. encodings are currently
CJK.doc> implemented in CJK.enc:
CJK.doc>
CJK.doc> [...]
CJK.doc> JIS (For Japanese.
CJK.doc> Character set: JIS X 0208:1997.
CJK.doc> Encoding: EUC.)
You see, you *must* use EUC-JP encoding, if you use the {JIS} parameter in the \begin{CJK*} command.
I processed it with latex (and sjislatex)
Forget sjislatex unless you use SJIS.
but there was the following message:
the latex command:
! Text line contains an invalid character.
l.20 light: {\fontseries{l}\selectfont ^^[ $BC]Fb^^[(B}\ normal: ^^[$BC]Fb^^[...
[...]
So there is a problem with the encoding....
Yes. You get funny error messages like that when you don't use the correct encoding. Use EUC-JP together with \begin{CJK*}[dnp]{JIS}{min} and all is well.
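The difference between the two encodings can be made visible outside LaTeX. The following is a minimal Python sketch, not part of the CJK package. It assumes that ^^[ in the log above stands for the ESC byte (0x1B) and that the space after the first ^^[ is a line-wrapping artifact. It illustrates that ISO-2022-JP switches character sets with ESC sequences (the bytes LaTeX complains about), while EUC-JP carries the same JIS X 0208 codes as plain bytes with the high bit set.

```python
# Bytes as they appear in the LaTeX log, with ^^[ written as ESC (0x1B).
# Assumption: the space after the first ^^[ in the log is a wrapping artifact.
ISO2022_BYTES = b"\x1b$BC]Fb\x1b(B"


def iso2022jp_to_eucjp(data: bytes) -> bytes:
    """Re-encode ISO-2022-JP bytes as EUC-JP."""
    return data.decode("iso2022_jp").encode("euc_jp")


euc_bytes = iso2022jp_to_eucjp(ISO2022_BYTES)

# ISO-2022-JP switches character sets with ESC sequences; EUC-JP has none.
has_escape_before = b"\x1b" in ISO2022_BYTES
has_escape_after = b"\x1b" in euc_bytes

# In EUC-JP each JIS X 0208 byte is the 7-bit JIS byte with the high bit set.
jis_payload = b"C]Fb"
high_bit_payload = bytes(b | 0x80 for b in jis_payload)

print("ESC in ISO-2022-JP input:", has_escape_before)
print("ESC in EUC-JP output:", has_escape_after)
print("EUC-JP equals JIS bytes with high bit set:", euc_bytes == high_bit_payload)
```

The same function could be applied to the contents of a whole file read in binary mode, which amounts to what "save it as euc-jp" does in the editor.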
BTW: the latex command on the example files worked well. The output was fine and I can see the kanjis. But I don't want to present a "hello world" as my titlepage for my diploma thesis... ;-)
I didn't find an encoding like UTF8 for my emacs... (ok, it's my fault. it's of course not the emacs given by SuSE 8.1... it has that encoding possibility....)
If you use the GNU emacs on SuSE 8.1, it already has the coding system
utf-8, but only for a very limited subset of Unicode. This does not
cover Japanese. To be able to use Japanese as well in UTF-8, you need
to install the Mule-UCS.rpm package, which is an extension for GNU
Emacs for better Unicode coverage.
The XEmacs package on SuSE Linux 8.1 already includes Mule-UCS, but
you may need to activate it in your XEmacs profile (which is
~/.xemacs/init.el). Add the following line:
(if (locate-library "un-define") (require 'un-define))
If you use the default ~/.xemacs/init.el file as distributed with SuSE
Linux 8.1, you already have that.
--
Mike Fabian
Hi folks,
thank you for your answers.... (ok, thanks Mike ;-)
Could anyone explain to me WHY I have to save my file in EUC-JP encoding although the preamble of the CJK environment says
\begin{CJK}{JIS}{komi} ??
           *****
I would like to write \begin{CJK}{EUC}{komi} or something like that. I don't understand this whole encoding business.... ;-(
For Mike....
On Wed, 12 Feb 2003, Mike FABIAN wrote:
I wonder why your Emacs doesn't display it correctly. Both GNU Emacs and XEmacs display Wadalab-test.tex correctly for me by default. You use GNU Emacs, don't you?
Yes, I do. GNU Emacs...
Does the Japanese in the hello page look correct to you in GNU Emacs?
Yes, it does. All fonts are installed correctly..
You can force Emacs to load a file in a specific encoding, e.g. EUC-JP, with the following key combination:
C-x RET c euc-jp RET C-x C-f Wadalab-test.tex RET
Nice hint...
I input my kanji by SKK. (I didn't find an SKK package on SuSE 8.1, so I installed it by the SKK web page.... Is there such a package provided by SuSE???)
SKK is included in the XEmacs packages for SuSE Linux. I haven't made an SKK package for GNU Emacs on SuSE Linux yet (you are the first one to ask ...).
I know that SKK is included in XEmacs. But my XEmacs doesn't start:
ludger@garfunkel:~/tex/japanese/examples> xemacs
Warning: Missing charsets in String to FontSet conversion
Warning: Cannot convert string "-gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1,-*-*-medium-r-normal--16-*-*-*-c-*-*-*" to type FontSet
Warning: Missing charsets in String to FontSet conversion
Warning: Unable to load any usable fontset
Warning: Missing charsets in String to FontSet conversion
Warning: Unable to load any usable fontset
Fatal error (11).
Your files have been auto-saved.
Use `M-x recover-session' to recover them.
If you have access to the PROBLEMS file that came with your
version of XEmacs, please check to see if your crash is described
there, as there may be a workaround available.
Otherwise, please report this bug by running the send-pr
script included with XEmacs, or selecting `Send Bug Report'
from the help menu.
As a last resort send ordinary email to `crashes@xemacs.org'.
*MAKE SURE* to include the information in the command
M-x describe-installation.
If at all possible, *please* try to obtain a C stack backtrace;
it will help us immensely in determining what went wrong.
To do this, locate the core file that was produced as a result
of this crash (it's usually called `core' and is located in the
directory in which you started the editor, or maybe in your home
directory), and type
gdb /usr/X11R6/bin/xemacs core
then type `where' when the debugger prompt comes up.
(If you don't have GDB on your system, you might have DBX,
or XDB, or SDB. A similar procedure should work for all of
these. Ask your system administrator if you need more help.)
Lisp backtrace follows:
# bind (frame-being-created)
make-frame(nil #
Personally I use XEmacs with the native Canna interface to input Japanese; for GNU Emacs there is the tamago.rpm package, which offers a nice, direct interface to Canna. I think using Canna is easier than SKK, but the choice of input method really is a personal preference.
Ok, I don't want to learn a new input method. But is there a short, readable introduction to using Canna? Also, I want an input method that doesn't use any server. Like SKK, where the dict is read into a buffer....
And I saved the file in ISO-2022-JP-2.
Don't do that. Save it as euc-jp. If you want to use the Wadalab PostScript fonts, you *must* save it as EUC-JP. In the Wadalab-test.tex file you have for example:
Ok, I did it and it worked... But why?? (see my question above...)
/usr/share/doc/packages/cjk-latex/doc/CJK.doc
clearly says:
CJK.doc> \begin{CJK*}[<fontencoding>]{<encoding>}{<family>}
CJK.doc> ...
CJK.doc> \end{CJK*}
CJK.doc>
CJK.doc> are defined. The parameters have the following meaning:
CJK.doc>
CJK.doc> <encoding> These character sets resp. encodings are currently
CJK.doc> implemented in CJK.enc:
CJK.doc>
CJK.doc> [...]
CJK.doc> JIS (For Japanese.
CJK.doc> Character set: JIS X 0208:1997.
CJK.doc> Encoding: EUC.)
You see, you *must* use EUC-JP encoding, if you use the {JIS} parameter in the \begin{CJK*} command.
Ok, but why? In my opinion it should be
\begin{CJK}{EUC}{komi}!!!!!
           *****
Why this confusion?
Yes. You get funny error messages like that when you don't use the correct encoding. Use EUC-JP together with \begin{CJK*}[dnp]{JIS}{min} and all is well.
You are right.
If you use the GNU emacs on SuSE 8.1, it already has the coding system utf-8, but only for a very limited subset of Unicode. This does not
As I wrote, I don't use the GNU Emacs provided by SuSE 8.1. So I can't refer to that....
The XEmacs package on SuSE Linux 8.1 already includes Mule-UCS, but
see above. My XEmacs doesn't start....
Thank you, Mike, for your answers...
Best regards,
Ludger
Ludger Sicking
SKK is included in the XEmacs packages for SuSE Linux. I haven't made an SKK package for GNU Emacs on SuSE Linux yet (you are the first one to ask ...).
I know that SKK is included in XEmacs. But my XEmacs doesn't start:
XEmacs from the SuSE XEmacs package?
ludger@garfunkel:~/tex/japanese/examples> xemacs
Warning: Missing charsets in String to FontSet conversion
Warning: Cannot convert string "-gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1,-*-*-medium-r-normal--16-*-*-*-c-*-*-*" to type FontSet
Warning: Missing charsets in String to FontSet conversion
Warning: Unable to load any usable fontset
Warning: Missing charsets in String to FontSet conversion
Warning: Unable to load any usable fontset
Fatal error (11).
Your files have been auto-saved. Use `M-x recover-session' to recover them.
This fontset should always work. The gnu-unifont is always installed because it is required by YaST2. And -*-*-medium-r-normal--16-*-*-*-c-*-*-* matches almost every possible encoding, even if you have only the most basic fonts installed. If that fontset doesn't work, something is very strange with your font setup. Does
xfd -fn -gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1
show the GNU Unifont?
Which locale do you use when starting XEmacs (output of the 'locale' command)?
[...]
Ok, I don't want to learn a new input method. But is there a short, readable introduction to using Canna? Also, I want an input method that doesn't use any server. Like SKK, where the dict is read into a buffer....
GNU Emacs has a built-in input method as well.
M-x set-input-method RET japanese RET
Basically this *is* SKK but with a slightly changed (dumbed down?) user interface. I think the original SKK is better. It is maybe a little bit more difficult to learn though.
And I saved the file in ISO-2022-JP-2.
Don't do that. Save it as euc-jp. If you want to use the Wadalab PostScript fonts, you *must* save it as EUC-JP. In the Wadalab-test.tex file you have for example:
Ok, I did it and it worked... But why?? (see my question above...)
/usr/share/doc/packages/cjk-latex/doc/CJK.doc
clearly says:
CJK.doc> \begin{CJK*}[<fontencoding>]{<encoding>}{<family>}
CJK.doc> ...
CJK.doc> \end{CJK*}
CJK.doc>
CJK.doc> are defined. The parameters have the following meaning:
CJK.doc>
CJK.doc> <encoding> These character sets resp. encodings are currently
CJK.doc> implemented in CJK.enc:
CJK.doc>
CJK.doc> [...]
CJK.doc> JIS (For Japanese.
CJK.doc> Character set: JIS X 0208:1997.
CJK.doc> Encoding: EUC.)
You see, you *must* use EUC-JP encoding, if you use the {JIS} parameter in the \begin{CJK*} command.
Ok, but why? In my opinion it should be
\begin{CJK}{EUC}{komi}!!!!!
           *****
Why this confusion?
There is "JIS" and "JIS2". These are different character sets but both are EUC encoded:
JIS (For Japanese. Character set: JIS X 0208:1997. Encoding: EUC.)
JIS2 (Japanese supplementary character set, Character set: JIS X 0212-1990. Encoding: EUC.)
When using SJIS encoding, you can only use the characters from JIS X 0208:1997, not those from JIS X 0212-1990:
SJIS (For Japanese. Used mainly on PCs. Also known as `MS Kanji'. Character sets: 1-byte characters from JIS X 0201-1997 (half-width katakana), 2-byte characters from JIS X 0208:1997. Encoding: SJIS.)
In EUC encoding you can use both.
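The distinction between a character set and an encoding can be checked with a short Python sketch. It uses Python's own "euc_jp" and "shift_jis" codecs, not CJK.enc, and the two sample characters are my own choice: U+6F22 as a JIS X 0208 kanji and U+4E02 as an assumed example of a character that exists in JIS X 0212 but not in JIS X 0208.

```python
def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


kanji_0208 = "\u6f22"  # a kanji from JIS X 0208
kanji_0212 = "\u4e02"  # assumed example: in JIS X 0212, not in JIS X 0208

for label, char in (("JIS X 0208", kanji_0208), ("JIS X 0212", kanji_0212)):
    for codec in ("euc_jp", "shift_jis"):
        encoded = try_encode(char, codec)
        shown = encoded.hex() if encoded is not None else "not encodable"
        print(f"{label} character in {codec}: {shown}")
```

In EUC-JP the JIS X 0212 character comes out as three bytes, introduced by the single-shift byte 0x8F, while the JIS X 0208 character takes two bytes. Shift JIS has no room for the supplementary set, so the second character cannot be encoded there at all.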
Yes. You get funny error messages like that when you don't use the correct encoding. Use EUC-JP together with \begin{CJK*}[dnp]{JIS}{min} and all is well.
You are right.
If you use the GNU emacs on SuSE 8.1, it already has the coding system utf-8, but only for a very limited subset of Unicode. This does not
As I wrote, I don't use the GNU Emacs provided by SuSE 8.1. So I can't refer to that....
If you installed GNU Emacs yourself, you probably know enough about
Emacs to install Mule-UCS yourself as well. If you want to do Japanese
in UTF-8 with Emacs you need it.
You can probably also just install the Mule-UCS.rpm from SuSE Linux
8.1, it installs the lisp files to
/usr/share/emacs/site-lisp/Mule-UCS/
and a startup file to
/usr/share/emacs/site-lisp/suse-start-Mule-UCS.el
Then it should be enough to load this startup file in your ~/.emacs,
e.g. like this:
(load "/usr/share/emacs/site-lisp/suse-start-Mule-UCS.el")
This will add the directory /usr/share/emacs/site-lisp/Mule-UCS to the
load-path of your Emacs, then load the initialization code for
Mule-UCS and create a few more useful fontsets.
--
Mike Fabian
Hi folks,
it is still a problem.... I don't understand the use of encodings in CJK and in Emacs itself.
If I open my file in the Japanese language environment the kanji are displayed correctly. By default the language environment is English. (You can change it via the Mule menu or with the M-x set-language-environment <TAB><TAB>j <TAB>... command.)
I know that SKK is included in XEmacs. But my XEmacs doesn't start:
XEmacs from the SuSE XEmacs package?
Yes. It was (on this point ;-) the XEmacs provided by SuSE 8.1 !!
setup. Does
xfd -fn -gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1
show the GNU Unifont?
Which locale do you use when starting XEmacs (output of the 'locale' command)?
ludger@garfunkel:~/private/Diplom/Kanji/Kanji> xfd -fn -gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1
Warning: Cannot convert string "-gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1" to type FontStruct
xfd: no font to display
ludger@garfunkel:~/private/Diplom/Kanji/Kanji>
ludger@garfunkel:~/private/Diplom/Kanji/Kanji> locale
LANG=de_DE@euro
LC_CTYPE="de_DE@euro"
LC_NUMERIC="de_DE@euro"
LC_TIME="de_DE@euro"
LC_COLLATE=POSIX
LC_MONETARY="de_DE@euro"
LC_MESSAGES="de_DE@euro"
LC_PAPER="de_DE@euro"
LC_NAME="de_DE@euro"
LC_ADDRESS="de_DE@euro"
LC_TELEPHONE="de_DE@euro"
LC_MEASUREMENT="de_DE@euro"
LC_IDENTIFICATION="de_DE@euro"
LC_ALL=
ludger@garfunkel:~/private/Diplom/Kanji/Kanji>
GNU Emacs has a built-in input method as well.
M-x set-input-method RET japanese RET
Basically this *is* SKK but with a slightly changed (dumbed down?)
I realized it by your hint. Thanks. It worked. ;-)
CJK.doc> \begin{CJK*}[<fontencoding>]{<encoding>}{<family>}
CJK.doc> ...
CJK.doc> \end{CJK*}
What is meant by <encoding> in this declaration? I understand it as follows:
My problem (to understand it...) is the same: the <encoding> argument is the encoding in which the font is given. The encoding in which one saves the file could be anything (in the beginning). But to let CJK work as you wish you have to save it in the EUC-JP encoding. Right? Or is there still a lack of understanding?
This explanation makes clear that there isn't a command like
\begin{CJK}{EUC}{komi}!!!!!
Right? This was the reason for all the confusion. Because of the declaration \begin{CJK*}[dnp]{JIS}{min} I thought I had to save my TeX file JIS encoded. But that doesn't work (as you, Mike, said correctly...)
If you installed GNU Emacs yourself, you probably know enough about Emacs to install Mule-UCS yourself as well. If you want to do Japanese in UTF-8 with Emacs you need it.
What does this package provide? Why should I install it? (if it is useful, of course I will do.... ;-)
Thank you for all your answers. My titlepage and my table of contents are nice. You can have a look at them (two PostScript files or JPEG files) at
http://www.muenster.de/~lsicking/Diploma/
Best regards,
Ludger
Ludger Sicking
ludger@garfunkel:~/private/Diplom/Kanji/Kanji> xfd -fn -gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1
Warning: Cannot convert string "-gnu-unifont-medium-r-normal--16-160-75-75-p-80-iso10646-1" to type FontStruct
xfd: no font to display
This font is part of yast2-qt.rpm:
mfabian@gregory:~$ rpm -qf /usr/X11R6/lib/X11/fonts/uni/newunifont.pcf.gz
yast2-qt-2.6.23-16
mfabian@gregory:~$
"xset q" should tell you, whether you have the directory
/usr/X11R6/lib/X11/fonts/uni/ in your font path. If it's not in your
font path, add it to /etc/X11/XF86Config.
--
Mike Fabian
Ludger Sicking
it is still a problem.... I don't understand the use of encodings in CJK and in Emacs itself.
If I open my file in the Japanese language environment the kanji are displayed correctly. By default the language environment is English. (You can change it via the Mule menu or with the M-x set-language-environment <TAB><TAB>j <TAB>... command.)
The locale you use when starting (X)Emacs and the language environment
you choose have an influence on the priorities of the automatic
detection of encodings.
To make sure that (X)Emacs always reads a certain file in the correct
encoding, you can add "coding system cookies" to your file. See the
info pages of your favorite Emacs variant.
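As an illustration of such a cookie, here is a minimal Python sketch. It assumes the common Emacs file-variable form "-*- coding: euc-jp -*-" placed in a TeX comment on the first line; the helper name, the sample text and the file name are hypothetical. It also shows why automatic detection has to guess: the same bytes decode without error as Latin-1.

```python
import os
import tempfile

# TeX comment carrying an Emacs coding cookie (assumed first-line form).
COOKIE = "% -*- coding: euc-jp -*-"


def with_coding_cookie(tex_source: str) -> str:
    """Prepend the cookie line unless the first line already declares a coding."""
    lines = tex_source.splitlines()
    first_line = lines[0] if lines else ""
    if "coding:" in first_line:
        return tex_source
    return COOKIE + "\n" + tex_source


source = "\\begin{CJK*}[dnp]{JIS}{min}\n\u6f22\u5b57\n\\end{CJK*}\n"
tagged = with_coding_cookie(source)

# Write the file in EUC-JP so that the bytes match what the cookie claims.
path = os.path.join(tempfile.mkdtemp(), "cookie-demo.tex")  # hypothetical name
with open(path, "w", encoding="euc_jp", newline="\n") as handle:
    handle.write(tagged)

with open(path, "rb") as handle:
    raw = handle.read()

# Without a declaration the bytes are ambiguous: they also decode as Latin-1.
as_latin1 = raw.decode("latin-1")
as_eucjp = raw.decode("euc_jp")
print(as_eucjp.splitlines()[0])
print("Latin-1 reading differs from EUC-JP reading:", as_latin1 != as_eucjp)
```

The cookie only tells the editor how to read the bytes. The file still has to be saved in the encoding it names.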
--
Mike Fabian
Ludger Sicking
If you installed GNU Emacs yourself, you probably know enough about Emacs to install Mule-UCS yourself as well. If you want to do Japanese in UTF-8 with Emacs you need it.
What does this package provide? Why should I install it? (if it is useful, of course I will do.... ;-)
As I wrote, you need it if you want to use UTF-8 encoding in (X)Emacs.
In Emacs you can use a small subset of UTF-8 already without Mule-UCS
(mainly characters used in European languages), but for CJK languages
in UTF-8 you need Mule-UCS. In XEmacs, you always need Mule-UCS to
use UTF-8 encoding, even if you only use European languages.
Why use UTF-8 encoding?
Because it enables you to use the same encoding for many languages.
I run my computers in ja_JP.UTF-8 locale; it is nice to be able to view
files containing Japanese and German umlauts simply with less in a
terminal like mlterm or xiterm and have everything display correctly.
It's far more comfortable than always switching locales if you use
more than one non-English language.
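A small Python sketch makes the point concrete. The sample string is my own, and Python's codecs stand in for what the terminal and editor do: text mixing German umlauts and Japanese fits into UTF-8, but neither into Latin-1 nor into Shift JIS.

```python
def encodable(text: str, codec: str) -> bool:
    """Report whether codec can represent every character in text."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# German umlauts plus Japanese in one string (illustrative sample text).
mixed = "Gr\u00fc\u00dfe und \u65e5\u672c\u8a9e"

for codec in ("utf-8", "latin-1", "shift_jis"):
    print(codec, encodable(mixed, codec))
```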
--
Mike Fabian