What is wadokujt for? and: Some Scripts for Chinese and Japanese learning and Noxon Audio
In SuSE 9.3, there is a package wadokujt containing a Japanese-German dictionary. But it ships no program; the package contains only data.

What programs in the SuSE distribution are meant to be used with wadokujt?

KWordQuiz, KVTML:

I use it together with KWordQuiz, but for this I need to convert the data to kvtml. That's why I wrote wadokujt2kvtml.pl.

There is another very similar dictionary file, named CEDICT, for Chinese-English translation. Unfortunately, it does not come with SuSE. That file can be converted to kvtml using cedict2kvtml.pl.

Noxon and Twonkyvision:

All my CDs are stored as MP3 files, served via UPnP using Twonkyvision and played by a Noxon audio device. This works fine, but UTF-8 is not handled correctly: all German umlauts and the names of all Chinese music files are unreadable.

Fortunately, on the unicode.org page there is a file Unihan.txt, which defines the Chinese Unicode mapping, including the Mandarin PinYin transliteration. My script create-mapping.sh downloads this file and extracts the PinYin mapping I am interested in. Its output is then used by utf8-to-ascii.pl, which converts all Chinese characters, German umlauts, and French, Spanish and Italian accents to plain 7-bit ASCII.

I use this script in create-mp3-ascii-dir.sh (which is only an example and works only on my system) to create links to all my MP3 files. The link names are 7-bit ASCII and display well on the Noxon.

Another possible application: if you also take the PinYin tones into account in utf8-to-ascii.pl (as cedict2kvtml.pl already does), you could easily build a Chinese text-to-speech synthesizer! Or semi-automated translation: translate the individual characters to English using Unihan.txt.

Is this also useful to others? Shall I make a webpage containing this information and the scripts?

Regards
Marc
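The Unihan-based conversion described above can be sketched roughly as follows. This is not the original create-mapping.sh or utf8-to-ascii.pl (their source is not shown in this thread), only a minimal Python illustration under stated assumptions: Unihan lines are tab-separated as `U+XXXX<TAB>kMandarin<TAB>reading ...`, only the first listed reading is kept, tones are dropped, and the German replacements (ä to ae and so on) are a guess at what such a script might do. The names `load_pinyin`, `to_ascii` and `GERMAN` are hypothetical.

```python
import unicodedata

# Assumed German spellings; the thread does not say which ones the original script uses.
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def strip_accents(text):
    """Decompose accented letters and drop the combining marks (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def load_pinyin(lines):
    """Build {character: toneless pinyin} from Unihan-style kMandarin lines."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 3 or fields[1] != "kMandarin":
            continue
        char = chr(int(fields[0][2:], 16))   # "U+4E2D" -> "中"
        first = fields[2].split()[0]          # keep only the first listed reading
        # Works for both "ZHONG1" (tone digit) and "zhōng" (tone mark) spellings.
        mapping[char] = strip_accents(first).rstrip("012345").lower()
    return mapping


def to_ascii(text, pinyin):
    """Transliterate text to 7-bit ASCII; unknown characters become '_'."""
    out = []
    for ch in text:
        if ch in pinyin:
            out.append(pinyin[ch].capitalize())
        elif ch in GERMAN:
            out.append(GERMAN[ch])
        else:
            plain = strip_accents(ch)
            out.append(plain if plain.isascii() else "_")
    return "".join(out)


sample = ["U+4E2D\tkMandarin\tZHONG1 ZHONG4\n", "U+6587\tkMandarin\tWEN2\n"]
print(to_ascii("中文 Müller café.mp3", load_pinyin(sample)))
# prints: ZhongWen Mueller cafe.mp3
```

Capitalizing each syllable keeps the boundaries between characters visible in a file name without adding separators; keeping the tone digit instead of stripping it would be the starting point for the text-to-speech idea mentioned above.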
Marc Waeckerlin <Marc.Waeckerlin@siemens.com> wrote:
In SuSE 9.3, there's a package wadokujt with a Japanese German dictionary. But it comes with no Program, only data is contained in the package.
What programs in the SuSE distribution are thought to be used with wadokujt?
It can be used with gjiten, or with edict.el from (X)Emacs. I mostly use it with edict.el.

--
Mike FABIAN <mfabian@suse.de> http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.
On Thursday, 19 May 2005 at 13:02, Mike FABIAN <mfabian@suse.de> wrote under "Re: [m17n] What is wadokujt for? and: Some Scripts for Chinese and Japanese learning and Noxon Audio":
Marc Waeckerlin <Marc.Waeckerlin@siemens.com> wrote:
What programs in the SuSE distribution are thought to be used with wadokujt?
Can be used with gjiten. Or with edict.el from (X)Emacs. I mostly use it with edict.el.
Cool! CEDICT (Chinese English Dictionary) also works with gjiten!

But: does anyone know how to display the simplified Chinese and the PinYin pronunciation from CEDICT within gjiten?

Has anyone ever found a free downloadable Chinese-German dictionary?

Regards
Marc
Marc Waeckerlin <Marc.Waeckerlin@siemens.com> wrote:
Cool!
CEDICT (Chinese English Dictionary) also works with gjiten!
But: does anyone know how to display the simplified Chinese and the PinYin pronunciation from CEDICT within gjiten?
I added a cedict package to SuSE Linux which fixes this problem. Try this:

ftp://ftp.suse.com/pub/projects/m17n/9.3/noarch/cedict-20050411-0.noarch.rpm
ftp://ftp.suse.com/pub/projects/m17n/9.3/src/cedict-20050411-0.src.rpm

I changed the cedict_ts.u8 file in this package slightly to fix the problem you reported when using it with "Gjiten".

The original file format of cedict_ts.u8 was:

traditional-Chinese simplified-Chinese [pinyin] /English definition 1/English definition 2/.../

Gjiten searches for the first space ' ' in each line and assumes that everything before that first space is the non-English entry word. If the next character is '[', Gjiten extracts the pronunciation information up to ']', then continues to search for the first '/', which marks the start of the English translation.

This works fine for the Japanese EDICT files because there is only one entry word in each line, optionally followed by a pronunciation in hiragana.

But in the case of cedict_ts.u8, both the traditional-Chinese and the simplified-Chinese versions of each entry word are given, so there are two entry words in each line. Thus, Gjiten displays only the traditional-Chinese entry word, skips the simplified-Chinese entry word and the pinyin (because the character after the first space is not '['), and displays the English translation.

As a quick fix I just replaced the first space in each line of CEDICT by a double-width space '　' (U+3000 IDEOGRAPHIC SPACE). After that modification, the first ASCII space in each line is the one directly before '[', so Gjiten correctly displays the simplified Chinese and the pinyin in cedict_ts.u8 as well.

--
Mike FABIAN <mfabian@suse.de> http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.
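To make the effect of the quick fix concrete, here is a small Python model of the parsing behaviour as it is described in this message. It is not Gjiten's actual code, only a simplified re-statement of the three steps (split at the first space, optional '[...]' reading, glosses from the first '/'). The function names `parse_edict_style` and `widen_first_space` and the sample line are made up for illustration.

```python
def parse_edict_style(line):
    """Split a dictionary line following the steps described in the thread (simplified model)."""
    # Entry word = everything before the first ASCII space.
    word, _, rest = line.partition(" ")
    reading = ""
    # A reading is only picked up if '[' directly follows that first space.
    if rest.startswith("["):
        reading, _, rest = rest[1:].partition("]")
    # The translation starts at the first '/'.
    _, _, glosses = rest.partition("/")
    return word, reading, [g for g in glosses.split("/") if g.strip()]


def widen_first_space(line):
    """Replace only the first ASCII space with U+3000 IDEOGRAPHIC SPACE."""
    return line.replace(" ", "\u3000", 1)


# Illustrative line in the traditional/simplified/pinyin layout.
line = "中國 中国 [Zhong1 guo2] /China/Middle Kingdom/"

print(parse_edict_style(line))
# ('中國', '', ['China', 'Middle Kingdom'])      simplified form and pinyin are lost

print(parse_edict_style(widen_first_space(line)))
# ('中國\u3000中国', 'Zhong1 guo2', ['China', 'Middle Kingdom'])
```

Because U+3000 is not an ASCII space, both Chinese forms end up inside the entry word, and the first real space is now the one in front of '[', which is exactly what this parsing model expects.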
participants (2)
- Marc Waeckerlin
- Mike FABIAN