What is wadokujt for? and: Some Scripts for Chinese and Japanese learning and Noxon Audio
In SuSE 9.3, there is a package wadokujt containing a Japanese-German dictionary. It ships without any program, though; the package contains only data. Which programs in the SuSE distribution are meant to be used with wadokujt?

KWordQuiz, KVTML:
I use it together with KWordQuiz, but for that I need to convert the data to kvtml. That's why I wrote wadokujt2kvtml.pl. There is another, very similar dictionary file named CEDICT for Chinese-English translation. Unfortunately, it does not come with SuSE. That file can be converted to kvtml using cedict2kvtml.pl.

Noxon and Twonkyvision:
All my CDs are stored as MP3 files, served via UPnP using Twonkyvision and played by a Noxon audio device. This works fine, except that UTF-8 is not handled correctly: every file name containing German umlauts or Chinese characters is unreadable. Fortunately, the unicode.org site offers a file Unihan.txt, which defines the Chinese Unicode mapping, including the Mandarin PinYin transliteration. My script create-mapping.sh downloads this file and extracts the PinYin mapping I am interested in. Its output is then used by utf8-to-ascii.pl, which converts all Chinese characters, German umlauts, and French, Spanish and Italian accents to plain 7-bit ASCII. I use this script in create-mp3-ascii-dir.sh (which is only an example and works only on my system) to create links to all my MP3 files. The link names are 7-bit ASCII and display well on the Noxon.

Another possible application: if you also take the PinYin tones into account in utf8-to-ascii.pl (as cedict2kvtml.pl does), you could easily build a Chinese text-to-speech synthesizer! Or semi-automated translation: translate the individual characters to English using Unihan.txt.

Is this also useful to others? Shall I make a webpage containing this information and the scripts?

Regards
Marc
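For readers who want to try the same idea, the mapping step might look roughly like the Python sketch below. It is not Marc's Perl script. It assumes Unihan-style lines of the form "U+XXXX<TAB>kMandarin<TAB>reading", takes only the first reading, and drops the tones. The function names and the German digraph table (ä to ae and so on) are illustrative assumptions, not something stated in the message.

```python
import unicodedata


def strip_marks(text):
    """Decompose characters and drop combining marks (accents, tone marks)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def load_pinyin(lines):
    """Build {character: toneless pinyin} from Unihan-style kMandarin lines."""
    mapping = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3 or parts[1] != "kMandarin":
            continue  # only the Mandarin readings are of interest here
        codepoint, _, readings = parts
        char = chr(int(codepoint[2:], 16))  # "U+4E2D" -> the character itself
        first = readings.split()[0]  # assumption: keep the first reading only
        # Handles tone marks ("zhōng") as well as tone digits ("ZHONG1").
        mapping[char] = strip_marks(first).rstrip("12345").lower()
    return mapping


# Assumption: German umlauts become digraphs; other accents are simply dropped.
SPECIAL = {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def to_ascii(text, pinyin):
    """Transliterate text to 7-bit ASCII using the pinyin map and accent stripping."""
    out = []
    for ch in text:
        if ch in pinyin:
            out.append(pinyin[ch].capitalize())
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        else:
            plain = strip_marks(ch)
            out.append(plain if plain.isascii() else "?")
    return "".join(out)
```

Calling to_ascii() on each MP3 file name and creating a symbolic link under the resulting name would correspond to the create-mp3-ascii-dir.sh step described above.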
Marc Waeckerlin wrote:
In SuSE 9.3, there's a package wadokujt with a Japanese German dictionary. But it comes with no Program, only data is contained in the package.
What programs in the SuSE distribution are thought to be used with wadokujt?
Can be used with gjiten. Or with edict.el from (X)Emacs. I mostly use
it with edict.el.
--
Mike FABIAN
On Thursday, 19 May 2005 at 13:02, Mike FABIAN wrote:
Marc Waeckerlin wrote:
What programs in the SuSE distribution are thought to be used with wadokujt?
Can be used with gjiten. Or with edict.el from (X)Emacs. I mostly use it with edict.el.
Cool! CEDICT (Chinese English Dictionary) also works with gjiten!
But: does anyone know how to display the simplified Chinese and the PinYin pronunciation from CEDICT within gjiten?
Has anyone ever found a free downloadable Chinese-German dictionary?
Regards
Marc
Marc Waeckerlin wrote:
Cool!
CEDICT (Chinese English Dictionary) also works with gjiten!
But: Who knows how to display simplified Chinese and PinYin pronunciation from CEDICT within gjiten?
I added a cedict package to SuSE Linux which fixes this problem.
Try this:
ftp://ftp.suse.com/pub/projects/m17n/9.3/noarch/cedict-20050411-0.noarch.rpm
ftp://ftp.suse.com/pub/projects/m17n/9.3/src/cedict-20050411-0.src.rpm
I changed the cedict_ts.u8 file in this package slightly to fix the
problem you reported when using it with "Gjiten".
The original file format of cedict_ts.u8 was:
traditional-Chinese simplified-Chinese [pinyin] /English definition 1/English definition 2/.../
Gjiten searches for the first space ' ' in each line and assumes that
everything before that first space is the non-English entry word.
If the next character is '[', Gjiten extracts the pronunciation information
until ']', then continues to search until the first '/' which marks
the start of the English translation.
This works fine for the Japanese EDICT files because there is only one
entry word in each line optionally followed by a pronunciation in
hiragana.
But in case of cedict_ts.u8, both the traditional-Chinese and the
simplified-Chinese versions of each entry word are given, therefore
there are two entry words in each line. Thus, Gjiten only displays the
traditional-Chinese entry word, skips the simplified-Chinese entry
word and the pinyin (because the next character is not '['), and
displays the English translation.
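The parsing behaviour described above can be reproduced in a few lines of Python. This is only a sketch of the logic as stated in this message, not Gjiten's actual source code; the function name parse_entry is made up for illustration.

```python
def parse_entry(line):
    """Split a dictionary line into (entry word, reading, definitions).

    Follows the steps described in the message: everything before the
    first ASCII space is the entry word; a reading is taken only if the
    very next character is '['; definitions start at the first '/'.
    """
    head, sep, rest = line.partition(" ")
    if not sep:
        return None  # no space at all: not a dictionary entry

    reading = None
    if rest.startswith("["):
        end = rest.find("]")
        if end != -1:
            reading = rest[1:end]
            rest = rest[end + 1:]

    slash = rest.find("/")
    if slash == -1:
        return head, reading, []
    definitions = [d for d in rest[slash + 1:].split("/") if d.strip()]
    return head, reading, definitions


# EDICT-style line: one entry word followed directly by the reading.
print(parse_entry("日本 [にほん] /Japan/"))
# CEDICT-style line: the simplified form and the pinyin are skipped.
print(parse_entry("中國 中国 [Zhong1 guo2] /China/"))
```

For the second line the reading comes back as None and the simplified form is lost, which matches the symptom reported in the thread.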
As a quick fix I just replaced the first space in each line of CEDICT
by a double width space ' ' (U+3000 IDEOGRAPHIC SPACE). After that
modification, Gjiten correctly displays the simplified Chinese and the
pinyin in cedict_ts.u8 as well.
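A minimal Python sketch of this quick fix, for anyone who wants to apply it to a newer CEDICT download. It is not the script used to build the package; the function name is hypothetical, and leaving '#' comment lines untouched is an added assumption.

```python
IDEOGRAPHIC_SPACE = "\u3000"  # U+3000 IDEOGRAPHIC SPACE


def widen_first_space(line):
    """Replace only the first ASCII space of an entry line with U+3000."""
    if line.startswith("#"):
        return line  # assumption: header/comment lines stay as they are
    return line.replace(" ", IDEOGRAPHIC_SPACE, 1)


original = "中國 中国 [Zhong1 guo2] /China/"
fixed = widen_first_space(original)

# The first ASCII space now comes after the simplified form, so a parser
# that splits there sees both forms as the entry word and finds '[' next.
entry_word, _, rest = fixed.partition(" ")
print(repr(entry_word))
print(rest.startswith("["))
```

Applied line by line to cedict_ts.u8 (read and written as UTF-8), this produces a file in which the traditional and simplified forms are treated as one entry word.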
--
Mike FABIAN