grep(ing) Japanese text...
Hello all,

I'm having a problem grepping an HTML file that contains Japanese text. I'm using rxvt (I tried kterm as well) as my terminal. Grepping Japanese text in plain text files works fine, but I have some HTML files that don't grep properly. It seems grep cannot see the Japanese text in my HTML file(s), or the text is temporarily corrupted, as the following shows.

Under rxvt:

    $ grep info.html *
    menu.html:href="info.html">����</A></P></DIV>

Under kterm:

    $ grep info.html *
    menu.html:href="info.html">/A></P></DIV>

The actual line looks like this:

    href="info.html">こっち</A></P></DIV>

This brings up nothing:

    $ grep こっち *
    $

Has anyone tackled this one yet?

Thanks for reading.

Eric

__________________________________________________
Do You Yahoo!?
Yahoo! BB is Broadband by Yahoo!
http://bb.yahoo.co.jp/
ピアス エリック <eric_karatsujp@yahoo.co.jp> writes: [...]
I guess the Japanese in your file is ISO-2022-JP or SJIS encoded. 'grep' interprets the expression you enter in the encoding of the locale you are using, which is probably EUC-JP, so the bytes it searches for do not match the bytes stored in the file.

If you are running in the ja_JP.eucJP locale and want to grep for Japanese in files which may be EUC-JP, ISO-2022-JP or SJIS encoded, you can use lgrep:

    mfabian@gregory:/tmp$ locale charmap
    EUC-JP
    mfabian@gregory:/tmp$ lgrep "こ+っち" ttt*html
    ttt-euc-jp.html:test ここっち test
    ttt-iso-2022-jp.html:test ここっち test
    ttt-sjis.html:test ここっち test
    mfabian@gregory:/tmp$

lgrep checks for all of these Japanese encodings. lgrep is a hard link to lv, i.e. you need to have lv.rpm installed.

-- 
Mike Fabian <mfabian@suse.de> http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.
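To make the encoding mismatch concrete, here is a minimal Python sketch (not part of the original thread) that prints the bytes for こっち under the three encodings mentioned above and checks a byte-wise search the way a pattern typed in an EUC-JP terminal would behave. The codec names `euc_jp`, `iso2022_jp` and `shift_jis` are Python's names for these encodings. The sample line is taken from the original post, and the assumption that the pattern arrives as EUC-JP bytes follows the guess about the locale.

```python
# The same three characters as they would be stored under each encoding.
text = "こっち"
ENCODINGS = ["euc_jp", "iso2022_jp", "shift_jis"]  # Python codec names

encoded = {name: text.encode(name) for name in ENCODINGS}
for name, raw in encoded.items():
    print(f"{name:12} {raw!r}")

# Assumption: the pattern is typed in an EUC-JP locale, so the search
# string reaches grep as EUC-JP bytes.
needle = text.encode("euc_jp")

# Byte-wise search against the same HTML line saved in each encoding.
matches = {}
for name in ENCODINGS:
    line = f'href="info.html">{text}</A></P></DIV>'.encode(name)
    matches[name] = needle in line
    print(f"{name:12} match={matches[name]}")
```

Only the EUC-JP copy of the line contains the EUC-JP byte sequence, which would explain why the plain grep for こっち returns nothing on a file saved in one of the other two encodings.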
That works great. Too bad there isn't a '-r' (recursive) option for lgrep. But thanks all the same!

Eric

--- Message from Mike Fabian <mfabian@suse.de>:
At Tue, 21 May 2002 11:49:21 +0900 (JST), ピアス エリック wrote:
That works great. Too bad there isn't a '-r' (recursive) option for lgrep.
You can combine it with find (yeah, it's Linux :)

    % find . -type f -print0 | xargs -0 lgrep pattern

-- 
Takashi Iwai <tiwai@suse.de> SuSE GmbH - www.suse.de
ALSA Developer ALSA Project - www.alsa-project.org
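For anyone without lv installed, the same idea (walk a directory tree and search each file under several Japanese encodings) can be sketched in Python. This is only an illustration, not how lgrep itself detects encodings: the helper names `decode_japanese` and `search_tree` are made up for this example, and the "first codec that decodes cleanly wins" rule is a simplifying assumption that can misjudge some files.

```python
import os
import re

# Tried in this order; the first codec that decodes the whole file wins.
# ISO-2022-JP goes first because it is 7-bit and rejects EUC-JP/SJIS bytes.
# EUC-JP goes before Shift_JIS because EUC-JP bytes often also decode
# (as garbage) under Shift_JIS.
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def decode_japanese(raw):
    """Return the decoded text, or None if no candidate codec fits."""
    for codec in CANDIDATES:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return None


def search_tree(root, pattern):
    """Recursively collect (path, line) pairs whose line matches pattern."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    text = decode_japanese(handle.read())
            except OSError:
                continue  # unreadable file: skip it
            if text is None:
                continue  # not decodable as any candidate encoding
            for line in text.splitlines():
                if regex.search(line):
                    hits.append((path, line))
    return hits
```

Calling `search_tree(".", "こ+っち")` returns a list of `(path, line)` pairs that can be printed in the usual `path:line` form. The pattern is an ordinary Python regular expression applied to decoded text, so it does not depend on the terminal's locale.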
participants (3)
- Mike Fabian
- Takashi Iwai
- ピアス エリック