Re: [SLE] character set converter in SuSE?
George Zeigler <genz1968@mtu-net.ru> writes:
recode has no GUI :-(
Doing a GUI for `recode' might be a major undertaking, given the number of possibilities and options... However, I'm starting to ponder the possibility, now that I have learned Python reasonably well, and intend to start studying GTK/pygtk more seriously in a few weeks. `recode' could surely be a good exercise, given that I have practically no experience in GUI writing... I wonder, and guess, that the interface would be this: if `recode' is called without any arguments, it launches the GUI. Maybe not the most efficient way to do it (as the GUI has to be a separate program), but probably the simplest, as I definitely want to keep `recode' unchanged as the name of the batch program. Of course, do not hold your breath waiting for a GUI! :-)
I received a Cyrillic text whose character set I can't figure out. It's not KOI8 or CP1251. And recode does not guess.
Automatically guessing character sets is surely not an easy challenge, especially when the character set has little redundancy, which is the usual case for narrow (8-bit) charsets. And resorting to recognising the natural language of the text is yet another whole level of difficulty. I'm definitely not competent enough to tackle any of these problems. Still, one can never be sure, as users sometimes contribute surprising resources!
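To give an idea of what a crude guesser could look like, here is a minimal sketch in Python. It is not how `recode' works, and it is not a reliable detector: it only tries a few candidate 8-bit Cyrillic charsets (named here by Python's codec names) and keeps the one whose decoding yields the most lowercase Cyrillic letters. The names `guess_charset', `cyrillic_score' and `CANDIDATES' are invented for this illustration, and the whole thing assumes the text is ordinary Russian prose in one of the listed charsets.

```python
# Hypothetical helper for illustration only; not part of recode.
# Assumption: the text is mostly lowercase Russian prose in one of these charsets.
CANDIDATES = ["koi8_r", "cp1251", "iso8859_5", "cp866"]


def cyrillic_score(text):
    """Return the fraction of characters that are basic lowercase Cyrillic."""
    if not text:
        return 0.0
    hits = sum(1 for ch in text if "\u0430" <= ch <= "\u044f")  # а .. я
    return hits / len(text)


def guess_charset(data, candidates=CANDIDATES):
    """Return the candidate whose decoding looks most like Cyrillic prose.

    Returns None when no candidate can decode the bytes at all.
    """
    best_name, best_score = None, -1.0
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # this charset has no mapping for some byte
        score = cyrillic_score(text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    sample = "Привет, это обычный русский текст для проверки кодировки."
    print(guess_charset(sample.encode("cp866")))
```

The low redundancy mentioned above shows up directly: every candidate decodes almost any byte string without error, so only a statistical hint about the language separates them, and short or unusual texts can easily fool it.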
-k does not seem to apply to me.
It might still be your best bet. You have to learn to use it. Of course, if the charset you are trying to identify is unknown to `recode', it has no chance of finding it. Experience with other users has taught me that, usually, the charset is there, and `recode' might find it, if you give it enough hints.
The documentation is intense, and after skimming through it, I'm still not sure how the command lines work.
Was the small tutorial chapter already written for the version you are using? It might help. About the "intensity" of the documentation, you understand that making it more "fluid" would also require making it much longer, and it is not short already. And there are many new features planned for the next releases, which will require documentation as well. Sigh!
I just want to convert from one character set to another. Say from ISO-8859-5 to KOI-8. I think "recode cyrillic..GOST_19768-74 old.rtf" is correct. recode doesn't seem to create a new file; for instance, "recode cyrillic..GOST_19768-74 old.rtf new.rtf" is what I would like, so as not to overwrite the file. But this did not work.
The short tutorial explains how to do this with examples, and the full documentation also says it more formally. Just use:

    recode cyrillic..GOST_19768-74 <old.rtf >new.rtf

Notice that charset names, before and after `..', may be abbreviated, as long as you do not introduce ambiguities. You may use only lower case if you want, and omit non-alphanumerics. Try some, `recode' will tell you if you made them too short. `recode' calls can often be written tersely.
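For comparison, the same "read one file, write another, never overwrite" idea can be sketched in Python. This is only an illustration using Python's own codecs, not `recode', and it takes the ISO-8859-5 to KOI8-R pair from the question as its example; the function names `recode_bytes' and `convert_file' are invented here, and the charset names are Python codec names, which are not the same thing as `recode' charset names.

```python
# Illustrative sketch using Python's built-in codecs; it does not call recode.
# Function names are invented for this example.


def recode_bytes(data, from_charset, to_charset):
    """Decode bytes from one charset and re-encode them in another."""
    return data.decode(from_charset).encode(to_charset)


def convert_file(src_path, dst_path, from_charset="iso8859_5",
                 to_charset="koi8_r"):
    """Write a converted copy of src_path to dst_path.

    The source file is left untouched. Mode "xb" makes open() fail with
    FileExistsError if dst_path already exists, so nothing is overwritten.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "xb") as dst:
        dst.write(recode_bytes(data, from_charset, to_charset))


if __name__ == "__main__":
    original = "Привет".encode("iso8859_5")
    converted = recode_bytes(original, "iso8859_5", "koi8_r")
    print(converted.decode("koi8_r"))
```

A call such as convert_file("old.rtf", "new.rtf") plays the role of the redirection above. Keep in mind that either approach converts every byte of the file, RTF markup included.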
The list of character sets and their aliases makes for tough reading. [...] A table listing the character sets, with their aliases in a column next to them, would make for easier reading.
This would be at the expense of some vertical space in the manual, but it might be worth it, yes. Thanks for your suggestions. I'll write to you again when I revisit your ideas for real work, though probably not soon...

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard