Re: [SLE] character set converter in SuSE?

25 Feb 2000

      George Zeigler <genz1968@mtu-net.ru> writes:
...
> recode has no GUI :-(
Doing a GUI for `recode' might be a major undertaking, given the number
of possibilities and options...  However, I'm starting to ponder the
possibility, now that I have reasonably learned Python and intend to start
studying GTK/pygtk more seriously in a few weeks.  `recode' could surely
be a good exercise, given that I have practically no experience in GUI writing...

I am wondering, and guessing.  Perhaps the interface would be that calling
`recode' without any arguments launches the GUI?  It may not be the most
efficient way to do it (as the GUI has to be a separate program), but it is
probably the simplest, as I definitely want to keep `recode' unchanged,
as the name of the batch program.

Of course, do not hold your breath waiting for a GUI! :-)
...
> I received a Cyrillic text whose character set I can't figure out.
> It's not KOI8 or CP1251.  And recode does not guess.
Automatically guessing character sets is surely not an easy challenge,
especially when the character set has little redundancy, which is the
usual case for narrow (8-bit) charsets.  And resorting to recognising the
natural language of the text adds yet another whole level of difficulty.
I'm definitely not competent enough to tackle any of these problems.
One can never be sure, as users sometimes contribute surprising resources!
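
To make the redundancy point concrete, here is a toy Python sketch.  It is not how `recode' works.  A KOI8-R encoded Russian sentence also decodes without a single error as CP1251, ISO-8859-5 or CP866, so a decoder alone cannot tell that the charset is wrong.  Only a crude look at the natural language separates the candidates, here by counting an assumed set of frequent Russian letters.  The candidate list, the letter set and the function names are all illustrative assumptions.

```python
# Toy illustration only (not recode's method): 8-bit charsets rarely reject
# input, so telling them apart needs some knowledge of the language itself.

# Assumption: a small, hand-picked list of Cyrillic candidate codecs.
CANDIDATES = ["koi8_r", "cp1251", "iso8859_5", "cp866"]

# Assumption: some of the most frequent lowercase letters in Russian prose.
COMMON_RUSSIAN = set("оеаинтсрвл")


def decodes_cleanly(data):
    """Return every candidate codec that decodes `data` without an error."""
    ok = []
    for codec in CANDIDATES:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        ok.append(codec)
    return ok


def score(text):
    """Fraction of the letters in `text` that are common Russian lowercase letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in COMMON_RUSSIAN for c in letters) / len(letters)


def guess(data):
    """Among the codecs that decode cleanly, pick the best-scoring decoding.
    Returns None if no candidate decodes the data at all."""
    return max(decodes_cleanly(data),
               key=lambda codec: score(data.decode(codec)),
               default=None)


if __name__ == "__main__":
    sample = "Съешь же ещё этих мягких французских булок, да выпей чаю.".encode("koi8_r")
    print(decodes_cleanly(sample))  # all four candidates accept these bytes
    print(guess(sample))
```

Real texts are shorter, noisier and not always Russian, which is exactly why such heuristics are fragile.
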
...
> -k does not seem to apply to me.
It might still be your best bet.  You have to learn to use it.  Of course,
if the charset you are trying to identify is unknown to `recode', it has no
chance of finding it.  Experience with other users has taught me that the
charset is usually there, and that `recode' may find it if you give it enough hints.
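
The idea of hinting can be sketched in a few lines of Python.  This is a hypothetical analogue, not the syntax or the algorithm of `recode -k'.  You supply a few bytes whose meaning you already know from the text, for example that byte 0xC1 stands for a Cyrillic small a.  The sketch then keeps only the candidate codecs that agree with every hint.  The candidate list and the function names are assumptions for illustration.

```python
# Hypothetical analogue of hinting an unknown charset with known characters.
# Not recode's -k syntax; the candidate list below is an assumption.

CANDIDATES = ["koi8_r", "cp1251", "iso8859_5", "cp866"]


def maps_to(codec, byte, char):
    """True if the single byte value `byte` decodes to `char` in `codec`."""
    try:
        return bytes([byte]).decode(codec) == char
    except UnicodeDecodeError:
        return False  # the byte is undefined in this codec


def consistent_codecs(known_pairs):
    """Keep the candidates that agree with every (byte value -> character) hint."""
    return [
        codec
        for codec in CANDIDATES
        if all(maps_to(codec, byte, char) for byte, char in known_pairs.items())
    ]


if __name__ == "__main__":
    # "\u0430" is CYRILLIC SMALL LETTER A.
    print(consistent_codecs({0xC1: "\u0430"}))  # ['koi8_r']
    print(consistent_codecs({0xE0: "\u0430"}))  # ['cp1251']
    print(consistent_codecs({0x41: "A"}))       # ASCII hints do not discriminate
```

One well-chosen hint on a high byte can already narrow the list to a single candidate.  Hints on plain ASCII bytes tell nothing, since all these charsets agree there.
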
...
> The documentation is intense, and after skimming through it, I'm still
> not sure how the command lines work.
Was the small tutorial chapter already written for the version you are using?
It might help.  About the "intensity" of the documentation, you will understand
that making it more "fluid" would also require making it much longer,
and it is not short already.  And there are many new features planned for
the next releases, which will require documentation as well.  Sigh!
...
> I just want to convert from one character set to another, say from
> ISO-8859-5 to KOI8.  I think "recode cyrillic..GOST_19768-74 old.rtf"
> is correct.  However, recode doesn't seem to create a new file: for instance,
> "recode cyrillic..GOST_19768-74 old.rtf new.rtf" is what I would like,
> so as not to overwrite the file.  But this did not work.
The short tutorial explains how to do this with examples, and the full
documentation also says it more formally.  Just use:

   recode cyrillic..GOST_19768-74 <old.rtf >new.rtf
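
For readers who think in Python, the same filter idea can be sketched with the standard codecs.  This is only an analogue of the `recode' command above, not a replacement for it.  It uses the ISO-8859-5 to KOI8-R pair mentioned in the question, through Python's codec names `iso8859_5' and `koi8_r'.  The helper `recode_bytes' and the script name `recode_filter.py' are hypothetical.  Like the shell redirection, it reads one stream and writes another, so the original file is never overwritten.

```python
import sys


def recode_bytes(data, from_codec, to_codec):
    """Re-encode `data` from one charset to another.
    Characters missing from the target charset raise UnicodeEncodeError."""
    return data.decode(from_codec).encode(to_codec)


if __name__ == "__main__":
    # Used as a filter, like the shell redirection:
    #   python recode_filter.py <old.txt >new.txt
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(recode_bytes(data, "iso8859_5", "koi8_r"))
```
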

Notice that charset names, before and after `..', may be abbreviated, as
long as you do not introduce ambiguities.  You may use only lower case if
you want, and omit non-alphanumeric characters.  Try some; `recode' will tell
you if you made them too short.  `recode' calls can often be written tersely.
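
The abbreviation rule just described can be illustrated with a small Python sketch.  It is a hypothetical model, not the matching code inside `recode'.  Names are compared after lower-casing and dropping non-alphanumerics.  A unique prefix is accepted, and an ambiguous one is refused.  The five-entry charset table and the "exact match wins" rule are assumptions; the real table is much larger.

```python
import re

# Hypothetical sample of charset names; recode's real table is much larger.
CHARSETS = ["ISO-8859-5", "KOI8-R", "CP1251", "CP866", "GOST_19768-74"]


def normalize(name):
    """Lower-case the name and drop every non-alphanumeric character."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def resolve(abbrev):
    """Return the one charset matching `abbrev`, or raise ValueError.
    Assumption: an exact normalized match wins over prefix matches."""
    key = normalize(abbrev)
    exact = [c for c in CHARSETS if normalize(c) == key]
    if exact:
        return exact[0]
    hits = [c for c in CHARSETS if normalize(c).startswith(key)]
    if not hits:
        raise ValueError(f"unknown charset: {abbrev!r}")
    if len(hits) > 1:
        raise ValueError(f"ambiguous charset {abbrev!r}: {', '.join(hits)}")
    return hits[0]


def parse_request(request):
    """Split a 'BEFORE..AFTER' request and resolve both charset names."""
    before, after = request.split("..", 1)
    return resolve(before), resolve(after)


if __name__ == "__main__":
    print(parse_request("iso88595..koi8"))  # ('ISO-8859-5', 'KOI8-R')
    try:
        resolve("cp")  # matches both CP1251 and CP866
    except ValueError as err:
        print(err)
```
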
...
> The list of character sets and their aliases makes for tough reading.
> [...]  A table format, listing the character sets with their aliases in
> columns next to them, would make for easier reading.
This would be at the expense of some vertical space in the manual, but it
might be worth it, yes.  Thanks for your suggestions.  I'll write to you again
when I revisit your ideas for real work, though probably not soon...

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard

-- 
To unsubscribe send e-mail to suse-linux-e-unsubscribe@suse.com
For additional commands send e-mail to suse-linux-e-help@suse.com             
Also check the FAQ at http://www.suse.com/Support/Doku/FAQ/
