Re: [opensuse] uncompessing zip files and accented characters

13 Jul 2010

Per Jessen wrote:
...
Philipp Thomas wrote:
...
* Dave Howorth (dhoworth@mrc-lmb.cam.ac.uk) [20100713 11:40]:
...
Is that wrong? What has unzip transformed the filenames into if it
hasn't preserved them?
Ok, once again:
Zip will write the names to the archive in whatever encoding the
originating machine uses. However it will *not* record the encoding
used. So in this case unzip will read the names encoded in say latin-2
(a single byte encoding) and will write them as utf8 (a multi byte
encoding) which of course will result in the gibberish the OP posted.
Isn't it rather that unzip simply dumps whatever filenames were
zipped, and that the terminal attempts to display those names as if
they are utf8?
Or does zip really convert from (for instance) latin-2 to utf8?
Exactly. As far as I know, filenames are stored in the filesystem as
octets. There's no notion of characters or encodings, and the kernel
doesn't care what the octet sequence represents. The semantics are added
by application layers above that. Talk of encoding in the filenames
themselves is muddled thinking, AFAIK.
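
A minimal sketch of the "filenames are just octets" point, in Python. It
assumes a Linux-style filesystem that accepts arbitrary bytes in names
(some filesystems, e.g. on macOS or Windows, may refuse or normalise
them). The name "łódź.txt" is a made-up example, not one from the OP's
archive.

```python
import os
import tempfile

# Hypothetical name "łódź.txt", encoded as ISO 8859-2 (latin-2).
# Escapes are used so this source stays plain ASCII.
raw_name = "\u0142\u00f3d\u017a.txt".encode("iso8859_2")

with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    # Create an empty file whose name is exactly those octets.
    with open(os.path.join(tmp_bytes, raw_name), "wb"):
        pass
    # Given a bytes path, os.listdir returns names as bytes, undecoded.
    stored = os.listdir(tmp_bytes)

print(stored)
```

The name comes back as the same octet sequence that went in; nothing on
the way to the disk and back attached an encoding to it.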

I believe that unzip simply copies the octet sequence that is the
filename. So they can be read 'sensibly' by any application running in
an environment that uses the same character set and encoding, if that is
all that is required.
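
To illustrate, here is a small Python sketch of one octet sequence being
read under two different assumptions. The filename is again a made-up
example, and latin-2 is only assumed as the originating encoding.

```python
# Hypothetical filename "łódź.txt", written with escapes to keep the source ASCII.
name = "\u0142\u00f3d\u017a.txt"

# The octets a latin-2 machine would have stored for that name.
octets = name.encode("iso8859_2")

# An environment that assumes latin-2 recovers the intended characters.
as_latin2 = octets.decode("iso8859_2")

# An environment that assumes utf-8 finds invalid sequences; with
# errors="replace" each one becomes U+FFFD, the replacement character.
as_utf8 = octets.decode("utf-8", errors="replace")

print(octets)
print(as_latin2 == name)
print(as_utf8 == name)
```

The octets are identical in both cases; only the interpretation differs,
which is the gibberish a utf-8 terminal shows.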

OTOH, if the requirement is to use the files with arbitrary applications
running in a utf-8 environment (which is probably the default in any
recently built system) then the filenames need to be changed such that
the sequence of octets represents a utf-8 encoding. As has been
suggested, convmv is a way to do that.
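
For anyone curious what such a conversion amounts to, here is a rough
Python sketch of the idea. It is not how convmv is implemented; the
function name reencode_names and its parameters are made up for this
example, latin-2 is only an assumed source encoding, and a Linux-style
filesystem is assumed. It is not recursive and does not check for name
collisions.

```python
import os
import tempfile


def reencode_names(directory, src_encoding="iso8859_2", dry_run=True):
    """Rename entries whose names are not valid UTF-8 (non-recursive).

    Returns a list of (old_bytes, new_bytes) pairs. Nothing is renamed
    while dry_run is True.
    """
    dir_b = os.fsencode(directory)
    changes = []
    for old in sorted(os.listdir(dir_b)):
        try:
            old.decode("utf-8")
            continue  # already valid UTF-8 (pure ASCII included): leave alone
        except UnicodeDecodeError:
            pass
        # Reinterpret the octets as src_encoding, then store them as UTF-8.
        new = old.decode(src_encoding).encode("utf-8")
        changes.append((old, new))
        if not dry_run:
            os.rename(os.path.join(dir_b, old), os.path.join(dir_b, new))
    return changes


# Usage on a throwaway directory with one latin-2 name and one ASCII name.
with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)
    latin2_name = "\u0142\u00f3d\u017a.txt".encode("iso8859_2")
    for n in (latin2_name, b"plain.txt"):
        open(os.path.join(tmp_b, n), "wb").close()

    planned = reencode_names(tmp)           # dry run: report only
    before = sorted(os.listdir(tmp_b))
    reencode_names(tmp, dry_run=False)      # actually rename
    after = sorted(os.listdir(tmp_b))

print(planned)
print(after)
```

The dry run comes first on purpose: a wrong guess at the source encoding
produces a different kind of gibberish, so it is worth looking at the
planned names before renaming anything. Note also that a name which
happens to be valid utf-8 already is skipped, even if it was really
meant as latin-2.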

Cheers, Dave