Per Jessen wrote:
Philipp Thomas wrote:
* Dave Howorth (dhoworth@mrc-lmb.cam.ac.uk) [20100713 11:40]:
Is that wrong? What has unzip transformed the filenames into if it hasn't preserved them?

Ok, once again: Zip will write the names to the archive in whatever encoding the originating machine uses. However, it will *not* record the encoding used. So in this case unzip will read the names encoded in, say, latin-2 (a single-byte encoding) and will write them as utf8 (a multi-byte encoding), which of course results in the gibberish the OP posted.
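To make the effect concrete, here is a minimal Python sketch using a hypothetical Polish filename. It shows that latin-2 octets are not valid utf8, so a utf8 environment can only display them as gibberish. It also shows that converting them under the wrong charset assumption (Latin-1 is used here purely as an example) produces valid utf8 for the wrong characters. It does not model what unzip itself does internally.

```python
# Hypothetical Polish filename, used only to illustrate the effect.
name = "zażółć.txt"

# On a latin-2 machine these are the octets that end up in the archive:
# one byte per character, and nothing records which charset was used.
latin2_bytes = name.encode("iso8859_2")

# The same octets are not valid UTF-8 ...
try:
    latin2_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# ... so a UTF-8 terminal or application can only show replacement characters.
shown = latin2_bytes.decode("utf-8", errors="replace")

# A tool that guesses the wrong single-byte charset (Latin-1 here) and
# converts to UTF-8 produces valid UTF-8, but for the wrong characters.
misconverted = latin2_bytes.decode("latin-1").encode("utf-8")

print(len(name), len(latin2_bytes), valid_utf8)
print(shown)
print(misconverted.decode("utf-8"))  # za¿ó³æ.txt
```

Note that "ó" survives only because it happens to sit at the same byte value (0xF3) in both Latin-1 and latin-2.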
Isn't it rather that unzip simply dumps whatever filenames were zipped, and that the terminal attempts to display those names as if they were utf8? Or does zip really convert from (for instance) latin-2 to utf8 ??
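One way to settle the question is to look at the octets actually stored in the archive. The following is a minimal Python sketch using the standard `zipfile` module. It relies on bit 11 (0x800) of the ZIP general-purpose flags, which newer archivers set to mark a name as UTF-8. It also relies on CPython's `zipfile` decoding unflagged names as cp437 by default, so re-encoding with cp437 gives the original bytes back. The helper `stored_names` and the sample names are hypothetical, and names carried only in an Info-ZIP Unicode extra field are not handled.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # bit 11 of the ZIP general-purpose flags: "name is UTF-8"


def stored_names(archive):
    """Return (raw_name_bytes, utf8_flag) for every member of a ZIP archive.

    `archive` may be a path or a binary file object.
    """
    result = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            utf8_flag = bool(info.flag_bits & UTF8_NAME_FLAG)
            # zipfile decoded the stored bytes as UTF-8 (flag set) or cp437
            # (flag unset, the default); encoding the same way returns the bytes.
            codec = "utf-8" if utf8_flag else "cp437"
            result.append((info.orig_filename.encode(codec), utf8_flag))
    return result


# Demo: build a small archive in memory (written by Python, which sets the
# UTF-8 flag for non-ASCII names) and list what it stores.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"x")
    zf.writestr("zażółć.txt", b"y")
buf.seek(0)

names = stored_names(buf)
for raw, flagged in names:
    print(raw, "UTF-8 flag set" if flagged else "no encoding recorded")
```

An archive made on a latin-2 machine by a zip that records no encoding would typically show the flag unset and single-byte latin-2 octets for the non-ASCII names.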
Exactly. As far as I know, filenames are stored in the filesystem as octets. There's no notion of characters or encodings, and the kernel doesn't care what the octet sequence represents; the semantics are added by the application layers above it. Talk of encoding in the filenames themselves is muddled thinking, AFAIK.

I believe that unzip simply copies the octet sequence that is the filename. So the names can be read 'sensibly' by any application running in an environment that uses the same character set and encoding, if that is all that is required.

OTOH, if the requirement is to use the files with arbitrary applications running in a utf-8 environment (which is probably the default on any recently built system), then the filenames need to be changed so that the sequence of octets is a valid utf-8 encoding. As has been suggested, convmv is a way to do that.

Cheers,
Dave
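A minimal Python sketch of the idea behind convmv, assuming the names are latin-2. It lists the directory as bytes (so nothing is decoded behind our back), leaves names that already decode as utf-8 alone, and renames the rest from latin-2 octets to utf-8 octets. `reencode_names` is a hypothetical helper, not part of convmv; it is not recursive and does not check for name collisions. The demo assumes a Linux filesystem that accepts arbitrary octets in names. With convmv itself the equivalent is along the lines of `convmv -f iso-8859-2 -t utf-8 -r --notest DIR` (without `--notest` it only reports what it would do).

```python
import os
import tempfile


def reencode_names(directory, src="iso8859_2", dst="utf-8", dry_run=True):
    """Re-encode filenames in `directory` (a bytes path) from `src` to `dst`.

    Returns a list of (old_bytes, new_bytes) pairs. Names that already decode
    cleanly as `dst` are skipped. Not recursive; an existing file with the new
    name would be silently replaced by os.rename on Linux.
    """
    changes = []
    for old in os.listdir(directory):  # bytes path in, bytes names out: raw octets
        try:
            old.decode(dst)
            continue  # already valid in the target encoding
        except UnicodeDecodeError:
            pass
        new = old.decode(src).encode(dst)
        changes.append((old, new))
        if not dry_run:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
    return changes


# Demo: create a file whose name is latin-2 octets, as in the original problem.
workdir = os.fsencode(tempfile.mkdtemp())
old_name = "zażółć.txt".encode("iso8859_2")
open(os.path.join(workdir, old_name), "wb").close()

planned = reencode_names(workdir)               # dry run: only report
done = reencode_names(workdir, dry_run=False)   # perform the renames
after = os.listdir(workdir)
print(planned)
print(after)
```

Running a dry run first mirrors convmv's default test mode: you see the planned renames before anything on disk changes.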