Mailinglist Archive: opensuse (1606 mails)

Re: [opensuse] strange samba rsync problem

----- Original Message -----
From: "Damon Register" <damon.w.register@xxxxxxxx>
To: <opensuse@xxxxxxxxxxxx>
Sent: Tuesday, September 02, 2008 12:23 PM
Subject: Re: [opensuse] strange samba rsync problem


> G T Smith wrote:
>> Odd, and worrying, I assume you have checked the consistency of the
>> result i.e whether this happens to the same file in the same place, is
> I think it is quite repeatable. As far as I know, that one file was
> the only one with non-ascii letters in the name. Just for fun I tried
> making a test here at work where we have a solaris hosted samba server
> and a drive mapped to it on our PCs. I created a plain text file with
> an accented a in the name. I copied it to a folder on the mapped drive.
> I logged into the Solaris system and did ls -l on that folder.
> The accented a was mangled. Stranger yet, on that Solaris system I
> ran the nautilus file manager where it showed the same file with
> the correct accent on the a.

That's not strange at all. That's just the natural consequence of using
different character sets in different interfaces to display the same string of
bytes.

What, if any, measures have you taken to ensure, or at least assure, that all
things which touch the file are either all using the same character set and
encoding, or failing that, that all parts are accurately and fully configured
to know what character sets and encodings all other parts are using so that
they may correctly translate in those cases where they might do so?

If you have no idea, then you will regularly see what _looks_ like errors like
this, unless you simply avoid using any characters outside of the traditional
low-ascii alpha-numeric values where most character sets happen to use the same
glyphs for that subset of ascii byte values.

If you speak the word "see" into a tape recorder,
and then play it back to a blindfolded, English-speaking optometrist, they
probably hear the word "see".
Play the same tape back to a blindfolded, English-speaking sailor, and they
probably hear "sea".
... Spanish speaker, probably hears "si".
... English-speaking software developer, probably hears "C".
etc etc etc...

So it is with computers and character sets, character encoding schemes, and fonts.
There are some mechanisms for putting data in context, so that when one
program "speaks" a string of bytes, it also indicates what "language" those
bytes are intended to be interpreted with. But those mechanisms are mostly
recent developments, not fully implemented, and not fully backwards
compatible with older systems which had no such ability.

So in Windows you create a filename with a lower-case-a-acute. On a Windows
PC, in the USA, for an English-speaking user, the file-save dialog UI is
_probably_ using utf16 or codepage 1252, and in both of those the a-acute
just _happens_ to be the same integer value 0xe1 (0x00e1 for utf16). (utf8 is
different: the code point is still U+00E1, but it is stored as the two bytes
0xc3 0xa1.)
So, at least on the Windows PC local disk, the filename probably has the byte 0xe1
in it. Now you copy that file to the samba share. Let's assume for the moment
that samba does no magic translation of the filename at save-time, and so it
just copies the e1 without caring what glyph might be associated with that
value.
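
To make the byte values concrete, here is a minimal sketch (assuming Python 3;
the helper name encoded_bytes is just made up for illustration) that encodes
the a-acute in a few encodings. It says nothing about what Windows or samba
actually do internally, it only shows what each encoding produces. Note that
utf8 is the odd one out: it stores U+00E1 as two bytes.

```python
# Encode lower-case a-acute (U+00E1) in several encodings and show the bytes.
A_ACUTE = "\u00e1"


def encoded_bytes(text, codec):
    """Return the hex form of `text` encoded with `codec` (illustrative helper)."""
    return text.encode(codec).hex()


if __name__ == "__main__":
    for codec in ("cp1252", "latin-1", "utf-16-le", "utf-8"):
        print(codec, encoded_bytes(A_ACUTE, codec))
```

The single-byte encodings give e1, utf-16-le gives e1 followed by a zero byte,
and utf-8 gives c3 a1.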
Now you log in from the console, or telnet in from a terminal emulator that is
configured to accurately mimic the console.
The console may or may not be configured to load a software font over the
VGA hardware.
In the USA, if the console is loading a font, it's probably latin-1, aka
iso-8859-1. I guess this was a bad example character, because in that character
set too, once again 0xe1 just happens to be lower-case-a-acute.
But the character set that is built in to most VGA hardware is not any of the
character sets mentioned so far, it's codepage 437.
The glyph associated with the integer value 0xe1 in the codepage 437 character
set is the German sharp s, which looks a lot like a Greek beta (alpha is the
neighbouring value, 0xe0).
If the console is configured not to do any character translating or software
font loading and is thus using codepage 437, and if samba is not doing any
translating, then when you ls that file name at the console, instead of
lower-case-a-acute, you'll see a sharp s.
But almost any program in X on the same box is probably using latin-1. So, in
nautilus file manager, you'll probably see the lower-case-a-acute.
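
The same idea in the other direction: one byte, several interpretations. This
is only a sketch, assuming Python 3 and its bundled cp437, latin-1 and cp1252
codec tables; the helper name describe is made up. It prints code points and
Unicode names rather than glyphs, so the output does not itself depend on your
terminal's character set. In Python's cp437 table, 0xe1 comes out as the sharp
s, with alpha at the neighbouring 0xe0.

```python
import unicodedata


def describe(byte_value, codec):
    """Decode one byte with `codec`; return (code point, Unicode name)."""
    ch = bytes([byte_value]).decode(codec)
    return "U+%04X" % ord(ch), unicodedata.name(ch)


if __name__ == "__main__":
    # The same stored byte, shown through three different "display contexts".
    for codec in ("cp437", "latin-1", "cp1252"):
        print(codec, *describe(0xE1, codec))
```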

The character hasn't changed and isn't "wrong" anywhere. It's merely that if you are
going to use _any_ characters outside of the least common denominator set of
plain lower-ascii (aka 7-bit us-ascii) then you need to understand all about
character sets and display contexts. If you don't or can't do that, then stick
to the plain characters. Most character sets have the same glyphs for that
subset of byte values (0x20 to 0x7e, or decimal 32 to 126) and, less tangibly,
the same sorting rules for those characters.
In fact, regardless of how much you might learn about this, and how well &
completely you might configure all the software on all the devices in your
organization, so that maybe they all speak utf8, so everything is consistent
everywhere, you should probably _still_ avoid special characters because you can
never know what character set someone else is using who may need to handle
files (or even merely their names) created within your organization.
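
If you want to find the names that would cause this kind of trouble, something
like the following sketch could do it. It assumes Python 3, and the function
names is_portable and non_portable_names are made up for this example; "portable"
here just means every character falls in the printable ascii range 0x20 to 0x7e.

```python
import os


def is_portable(name):
    """True if every character of `name` is printable 7-bit ascii (0x20-0x7e)."""
    return all(0x20 <= ord(c) <= 0x7E for c in name)


def non_portable_names(root):
    """Yield paths under `root` whose final component is not plain ascii."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_portable(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # ascii() escapes the odd characters so the report itself stays plain ascii.
    for path in non_portable_names("."):
        print(ascii(path))
```

Run it from the top of the tree you are about to copy or rsync; anything it
prints is a name that may display differently somewhere else.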

Brian K. White brian@xxxxxxxxx http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
