From bugzilla_noreply@novell.com Fri Oct 27 13:59:42 2006
From: bugzilla_noreply@novell.com
To: bugs@lists.opensuse.org
Subject: [Bug 153557] Apache's directory auto-index can not display UTF-8
filenames correctly
Date: Fri, 27 Oct 2006 07:59:21 -0600
Message-ID: <20061027135921.CD836F20@molor.provo.novell.com>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8999874060377424645=="
--===============8999874060377424645==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
https://bugzilla.novell.com/show_bug.cgi?id=153557
mfabian@novell.com changed:
                What|Removed                     |Added
----------------------------------------------------------------------------
              Status|NEEDINFO                    |ASSIGNED
       Info Provider|mfabian@novell.com          |
------- Comment #4 from mfabian@novell.com 2006-10-27 07:59 MST -------
I dislike "AddDefaultCharset UTF-8" very much because this makes
it impossible to write any HTML pages in other encodings.
Even if the HTML pages use a charset setting in the HTML header,
for example:
this won't work because the charset from the HTTP header is always
preferred.
Therefore I always use "AddDefaultCharset Off" in my apache2 setup
and make sure that all pages specify the charset in the HTML header.
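A simplified sketch of the precedence problem described above, assuming a browser that takes the charset from the HTTP Content-Type header first and consults the page's own meta declaration only when the header has none. BOM detection and other sniffing steps are ignored. The function name and the "windows-1252" final default are illustrative assumptions, not taken from any browser's code.

```python
from typing import Optional


def effective_charset(http_charset: Optional[str],
                      meta_charset: Optional[str],
                      default: str = "windows-1252") -> str:
    """Simplified precedence: a charset in the HTTP header beats the page's meta tag.

    Hypothetical helper; real browsers also check a BOM first and sniff further.
    """
    if http_charset:       # e.g. sent by Apache because of "AddDefaultCharset UTF-8"
        return http_charset
    if meta_charset:       # only consulted when the header carries no charset
        return meta_charset
    return default         # assumed locale-dependent browser fallback


# "AddDefaultCharset UTF-8": the page's own ISO-8859-1 declaration is ignored
print(effective_charset("UTF-8", "ISO-8859-1"))   # UTF-8
# "AddDefaultCharset Off": no charset in the header, so the meta tag wins
print(effective_charset(None, "ISO-8859-1"))      # ISO-8859-1
```

With "AddDefaultCharset Off" the header carries no charset, so each page's own declaration decides.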
Of course I prefer HTML pages in UTF-8, and most of the pages I write
are in UTF-8. But it should be possible to use other encodings for
special pages; at the very least I need that for some test pages. If the
HTTP header says "UTF-8" and this overrides everything, I cannot have
even a single test page in a different encoding.
Therefore I agree with Björn that directory indices should get
special treatment and should be treated as UTF-8 on SuSE Linux.
As Peter says, other encodings are possible in file names, but
we have been using UTF-8 as the default in SuSE Linux for a long time
already, so one should assume that file names are in UTF-8. File names
which are not in UTF-8 should be converted.
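A minimal sketch of such a conversion, assuming the legacy encoding of the offending names is already known (ISO-8859-1 is used here only as an example default). The helper names are hypothetical. Existing tools such as convmv do this job in practice.

```python
import os


def to_utf8_name(raw: bytes, legacy_encoding: str = "iso-8859-1") -> bytes:
    """Return the raw file name re-encoded as UTF-8.

    Names that already decode as UTF-8 are returned unchanged. Anything else is
    ASSUMED to be in `legacy_encoding`; the real legacy encoding cannot be guessed.
    """
    try:
        raw.decode("utf-8")
        return raw                              # already valid UTF-8
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding).encode("utf-8")


def convert_directory(path: bytes, legacy_encoding: str = "iso-8859-1") -> None:
    """Rename every non-UTF-8 entry in `path` (hypothetical helper).

    A bytes path makes os.listdir return the raw, undecoded on-disk names.
    """
    for raw in os.listdir(path):
        new = to_utf8_name(raw, legacy_encoding)
        if new == raw:
            continue
        target = os.path.join(path, new)
        if os.path.exists(target):
            # os.rename would silently replace an existing file on POSIX
            raise FileExistsError(target)
        os.rename(os.path.join(path, raw), target)
```

For example, `to_utf8_name(b"caf\xe9")` yields `b"caf\xc3\xa9"`, the UTF-8 spelling of "café".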
Having file names in mixed encodings is asking for trouble. Web
pages in different encodings can declare their charset in a header,
but a file name cannot carry any tag saying which encoding it uses.
Björn also suggested that auto-detection of the filesystem charset
might be a good idea. I don't think this will really work, because
auto-detection of legacy encodings is a difficult problem that can
never be solved in all cases. One can only use heuristics, which
work sometimes but not always.
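A small illustration of why guessing a legacy encoding from file name bytes is unreliable. The single byte 0xE4 is a valid character in ISO-8859-1, ISO-8859-7 and CP1251, so every decode "succeeds" and nothing in the bytes says which reading was intended. As a malformed sequence, the same byte is rejected outright by UTF-8. The helper name is hypothetical.

```python
from typing import Dict, Sequence


def legacy_readings(raw: bytes, encodings: Sequence[str]) -> Dict[str, str]:
    """Decode the same bytes with several single-byte legacy encodings."""
    return {enc: raw.decode(enc) for enc in encodings}


readings = legacy_readings(b"\xe4", ["iso-8859-1", "iso-8859-7", "cp1251"])
print(readings)  # {'iso-8859-1': 'ä', 'iso-8859-7': 'δ', 'cp1251': 'д'}

# UTF-8, by contrast, refuses the byte: a lone lead byte is malformed
try:
    b"\xe4".decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
```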
UTF-8, however, is easy to auto-detect. Apache could therefore send
UTF-8 in the HTTP header for a directory index if all files in the
directory have UTF-8 encoded file names, and fall back to "something else"
if some files in it do not.
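A minimal sketch of the per-directory check, assuming "auto-detection" simply means that every raw file name decodes as well-formed UTF-8. Names written in a legacy 8-bit encoding rarely happen to form valid UTF-8 multibyte sequences, which is why this test is reliable in practice. The function names are hypothetical and not part of Apache.

```python
import os
from typing import Iterable


def is_utf8(name: bytes) -> bool:
    """True if the raw file name is well-formed UTF-8."""
    try:
        name.decode("utf-8")   # strict decoding rejects malformed and overlong sequences
        return True
    except UnicodeDecodeError:
        return False


def all_names_utf8(names: Iterable[bytes]) -> bool:
    """True if every name is UTF-8 (vacuously true for an empty directory)."""
    return all(is_utf8(n) for n in names)


def directory_is_utf8(path: bytes) -> bool:
    # A bytes argument makes os.listdir return raw, undecoded names.
    return all_names_utf8(os.listdir(path))
```

For example, `all_names_utf8([b"readme.txt", "café".encode("utf-8")])` is true, while adding the Latin-1 name `b"caf\xe9"` makes it false.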
I like the idea of "AddDirectoryIndexCharset something". This could be used
together with UTF-8 auto-detection to specify the "something else"
to use if not all file names are UTF-8 encoded.
As it is easy to check whether all file names are UTF-8 encoded,
it is probably a good idea to let Apache do that and always send UTF-8
in the HTTP header in that case, ignoring "AddDirectoryIndexCharset ...".
"AddDirectoryIndexCharset ..." would then only specify the charset to be used
if auto-detection finds that the directory contains file names which
are not UTF-8 encoded. The name should probably be changed a bit to
reflect that, e.g. "AddDirectoryIndexCharsetFallback ...".
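A sketch of the proposed policy, assuming a hypothetical "AddDirectoryIndexCharsetFallback" value passed in as `fallback`. This is the directive suggested in this comment, not an existing Apache directive. Sending no charset at all when no fallback is configured is an additional assumption.

```python
from typing import Iterable, Optional


def index_content_type(names: Iterable[bytes], fallback: Optional[str]) -> str:
    """Content-Type for an auto-generated directory index under the proposed
    (hypothetical) "AddDirectoryIndexCharsetFallback" policy."""

    def is_utf8(name: bytes) -> bool:
        try:
            name.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    if all(is_utf8(n) for n in names):
        # Auto-detection wins: the fallback setting is ignored entirely
        return "text/html; charset=UTF-8"
    if fallback:
        # At least one non-UTF-8 name: use the configured fallback charset
        return f"text/html; charset={fallback}"
    # ASSUMPTION: with no fallback configured, send no charset at all
    return "text/html"
```

A directory holding `b"caf\xe9"` with the fallback set to "ISO-8859-1" would then be served as `text/html; charset=ISO-8859-1`, while an all-UTF-8 directory always gets UTF-8.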
--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
--===============8999874060377424645==--