[Bug 153557] Apache's directory auto-index can not display UTF-8 filenames correctly
https://bugzilla.novell.com/show_bug.cgi?id=153557

mfabian@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|mfabian@novell.com          |

------- Comment #4 from mfabian@novell.com  2006-10-27 07:59 MST -------
I dislike "AddDefaultCharset UTF-8" very much because it makes it impossible to serve HTML pages in any other encoding. Even if a page declares its charset in the HTML header, for example:

  <html>
  <head>
  <title> </title>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=big5">
  </head>

this won't work, because browsers always prefer the charset from the HTTP header. Therefore I always use "AddDefaultCharset Off" in my apache2 setup and make sure that every page specifies its charset in the HTML header.

Of course I prefer HTML pages in UTF-8, and most of the pages I write are in UTF-8. But it should be possible to use other encodings for special pages; at the very least I need this for some test pages. If the HTTP header says "UTF-8" and this overrides everything, I cannot have even a single test page in a different encoding.

Therefore I agree with Björn that directory indices should get special treatment and should be served as UTF-8 on SuSE Linux. As Peter says, other encodings are possible in file names, but SuSE Linux has used UTF-8 as the default for a long time already, so one should assume that file names are in UTF-8. File names which are not should be converted. Having file names in mixed encodings is asking for trouble: web pages in different encodings can declare their charset in a header, but a file name carries no tag saying which encoding it uses.

Björn also suggested that auto-detecting the file system charset might be a good idea. I don't think this will really work, because auto-detecting legacy encodings is a difficult problem that cannot be solved for all cases; one can only use heuristics which work sometimes but not always.
UTF-8, on the other hand, is easy to auto-detect. Apache could therefore send UTF-8 in the HTTP header for a directory index if all files in the directory have UTF-8 encoded file names, and fall back to "something else" if some files in it do not.

I like the idea of "AddDirectoryIndexCharset something". It could be combined with UTF-8 auto-detection to specify the "something else" that should be used if not all file names are UTF-8 encoded. As it is easy to check whether all file names are UTF-8 encoded, it is probably a good idea to let Apache do that check and, when it succeeds, always send UTF-8 in the HTTP header, ignoring "AddDirectoryIndexCharset ...". "AddDirectoryIndexCharset ..." would then only specify the charset to be used if auto-detection finds file names in the directory that are not UTF-8 encoded. The name should probably be changed a bit though, e.g. to "AddDirectoryIndexCharsetFallback ...".
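The detection-with-fallback rule proposed above can be sketched in a few lines. This is a minimal sketch in Python, not Apache code: it assumes the index generator sees raw byte file names, and the names directory_index_charset, content_type_for_directory and the fallback parameter are hypothetical, with fallback standing in for the proposed "AddDirectoryIndexCharsetFallback" setting. The ISO-8859-1 default is likewise only an assumption.

```python
import os
from typing import Iterable


def is_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes form valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def directory_index_charset(names: Iterable[bytes], fallback: str = "ISO-8859-1") -> str:
    """Choose the charset to announce for a directory index (hypothetical helper).

    If every file name is valid UTF-8 (pure ASCII names count too), the
    index is labelled UTF-8 and the fallback is ignored. Otherwise the
    fallback, standing in for the proposed "AddDirectoryIndexCharsetFallback",
    is used; its ISO-8859-1 default is an assumption.
    """
    if all(is_utf8(n) for n in names):
        return "UTF-8"
    return fallback


def content_type_for_directory(path: bytes, fallback: str = "ISO-8859-1") -> str:
    """Build a Content-Type value for the index of directory `path` (hypothetical)."""
    # Passing a bytes path makes os.listdir return raw bytes names,
    # so no decoding guess is made before the UTF-8 check.
    names = os.listdir(path)
    return "text/html; charset=" + directory_index_charset(names, fallback)


if __name__ == "__main__":
    print(directory_index_charset([b"index.html", "Müller.txt".encode("utf-8")]))
    # UTF-8
    print(directory_index_charset([b"index.html", "Müller.txt".encode("latin-1")]))
    # ISO-8859-1 (the Latin-1 byte 0xFC for "ü" is not valid UTF-8)
    print(directory_index_charset(["中文.txt".encode("big5")], fallback="Big5"))
    # Big5
    print(content_type_for_directory(b"."))
```

A single non-UTF-8 name switches the whole index to the fallback, because one Content-Type header applies to the entire page. The check can in principle be fooled by a legacy-encoded name whose bytes happen to be valid UTF-8, but for typical non-ASCII Latin-1 or Big5 names this is unlikely.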
participants (1)
-
bugzilla_noreply@novell.com