https://bugzilla.novell.com/show_bug.cgi?id=247397

           Summary: w3m is our default HTML text dump tool and our system
                    defaults to UTF-8, but w3m defaults to print entities
                    with ASCII equivalents
           Product: openSUSE 10.2
           Version: Final
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Basesystem
        AssignedTo: max@novell.com
        ReportedBy: odabrunz@novell.com
         QAContact: qa@suse.de

w3m is our default HTML text dump tool. The default locale for a SUSE Linux installation uses UTF-8 encoding, but w3m still defaults to printing entities with ASCII equivalents.

The result is that, by default, I have a system that displays Unicode characters, but when I open an HTML mail in mutt (or some other utility), the HTML entities are displayed with ASCII equivalents, e.g. Ä -> A:, ü -> u:.

AFAICT, this default behaviour of w3m is inconsistent with our decision to have a UTF-8 system by default. Even when I start a text-based tool, I expect it to do its best to support UTF-8 by default, rather than falling back to a workaround instead of displaying the characters properly. Using ASCII equivalents to display entities is a workaround for systems that cannot display these characters, which is clearly neither the case nor the intention in SUSE Linux.

Other issues around this:

It is also difficult to find out how to change this behaviour of w3m. This is a particular hassle for people who do not use w3m directly, but only indirectly because /etc/mailcap names it as the default HTML dump utility. The w3m documentation tells the user little more than that a configuration file can be specified and how to list the available options. The user still has to find out by himself that he needs to start w3m manually and press "o" to edit the default configuration (the location of this file is also not documented).
Then he needs to find out that "Use ASCII equivalents to display entities" is the right option to change, rather than any of these:

- Display charset
- Default document charset
- System charset
- System charset follows locale(LC_CTYPE)
- Decode Content-Transfer-Encoding when saving
- Accept-Encoding header
- Charset conversion using Unicode map
- Charset conversion when loading
- [leaving out other choices which I feel are obviously not adequate, but which may still confuse a user who does not know w3m and/or I18N issues]

Even I, bwalle and lnussel recently needed some time to find this option, although I set it up years ago in my default w3m configuration and I use w3m every day.
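To make the requested default concrete, here is a minimal Python sketch of the policy argued for in this report: turn on ASCII equivalents only when the LC_CTYPE codeset is not UTF-8. The function names `default_alt_entity`, `render_entity` and `current_codeset` are hypothetical. The internal option name `alt_entity` is an assumption (check `w3m -show-option` on the target system). The two-entry fallback table only mirrors the examples given in this report (Ä -> A:, ü -> u:) and is not w3m's real mapping.

```python
import html
import locale

# Illustrative fallback table built only from the examples in this report;
# it is NOT w3m's actual ASCII-equivalent mapping.
ASCII_EQUIVALENTS = {"Ä": "A:", "ü": "u:"}


def default_alt_entity(codeset: str) -> bool:
    """Hypothetical policy for the default of "Use ASCII equivalents to
    display entities" (assumed option name: alt_entity): turn it on only
    when the LC_CTYPE codeset is not UTF-8."""
    normalized = codeset.replace("-", "").replace("_", "").lower()
    return normalized != "utf8"


def render_entity(entity: str, use_ascii: bool) -> str:
    """Decode an HTML entity such as "&Auml;" and, if use_ascii is set,
    replace the character with its ASCII equivalent when one is known."""
    char = html.unescape(entity)
    if use_ascii:
        return ASCII_EQUIVALENTS.get(char, char)
    return char


def current_codeset() -> str:
    """Return the codeset of the user's LC_CTYPE locale (Unix only,
    because locale.nl_langinfo is not available on Windows)."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # invalid locale in the environment: keep the current one
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    codeset = current_codeset()
    state = "on" if default_alt_entity(codeset) else "off"
    print(f"LC_CTYPE codeset {codeset}: alt_entity default would be {state}")
```

Under this sketch, `render_entity("&Auml;", default_alt_entity("UTF-8"))` returns "Ä", while with the codeset "ANSI_X3.4-1968" (the C locale) it returns "A:". Deriving the default from the codeset rather than hard-coding it keeps the ASCII fallback for terminals that really cannot display these characters.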