Mailinglist Archive: opensuse-bugs (7187 mails)
[Bug 355757] New: recode generates nonstandard UCS-2 encoding.
- From: bugzilla_noreply@xxxxxxxxxx
- Date: Wed, 23 Jan 2008 13:20:55 -0700 (MST)
- Message-id: <bug-355757-21960@xxxxxxxxxxxxxxxxxxxxxxxxx/>
https://bugzilla.novell.com/show_bug.cgi?id=355757
Summary: recode generates nonstandard UCS-2 encoding.
Product: openSUSE 10.3
Version: Final
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@xxxxxxxxxxxxxxxxxxxxxx
ReportedBy: jw@xxxxxxxxxx
QAContact: qa@xxxxxxx
Depends on: 355755
Found By: ---
echo Hello World > test.txt
recode ..UCS-2 test.txt > test.ucs2
xxd test.txt
0000000: 0048 0065 006c 006c 006f 0020 0057 006f .H.e.l.l.o. .W.o
0000010: 0072 006c 0064 000a .r.l.d..
This hexdump shows big-endian UCS-2, but the BOM is missing.
The BOM is defined in RFC 2781 and explained as follows in
http://en.wikipedia.org/wiki/UTF-16:
The UTF-16 (and UCS-2) encoding scheme allows either endian representation to
be used, but mandates that the byte order should be explicitly indicated by
prepending a Byte Order Mark before the first serialized character. This BOM is
the encoded version of the Zero-Width No-Break Space (ZWNBSP) character,
codepoint U+FEFF, chosen because it should never legitimately appear at the
beginning of any character data. This results in the byte sequence FE FF (in
hexadecimal) for big-endian architectures, or FF FE for little-endian. The BOM
at the beginning of a UTF-16 or UCS-2 encoded data is considered to be a
signature separate from the text itself; it is for the benefit of the decoder.
[...]
The BOM is not optional in the UCS-2 scheme.
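
To illustrate the expected byte layout, here is a minimal Python sketch. It is independent of recode and is not a proposed fix. The helper names `has_utf16_bom` and `encode_ucs2_be_with_bom` are hypothetical and exist only for this example. It assumes big-endian output, as in the hexdump above.

```python
import codecs


def has_utf16_bom(data: bytes) -> bool:
    """Return True if data starts with FE FF (big-endian) or FF FE (little-endian)."""
    return data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)


def encode_ucs2_be_with_bom(text: str) -> bytes:
    """Encode BMP-only text as big-endian UCS-2 with a leading BOM (FE FF)."""
    # UCS-2 has no surrogate pairs, so reject anything outside the BMP.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent code points above U+FFFF")
    # For BMP-only text, UTF-16BE and big-endian UCS-2 produce the same bytes.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


# Same bytes as in the hexdump above: big-endian, no BOM.
without_bom = "Hello World\n".encode("utf-16-be")
with_bom = encode_ucs2_be_with_bom("Hello World\n")

print(has_utf16_bom(without_bom))  # False
print(has_utf16_bom(with_bom))     # True
print(with_bom[:4].hex())          # feff0048
```

With the BOM, the output would begin with the bytes FE FF 00 48 rather than 00 48.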
--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.