https://bugzilla.novell.com/show_bug.cgi?id=355783

Summary: file misinterprets UCS-2 encoded text as MPEG audio data
Product: openSUSE 10.3
Version: Final
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Basesystem
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: jw@novell.com
QAContact: qa@suse.de
Found By: ---

Text files written under Symbian OS are UCS-2 encoded. As a test, I wrote
the words 'Hello World' into a text file and transferred it to a machine
running SUSE Linux 10.3.

$ file test.txt
test.txt: MPEG ADTS, layer I, v1, 128 kBits, 32 kHz, Stereo

A hexdump of my test file follows:

$ xxd test.txt
0000000: fffe 4800 6500 6c00 6c00 6f00 2000 5700  ..H.e.l.l.o. .W.
0000010: 6f00 7200 6c00 6400 0a00                 o.r.l.d...

vim, by contrast, recognizes the encoding:

$ vim test.txt
Hello World
"test.txt" [converted] 1L, 14C
:se fenc
  fileencoding=ucs-2le
ZZ

The format is described in RFC 2781 (UTF-16) and explained as follows in
http://en.wikipedia.org/wiki/UTF-16:

  The UTF-16 (and UCS-2) encoding scheme allows either endian
  representation to be used, but mandates that the byte order should be
  explicitly indicated by prepending a Byte Order Mark before the first
  serialized character. This BOM is the encoded version of the Zero-Width
  No-Break Space (ZWNBSP) character, codepoint U+FEFF, chosen because it
  should never legitimately appear at the beginning of any character data.
  This results in the byte sequence FE FF (in hexadecimal) for big-endian
  architectures, or FF FE for little-endian.

  The BOM at the beginning of a UTF-16 or UCS-2 encoded data is considered
  to be a signature separate from the text itself; it is for the benefit
  of the decoder.
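
The misdetection can be reproduced by hand. The first four bytes of the test file, FF FE 48 00, begin with eleven set bits, which is also the frame sync of an MPEG audio header, and the remaining bits happen to decode to exactly the values file reported. The Python sketch below illustrates this and shows one possible ordering of checks (BOM before MPEG sync). It is not taken from file's source or its magic database; the function names (utf16_bom, as_mpeg1_layer1_header, sniff) are hypothetical, and the lookup tables assume the commonly documented MPEG-1 Layer I header layout.

```python
import codecs

# Bytes copied from the xxd dump in this report.
SAMPLE = bytes.fromhex("fffe480065006c006c006f00200057006f0072006c0064000a00")

# MPEG-1 Layer I lookup tables (assumed from the usual header layout).
BITRATES_V1_L1 = [None, 32, 64, 96, 128, 160, 192, 224,
                  256, 288, 320, 352, 384, 416, 448, None]
SAMPLE_RATES_V1 = {0b00: 44100, 0b01: 48000, 0b10: 32000}
CHANNEL_MODES = ["Stereo", "Joint stereo", "Dual channel", "Mono"]


def utf16_bom(data: bytes):
    """Return the codec name implied by a UTF-16/UCS-2 BOM, or None."""
    # The UTF-32 little-endian BOM (FF FE 00 00) also starts with FF FE,
    # so it is excluded first.
    if data.startswith(codecs.BOM_UTF32_LE):
        return None
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


def as_mpeg1_layer1_header(data: bytes):
    """Decode the first 4 bytes as an MPEG-1 Layer I frame header, or None."""
    if len(data) < 4:
        return None
    word = int.from_bytes(data[:4], "big")
    if word >> 21 != 0x7FF:                # 11-bit frame sync
        return None
    if (word >> 19) & 0b11 != 0b11:        # version bits: 11 = MPEG-1
        return None
    if (word >> 17) & 0b11 != 0b11:        # layer bits: 11 = Layer I
        return None
    kbps = BITRATES_V1_L1[(word >> 12) & 0b1111]
    hz = SAMPLE_RATES_V1.get((word >> 10) & 0b11)
    if kbps is None or hz is None:
        return None
    mode = CHANNEL_MODES[(word >> 6) & 0b11]
    return {"version": 1, "layer": "I", "kbps": kbps, "hz": hz, "mode": mode}


def sniff(data: bytes) -> str:
    """Classify data, checking for a BOM before trying the MPEG header."""
    encoding = utf16_bom(data)
    if encoding is not None:
        return f"text ({encoding}, with BOM)"
    if as_mpeg1_layer1_header(data) is not None:
        return "MPEG audio"
    return "data"


if __name__ == "__main__":
    print(as_mpeg1_layer1_header(SAMPLE))   # how the bytes look as MPEG
    print(sniff(SAMPLE))                    # result with the BOM check first
    print(repr(SAMPLE[2:].decode("utf-16-le")))
```

With the BOM test ordered first, the sample is classified as text; with only the MPEG test, the same bytes pass as a Layer I header at 128 kbit/s and 32 kHz, which matches the output quoted in this report. A two-byte BOM is a weak signature on its own, so a real detector would presumably also want to check that the data following it looks like text.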