Help reqd on how Kate and OOo 2 store Unicode
Opening OOo (first Writer, then Calc), I entered the Unicode sequence:

    0928 092e 0938 094d 0924 0947

(Devanagari script for namaste = "I bow to you" = greeting), but I find that OOo (both Writer and Calc) stores it in content.xml as the following byte sequence:

    e0 a4 a8 e0 a4 ae e0 a4 b8 e0 a5 8d e0 a4 a4 e0 a5 87

A friend told me that on Windows the bytes are stored in little-endian order, and sure enough Windows Notepad saved the following Unicode text file:

    28 09 2e 09 38 09 4d 09 24 09 47 09

where my original input is recognizable, with the two bytes of each value swapped. But I fail to see any connection between the above text and what OOo stored.

Kate also saved it as:

    e0 a4 a8 e0 a4 ae e0 a4 b8 e0 a5 8d e0 a4 a4 e0 a5 87

which is again different from the original Unicode sequence and from Notepad's output, but is the same as what OOo gave.

I observe that e0 occupies positions 1, 4, 7 etc., and that the Kate / OOo text is 18 bytes long, exactly one and a half times the 12 bytes of the Notepad text. Apart from that I fail to identify any pattern relating the Notepad text (original Unicode sequence) to the Kate / OOo text.

Can anyone please elucidate this situation and explain why Kate and OOo store the Unicode text in a different way from the actual Unicode sequence?

Thanks.
Shriramana.

--
Penguin #395953 resides at http://samvit.org
subsisting on SUSE Linux 10.0 with KDE 3.5
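The byte sequences quoted above can be reproduced with a short Python sketch. It assumes that the Kate / OOo bytes are UTF-8 and that the Notepad bytes are UTF-16 little-endian without a byte order mark; the post names neither encoding, so both are assumptions made for illustration. The helper names `hex_bytes` and `utf8_three_byte` are hypothetical and exist only in this sketch.

```python
# The six code points typed in the post (Devanagari "namaste").
code_points = [0x0928, 0x092E, 0x0938, 0x094D, 0x0924, 0x0947]
text = "".join(chr(cp) for cp in code_points)


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex, as in the post."""
    return " ".join(f"{b:02x}" for b in data)


def utf8_three_byte(cp: int) -> bytes:
    """Hand-encode a code point in the range U+0800..U+FFFF as UTF-8.

    The 16 bits are split 4 + 6 + 6 and placed into the pattern
    1110xxxx 10xxxxxx 10xxxxxx. For U+09xx the top 4 bits are 0000,
    so the first byte is always e0.
    """
    return bytes([
        0xE0 | (cp >> 12),          # leading byte carries the top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation byte, middle 6 bits
        0x80 | (cp & 0x3F),         # continuation byte, low 6 bits
    ])


# Assumption: Kate and OOo wrote UTF-8.
utf8_hex = hex_bytes(text.encode("utf-8"))

# Assumption: Notepad wrote UTF-16 little-endian (BOM not shown).
utf16le_hex = hex_bytes(text.encode("utf-16-le"))

# The manual bit-splitting gives the same bytes as the built-in codec.
manual_hex = hex_bytes(b"".join(utf8_three_byte(cp) for cp in code_points))

print(utf8_hex)
print(utf16le_hex)
print(manual_hex == utf8_hex)
```

Under these assumptions each code point takes three bytes in one form and two in the other, which accounts for the 18 versus 12 byte lengths and for e0 recurring at every third position.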
Participants (1): Shriramana Sharma