On 30/03/2019 13.08, Simon Becherer wrote:
Hi,
from: https://lists.opensuse.org/opensuse/2019-03/msg00320.html
I NOW have a file:
First question: what type of file is this (which language: HTML, XML, or what)? Maybe with the correct filename extension ".xxx" it would load correctly into some software?
I could load it into Firefox or LibreOffice; LibreOffice generates an Excel sheet if I change .htm to .xls.
But how do I get the correct encoding? I think it's not really a wrong encoding; it's something I would call readable-hex-encoding.
You have the encoding in the header of the fake email :-)

------=_NextPart_01C3B8B1.6DB8B6D0^M
Content-Location: file:///C:/CE594991/xls.htm^M
Content-Transfer-Encoding: quoted-printable^M
Content-Type: text/html; charset="us-ascii"^M
                                  *********

:-)

I don't think it is 7-bit ASCII, though, but an 8-bit extension of it. I think that is ISO 8859-1, aka latin1 (man iso_8859-1):

       Oct   Dec   Hex   Char   Description
       374   252   FC     ü     LATIN SMALL LETTER U WITH DIAERESIS
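The part header is the useful bit: "Content-Transfer-Encoding: quoted-printable" means a byte such as 0xFC may be written in the file as the three characters =FC, which could be the "readable hex" Simon describes. As a minimal Python sketch, assuming the file is a MIME multipart document (the ------=_NextPart boundary line suggests so) and that the real charset is latin1 despite the us-ascii label, the standard email package can undo the transfer encoding. SAMPLE is a made-up, cut-down stand-in (only the part header lines come from the file), and decode_html_part() and table_rows() are hypothetical helper names.

```python
import email
import unicodedata
from email import policy

# Hypothetical, cut-down stand-in for the file: the part header lines are copied
# from the file; the outer header and the body line are assumptions.
SAMPLE = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="----=_NextPart_01C3B8B1.6DB8B6D0"\r\n'
    b'\r\n'
    b'------=_NextPart_01C3B8B1.6DB8B6D0\r\n'
    b'Content-Location: file:///C:/CE594991/xls.htm\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'Content-Type: text/html; charset="us-ascii"\r\n'
    b'\r\n'
    b'Dieses Dokument wurde f=FCr die Anzeige in Microsoft Excel 2002 oder h=F6her formatiert.\r\n'
    b'------=_NextPart_01C3B8B1.6DB8B6D0--\r\n'
)


def decode_html_part(raw: bytes) -> str:
    """Return the first text/html part with the transfer encoding undone, read as latin1."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True reverses quoted-printable: "=FC" becomes the byte 0xFC.
            payload = part.get_payload(decode=True)
            # Ignore the declared charset="us-ascii" and assume ISO 8859-1 instead.
            return payload.decode("latin-1")
    raise ValueError("no text/html part found")


def table_rows(text: str) -> list[str]:
    """Oct, Dec, Hex, Char and Description for each non-ASCII character, like man iso_8859-1."""
    rows = []
    for ch in sorted({c for c in text if ord(c) > 0x7F}):
        code = ord(ch)
        rows.append(f"{code:03o} {code:3d} {code:02X} {ch} {unicodedata.name(ch)}")
    return rows


text = decode_html_part(SAMPLE)
print(text)
for row in table_rows(text):
    print(row)
```

For the real file, pass open(path, "rb").read() instead of SAMPLE; any further parts are ignored. Bytes that are already raw 8-bit (as in the hex dump further down) should pass through the quoted-printable step unchanged, so the latin1 decode still applies.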
I found https://www.ascii.cl/htmlcodes.htm, where the hex number C4 points to the correct character.
Is there some tool that could handle this file correctly, or translate the encoding?
iconv :-)

However, LO asks what language to use for the import. Try "English (USA)".

Huh. You said:

|> But how do I get the correct encoding? I think it's not really a wrong
|> encoding; it's something I would call readable-hex-encoding.

That's an artifact of displaying in UTF-8: such a byte is not a valid UTF-8 sequence on its own, so the viewer cannot show it as a letter and prints its numeric code instead.

If I view the file "Bedarfsanalyse für Auftrag_2.xls" with less, I see these:

"Dieses Dokument wurde f\374r die Anzeige in Microsoft Excel 2002 oder h\366her formatiert. Sie verwenden eine fr\374here Version von Excel."

Those are octal byte values: 374 is hex FC, and 366 is hex F6.

But if I rename it to "Bedarfsanalyse für Auftrag_2.xls.txt" and view it with 'mc' in hex mode, the real situation can be seen:

    00000078 30 22 0D 0A 0D 0A 44 69 65 73 65 73 0"....Dieses
    00000084 20 44 6F 6B 75 6D 65 6E 74 20 77 75  Dokument wu
    00000090 72 64 65 20 66 FC 72 20 64 69 65 20 rde f.r die
    0000009C 41 6E 7A 65 69 67 65 20 69 6E 20 4D Anzeige in M
    000000A8 69 63 72 6F 73 6F 66 74 20 45 78 63 icrosoft Exc
    000000B4 65 6C 20 32 30 30 32 20 6F 64 65 72 el 2002 oder
    000000C0 20 68 F6 68 65 72 20 66 6F 72 6D 61  h.her forma   <==
    000000CC 74 69 65 72 74 2E 20 53 69 65 20 76 tiert. Sie v
    000000D8 65 72 77 65 6E 64 65 6E 20 65 69 6E erwenden ein
    000000E4 65 20 66 72 FC 68 65 72 65 20 56 65 e fr.here Ve
    000000F0 72 73 69 6F 6E 20 76 6F 6E 20 45 78 rsion von Ex
    000000FC 63 65 6C 2E 0D 0A 0D 0A 2D 2D 2D 2D cel.....----

    ' '  h  .  h  e
     20 68 F6 68 65

Hex F6 is the hidden symbol. Man "iso_8859-1" says that is:

    ö   LATIN SMALL LETTER O WITH DIAERESIS

I suppose iconv would get it right. Let's try:

    iconv -f LATIN1 -t UTF-8 Bedarfsanalyse\ für\ Auftrag_2.xls.txt -o D.txt

D.txt then contains:

"Dieses Dokument wurde für die Anzeige in Microsoft Excel 2002 oder höher formatiert. Sie verwenden eine frühere Version von Excel."

Yep :-)

-- 
Cheers / Saludos,

        Carlos E. R.
        (from 15.0 x86_64 at Telcontar)
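The two steps in this reply, looking at the raw bytes in a hex view and converting with iconv, can be sketched in Python. This is a minimal illustration, not what was actually run: hexdump() is a hypothetical helper that mimics the 12-bytes-per-line layout of the mc dump (printable ASCII shown as is, everything else as "."), and latin1_to_utf8() is a hypothetical helper that does what iconv -f LATIN1 -t UTF-8 does in this case. It also shows that a lone 0xF6 byte cannot be decoded as UTF-8, which is why a UTF-8 terminal cannot display it.

```python
import string
import tempfile
from pathlib import Path

# Bytes shown as themselves in the ASCII column; everything else becomes ".".
PRINTABLE = set((string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii"))


def hexdump(data: bytes, start: int = 0, width: int = 12) -> list[str]:
    """Format bytes like the mc hex view: offset, hex bytes, ASCII column."""
    lines = []
    for pos in range(0, len(data), width):
        chunk = data[pos:pos + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if b in PRINTABLE else "." for b in chunk)
        lines.append(f"{start + pos:08X} {hex_part} {text_part}")
    return lines


def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Rough Python equivalent of: iconv -f LATIN1 -t UTF-8 src -o dst"""
    # Every byte 0x00-0xFF maps to a Latin-1 character, so decoding never fails.
    text = src.read_bytes().decode("latin-1")
    dst.write_bytes(text.encode("utf-8"))


line = b" h\xf6her forma"  # the bytes at offset 0xC0 in the dump
print(hexdump(line, start=0xC0)[0])

try:
    line.decode("utf-8")  # a lone 0xF6 byte is not valid UTF-8
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "latin1.txt", Path(tmp) / "D.txt"
    src.write_bytes(b"Sie verwenden eine fr\xfchere Version von Excel.")
    latin1_to_utf8(src, dst)
    print(dst.read_text(encoding="utf-8"))
```

Because every byte value is a valid Latin-1 character, the conversion can never report an error. The flip side is that if the file were really in another 8-bit encoding, such as Windows cp1252, a few characters (the euro sign, for example) would come out wrong silently.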