[opensuse] char hex encoding html / xml?
Hi,

from: https://lists.opensuse.org/opensuse/2019-03/msg00320.html
I now have a file: http://susepaste.org/47685312

My first question would be: what type of file is this (which language, HTML or XML or what)? Maybe with the correct filename extension ".xxx" it would load correctly into some software?

I could load it into Firefox or LibreOffice; LibreOffice generates an Excel sheet if I change .htm to .xls.

But how do I get the correct encoding? I don't think it is really a wrong encoding; it is something I would call a readable hex encoding.

========
The line:
mso-header-data:"&L&B&\0022Arial\0022&14Bedarfsanalyse f=FCr Auftrag: &B&= 8Datum: 25.03.2019 &10Auf-Nr: 1865.70&C&R";
the chars: =FC -> should be a German "u" with dots on top, "ü"
============
or the line:
<td class=3Ds2 x:str>L=C4NGSTR=C4GER L 810</td>
the chars: =C4 -> should be a German "A" with dots on top, "Ä"
===========
or the line:
<td class=3Ds2 x:str>F=DCHRUNGSSCHIENE</td>
the chars: =DC -> should be a German "U" with dots on top, "Ü"
=============

I found https://www.ascii.cl/htmlcodes.htm, where the hex number C4 points to the correct char.

Is there some tool that could handle this file correctly, or translate the encoding?

simoN

--
www.becherer.de

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
Simon Becherer wrote:
Hi,
from: https://lists.opensuse.org/opensuse/2019-03/msg00320.html
i have NOW a file:
My first question would be: what type of file is this (which language, HTML or XML or what)? Maybe with the correct filename extension ".xxx" it would load correctly into some software?
I would say this is a Microsoft Excel spreadsheet, rendered as HTML, encoded as quoted-printable.
======== the line: mso-header-data:"&L&B&\0022Arial\0022&14Bedarfsanalyse f=FCr Auftrag: &B&= 8Datum: 25.03.2019 &10Auf-Nr: 1865.70&C&R";
the chars: =FC -> should be a german "u" with dots on top "ü"
Right, that is quoted-printable encoding.
Is there some tool that could handle this file correctly, or translate the encoding?
qprint. (from googling, I haven't tried it).

--
Per Jessen, Zürich (15.1°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
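Editor's sketch, not from the thread: if qprint is not at hand, Python's standard `quopri` module decodes quoted-printable as well. The snippet below runs it over the sample lines quoted above. The helper name `decode_qp` is made up for illustration, and treating the decoded bytes as Latin-1 is an assumption (the file itself declares us-ascii).

```python
import quopri

# Sample lines copied from the file discussed in this thread.
samples = [
    b"<td class=3Ds2 x:str>L=C4NGSTR=C4GER L 810</td>",
    b"<td class=3Ds2 x:str>F=DCHRUNGSSCHIENE</td>",
    b"Bedarfsanalyse f=FCr Auftrag:",
]


def decode_qp(raw: bytes, charset: str = "latin-1") -> str:
    """Undo quoted-printable, then interpret the bytes in the given charset.

    The default charset is an assumption, not something the file states.
    """
    return quopri.decodestring(raw).decode(charset)


decoded = [decode_qp(sample) for sample in samples]
for line in decoded:
    print(line)
```

Note that `=3D` is itself a quoted-printable escape for the equals sign, so `class=3Ds2` turns back into `class=s2`.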
On 30.03.19 at 13:35, Per Jessen wrote:
qprint. (from googling, I haven't tried it).
Ah, thanks for the hint. I will have to try it if nobody can help me with this:

======================================
Meanwhile I played around, and I actually got more files out of this e-mail file:

xls.htm
./xls_files/filelist.xml
./xls_files/sheet001.htm
./xls_files/stylesheet.css

It looks to me like these are the internals of some "more modern" xls format. Does anyone know how to put these parts back together into one file readable by LibreOffice/Microsoft Office?

The xls.htm contains:

snip...
meta name=3D"Excel Workbook Frameset">
<link rel=3DFile-List href=3D"xls_files/filelist.xml">
..snip

So it refers to the file list, and all 4 files are listed there. I think if I knew how MS zipped/gzipped/packaged this, I could get the working file back.

simoN
-- B e c h e r e r GmbH Sondermaschinenbau Mauermatten Strasse 22 79183 Waldkirch Germany Tel.: (+49) (0)7681 3134 Fax: (+49) (0)7681 4378 Mail: info@becherer.de Web: www.becherer.de USt-ID-Nr.: DE 814912198 Registergericht: Freiburg HRB 701860 Geschäftsführer: Dipl.-Ing. (FH), EWE Simon H. Becherer Gerichtsstand / Sitz: Waldkirch Es gelten ausschließlich unsere allgemeinen Liefer- und Zahlungsbedingungen / Einkaufsbedingungen: www.becherer.de/AGB
On 30/03/2019 14.29, Simon Becherer wrote:
On 30.03.19 at 13:35, Per Jessen wrote:
qprint. (from googling, I haven't tried it).
Ah, thanks for the hint. I will have to try it if nobody can help me with this:
======================================
Meanwhile I played around, and I actually got more files out of this e-mail file:
xls.htm
./xls_files/filelist.xml
./xls_files/sheet001.htm
./xls_files/stylesheet.css
It looks to me like these are the internals of some "more modern" xls format.
Indeed.
Does anyone know how to put these parts back together into one file readable by LibreOffice/Microsoft Office?
The xls.htm contains:
snip...
meta name=3D"Excel Workbook Frameset">
<link rel=3DFile-List href=3D"xls_files/filelist.xml">
..snip
So it refers to the file list, and all 4 files are listed there. I think if I knew how MS zipped/gzipped/packaged this, I could get the working file back.
I did a test with LO. You can "save as" excel file. I tried xls and xlsx. The latter is actually a zip archive, so I renamed it to "test.xlsx.zip" and could open the archive with 'mc', but the structure is different.

cer@Telcontar:~/tmp/simon/test> tree
.
├── [Content_Types].xml
├── _rels
├── docProps
│   ├── app.xml
│   └── core.xml
└── xl
    ├── _rels
    │   └── workbook.xml.rels
    ├── sharedStrings.xml
    ├── styles.xml
    ├── workbook.xml
    └── worksheets
        └── sheet1.xml

5 directories, 8 files
cer@Telcontar:~/tmp/simon/test>

"test.xls": I don't know how it is done. It is not zip.

cer@Telcontar:~/tmp/simon> file test.xls
test.xls: Composite Document File V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Revision Number: 1, Total Editing Time: 01:10, Create Time/Date: Sat Mar 30 13:41:34 2019, Last Saved Time/Date: Sat Mar 30 13:42:42 2019
cer@Telcontar:~/tmp/simon>

Should be excel 2002. I think that is the "test.xls", but I don't know how to "open" it up into its components. Google?

--
Cheers / Saludos,
Carlos E. R.
(from 15.0 x86_64 at Telcontar)
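Editor's sketch, not from the thread: the difference Carlos sees between the two saved files (zip archive versus "Composite Document File") can also be checked from the first bytes of a file. The function names below are made up for illustration, and the demo archive is built in memory as a stand-in, since the real files are not available here.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"                        # local file header of a ZIP archive
CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # Compound File Binary (OLE2) header


def sniff_container(data: bytes) -> str:
    """Guess the container type from the leading bytes of a file."""
    if data.startswith(ZIP_MAGIC):
        return "zip (e.g. .xlsx)"
    if data.startswith(CFB_MAGIC):
        return "compound document (e.g. legacy .xls)"
    return "unknown"


def list_zip_members(data: bytes) -> list[str]:
    """Return the member names of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


# Tiny stand-in archive (hypothetical content) so the sketch runs on its own.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
demo = buffer.getvalue()

print(sniff_container(demo))
print(list_zip_members(demo))
```

For a real file, reading the first eight bytes with `open(path, "rb").read(8)` and passing them to `sniff_container` would be enough.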
Hi Carlos,

On 30.03.19 at 14:50, Carlos E. R. wrote:
I did a test with LO. You can "save as" excel file. I tried xls and xlsx. The latter is actually a zip archive, so I renamed it to "test.xlsx.zip" and could open the archive with 'mc', but the structure is different.

I did the same.....

Google?

It has not helped much...

I will stop for the moment; I must do some more important work now. Maybe someone here knows this file format and could help. If not, I will try qprint. The Excel sheet extracted after qprint will not be perfect, but at least I will have the data.

simoN

--
www.becherer.de
On 30/03/2019 15.09, Simon Becherer wrote:
Hi carlos,
On 30.03.19 at 14:50, Carlos E. R. wrote:
I did a test with LO. You can "save as" excel file. I tried xls and xlsx. The latter is actually a zip archive, so I renamed it to "test.xlsx.zip" and could open the archive with 'mc', but the structure is different.
I did the same.....
Google?
It has not helped much...
I will stop for the moment; I must do some more important work now. Maybe someone here knows this file format and could help. If not, I will try qprint. The Excel sheet extracted after qprint will not be perfect, but at least I will have the data.
It is iconv you have to use, on the original, before extracting the pieces. See my other post. -- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
On 30.03.19 at 15:11, Carlos E. R. wrote:
It is iconv you have to use, on the original, before extracting the pieces. See my other post.

No, I do not think so, see my other post ;-))

simoN

--
www.becherer.de
Carlos E. R. wrote:
It is iconv you have to use, on the original, before extracting the pieces. See my other post.
iconv does conversions between character sets, it doesn't do anything with encodings.

--
Per Jessen, Zürich (11.6°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
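Editor's sketch, not from the thread: Per's distinction can be shown in a few lines of Python. A charset conversion alone, which is roughly what iconv does, leaves the `=FC` escape untouched because all three of its characters are plain ASCII; only after quoted-printable decoding is there an 0xFC byte for the charset conversion to act on. Latin-1 as the source charset is an assumption here.

```python
import quopri

raw = b"f=FCr"  # bytes as they appear in the quoted-printable part

# Charset conversion only: "=FC" is three ASCII characters, so nothing changes.
charset_only = raw.decode("latin-1").encode("utf-8")

# Transfer decoding first (=FC becomes byte 0xFC), then charset conversion.
both_steps = quopri.decodestring(raw).decode("latin-1").encode("utf-8")

print(charset_only)
print(both_steps)
```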
On 30/03/2019 19.06, Per Jessen wrote:
Carlos E. R. wrote:
It is iconv you have to use, on the original, before extracting the pieces. See my other post.
iconv does conversions between character sets, it doesn't do anything with encodings.
I had not noticed the encodings further down the file. -- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
On 30/03/2019 15.09, Simon Becherer wrote:
Hi carlos,
On 30.03.19 at 14:50, Carlos E. R. wrote:
I did a test with LO. You can "save as" excel file. I tried xls and xlsx. The latter is actually a zip archive, so I renamed it to "test.xlsx.zip" and could open the archive with 'mc', but the structure is different.
I did the same.....
Google?
It has not helped much...
google "structure of xls excel 2002 files"

<https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/cd03cb5f-ca02-4934-a391-bb674cb8aa06>

<https://en.wikipedia.org/wiki/Microsoft_Excel#File_formats>

File formats

Microsoft Excel up until 2007 version used a proprietary binary file format called Excel Binary File Format (.XLS) as its primary format.[27] Excel 2007 uses Office Open XML as its primary file format, an XML-based format that followed after a previous XML-based format called "XML Spreadsheet" ("XMLSS"), first introduced in Excel 2002.[28]

Although supporting and encouraging the use of new XML-based formats as replacements, Excel 2007 remained backwards-compatible with the traditional, binary formats. In addition, most versions of Microsoft Excel can read CSV, DBF, SYLK, DIF, and other legacy formats. Support for some older file formats was removed in Excel 2007.[29] The file formats were mainly from DOS-based programs.

Binary

OpenOffice.org has created documentation of the Excel format.[30] Since then Microsoft made the Excel binary format specification available to freely download.[31]

XML Spreadsheet

Main article: Microsoft Office XML formats

The XML Spreadsheet format introduced in Excel 2002[28] is a simple, XML based format missing some more advanced features like storage of VBA macros. Though the intended file extension for this format is .xml, the program also correctly handles XML files with .xls extension. This feature is widely used by third-party applications (e.g. MySQL Query Browser) to offer "export to Excel" capabilities without implementing binary file format. The following example will be correctly opened by Excel if saved either as Book1.xml or Book1.xls:

Also, google "mail excel file" and you will see what your correspondent has done. There are even videos. It is possible he selected part of the sheet and hit "mail this". Try to tell him to send mail from the file explorer, not from inside excel. Or...
How complicated things are... -- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
Carlos E. R. wrote:
Also, google "mail excel file" and you will see what your correspondent has done. There are even videos. It is possible he selected part of the sheet and hit "mail this".
Ah yes, that sounds quite likely. That would certainly explain the HTML with QP encoding.

--
Per Jessen, Zürich (11.7°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
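Editor's sketch, not from the thread: if the file really is a MIME multipart message, as the part headers quoted elsewhere in this thread suggest (Content-Location, Content-Transfer-Encoding: quoted-printable), Python's standard `email` package can split it into its parts and undo the transfer encoding in one go. The sample message, its boundary and its paths are hypothetical, `extract_parts` is a made-up helper name, and falling back from the declared us-ascii to Latin-1 is an assumption based on the ISO 8859-1 guess made in the thread.

```python
from email import message_from_bytes

# Hypothetical miniature of such a message; boundary and paths are made up.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="DEMO-BOUNDARY"\r\n'
    b"\r\n"
    b"--DEMO-BOUNDARY\r\n"
    b"Content-Location: file:///C:/demo/xls.htm\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b'Content-Type: text/html; charset="us-ascii"\r\n'
    b"\r\n"
    b"<td class=3Ds2>F=DCHRUNGSSCHIENE</td>\r\n"
    b"--DEMO-BOUNDARY\r\n"
    b"Content-Location: file:///C:/demo/xls_files/stylesheet.css\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b'Content-Type: text/css; charset="us-ascii"\r\n'
    b"\r\n"
    b".s2 {font-family:Arial;}\r\n"
    b"--DEMO-BOUNDARY--\r\n"
)


def extract_parts(raw: bytes) -> dict[str, str]:
    """Map each part's Content-Location to its decoded text."""
    message = message_from_bytes(raw)
    parts = {}
    for part in message.walk():
        if part.is_multipart():
            continue  # the container itself carries no payload of its own
        location = part.get("Content-Location", "(no location)")
        payload = part.get_payload(decode=True)  # undoes quoted-printable/base64
        charset = part.get_content_charset() or "latin-1"
        if charset == "us-ascii":
            # Assumption: the declared charset is too narrow, so read the
            # bytes as Latin-1, which is a superset of ASCII.
            charset = "latin-1"
        parts[location] = payload.decode(charset, errors="replace")
    return parts


parts = extract_parts(SAMPLE)
for location, text in parts.items():
    print(location, "->", text.strip())
```

Each decoded part could then be written to a file named after the last component of its Content-Location, which would reproduce the xls.htm plus xls_files/ layout mentioned earlier.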
On 30/03/2019 13.08, Simon Becherer wrote:
Hi,
from: https://lists.opensuse.org/opensuse/2019-03/msg00320.html
i have NOW a file:
My first question would be: what type of file is this (which language, HTML or XML or what)? Maybe with the correct filename extension ".xxx" it would load correctly into some software?
I could load it into Firefox or LibreOffice; LibreOffice generates an Excel sheet if I change .htm to .xls.
But how do I get the correct encoding? I don't think it is really a wrong encoding; it is something I would call a readable hex encoding.
You have the encoding on the header of the fake email :-)

------=_NextPart_01C3B8B1.6DB8B6D0^M
Content-Location: file:///C:/CE594991/xls.htm^M
Content-Transfer-Encoding: quoted-printable^M
Content-Type: text/html; charset="us-ascii"^M
. . . . . . . . . . . . . . . . . *********

:-)

I don't think it is the 7 bit ascii, but the 8 bit IBM variant. I think that is ISO 8859-1 aka latin1. (man iso_8859-1)

Oct   Dec   Hex   Char   Description
374   252   FC    ü      LATIN SMALL LETTER U WITH DIAERESIS
I found https://www.ascii.cl/htmlcodes.htm, where the hex number C4 points to the correct char.
Is there some tool that could handle this file correctly, or translate the encoding?
iconv :-)

However, LO asks what language to use for the import. Try with "English(USA)".

Huh. You said:

|> but how did i get the correct encoding, because i think it's not
|> a real wrong encoding, its something i would call readable-hex-encoding

That's an artifact of displaying in UTF, maybe a letter that has no representation in the current font. If I view the file "Bedarfsanalyse für Auftrag_2.xls" with less, I see those:

"Dieses Dokument wurde f374r die Anzeige in Microsoft Excel 2002 oder h366her formatiert. Sie verwenden eine fr374here Version von Excel."

But if I change to "Bedarfsanalyse für Auftrag_2.xls.txt" and view with 'mc' in hex mode, the real situation can be seen.

00000078  30 22 0D 0A 0D 0A 44 69 65 73 65 73  0"....Dieses
00000084  20 44 6F 6B 75 6D 65 6E 74 20 77 75   Dokument wu
00000090  72 64 65 20 66 FC 72 20 64 69 65 20  rde f.r die
0000009C  41 6E 7A 65 69 67 65 20 69 6E 20 4D  Anzeige in M
000000A8  69 63 72 6F 73 6F 66 74 20 45 78 63  icrosoft Exc
000000B4  65 6C 20 32 30 30 32 20 6F 64 65 72  el 2002 oder
000000C0  20 68 F6 68 65 72 20 66 6F 72 6D 61   h.her forma  <==
000000CC  74 69 65 72 74 2E 20 53 69 65 20 76  tiert. Sie v
000000D8  65 72 77 65 6E 64 65 6E 20 65 69 6E  erwenden ein
000000E4  65 20 66 72 FC 68 65 72 65 20 56 65  e fr.here Ve
000000F0  72 73 69 6F 6E 20 76 6F 6E 20 45 78  rsion von Ex
000000FC  63 65 6C 2E 0D 0A 0D 0A 2D 2D 2D 2D  cel.....----

' 'h  .  h  e
20 68 F6 68 65

Hex F6 is the hidden symbol. Man "iso_8859-1" says that is:

ö LATIN SMALL LETTER O WITH DIAERESIS

I suppose iconv would get it correct. Let's try:

iconv -f LATIN1 -t UTF-8 Bedarfsanalyse\ für\ Auftrag_2.xls.txt -o D.txt

"Dieses Dokument wurde für die Anzeige in Microsoft Excel 2002 oder höher formatiert. Sie verwenden eine frühere Version von Excel."

Yep :-)

--
Cheers / Saludos,
Carlos E. R.
(from 15.0 x86_64 at Telcontar)
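Editor's sketch, not from the thread: the Latin-1 to UTF-8 conversion Carlos runs with iconv can be reproduced on the two interesting byte runs from his hex dump. The byte string below is assembled by hand from the dump ("f.r" and "h.her", where the dots are FC and F6), so it is an illustration rather than the real file content.

```python
# Bytes taken from the hex dump above: "f\xfcr" and "h\xf6her", joined by a space.
latin1_bytes = bytes.fromhex("66 FC 72 20 68 F6 68 65 72")

text = latin1_bytes.decode("latin-1")  # interpret the bytes as ISO 8859-1
utf8_bytes = text.encode("utf-8")      # re-encode, like iconv -f LATIN1 -t UTF-8

print(text)
print(utf8_bytes.hex(" "))
```

In UTF-8 each umlaut takes two bytes (C3 BC for ü, C3 B6 for ö), which is why a viewer expecting UTF-8 cannot display the single bytes FC and F6.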
Hi Carlos,

On 30.03.19 at 14:03, Carlos E. R. wrote:
ö LATIN SMALL LETTER O WITH DIAERESIS
I suppose iconv would get it correct. Let's try:
iconv -f LATIN1 -t UTF-8 Bedarfsanalyse\ für\ Auftrag_2.xls.txt -o D.txt
"Dieses Dokument wurde für die Anzeige in Microsoft Excel 2002 oder höher formatiert. Sie verwenden eine frühere Version von Excel."
Yep :-)
For this line (line 4) you are right, but not for the others. I was referring to the line:

mso-header-data:"&L&B&\0022Arial\0022&14Bedarfsanalyse f=FCr Auftrag: &B&

(line 149), and in this line the "ü" is written as the 3 chars "=FC".

simoN

--
www.becherer.de
On 30/03/2019 15.20, Simon Becherer wrote:
Hi carlos,
On 30.03.19 at 14:03, Carlos E. R. wrote:
ö LATIN SMALL LETTER O WITH DIAERESIS
I suppose iconv would get it correct. Let's try:
iconv -f LATIN1 -t UTF-8 Bedarfsanalyse\ für\ Auftrag_2.xls.txt -o D.txt
"Dieses Dokument wurde für die Anzeige in Microsoft Excel 2002 oder höher formatiert. Sie verwenden eine frühere Version von Excel."
Yep :-)
For this line (line 4) you are right, but not for the others. I was referring to the line:
mso-header-data:"&L&B&\0022Arial\0022&14Bedarfsanalyse f=FCr Auftrag: &B&
(line 149), and in this line the "ü" is written as the 3 chars "=FC".
Argh. :-( -- Cheers / Saludos, Carlos E. R. (from 15.0 x86_64 at Telcontar)
participants (3)
- Carlos E. R.
- Per Jessen
- Simon Becherer