[opensuse] pdftotext output
Hello list users:

I have a PDF file in Hungarian that I'd like to convert to a text file using pdftotext. But the output becomes gibberish because the accented Hungarian characters are turned into strange things. E.g. "engedélyével" becomes "enged6ly6vel", "szerződés rögzíti" becomes "szerzodls rogzitr", etc. How can I fix this and get correct text output? I've checked the pdftotext man page and read about the -enc option, but it's not clear how to use it.

Thanks,
IG

Titanic - The Exhibition. Original Artifacts - True Stories http://www.jegymester.hu/eventcalendar.jsp?place=80130&lang=HUN

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
Hello!

Try this:

pdftotext -enc Latin2 filename.pdf filename.txt

or this:

pdftotext -enc UTF-8 filename.pdf filename.txt

Bye.

On Saturday, 2007-09-15 at 22:53, Istvan Gabor wrote:
Borsos Imre Attila
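The two commands Imre suggested differ only in the -enc value, which names the text encoding pdftotext uses when writing the output file. Below is a minimal sketch for trying both and sanity-checking the result, assuming pdftotext and iconv are on the PATH. The helper names `is_valid_utf8` and `convert_all` and the `out-<encoding>.txt` file names are hypothetical. Whether `Latin2` is accepted depends on the encoding data installed; recent poppler builds of pdftotext accept `-listenc` to print the supported names. A file that passes the UTF-8 check can still contain the wrong letters, so compare a known word such as "engedélyével" by eye.

```shell
#!/bin/sh
# Sketch: try the -enc values suggested in the thread and sanity-check the result.
# Assumes pdftotext (poppler or xpdf) and iconv are installed.

# Hypothetical helper: succeeds only if the file is well-formed UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: convert the PDF in $1 once per encoding,
# writing out-Latin2.txt and out-UTF-8.txt in the current directory.
convert_all() {
    pdf=$1
    for enc in Latin2 UTF-8; do
        out="out-$enc.txt"
        if pdftotext -enc "$enc" "$pdf" "$out"; then
            echo "wrote $out"
        else
            echo "pdftotext exited with an error for -enc $enc" >&2
        fi
    done
    # Only the UTF-8 output can be checked this way.
    if [ -f out-UTF-8.txt ] && is_valid_utf8 out-UTF-8.txt; then
        echo "out-UTF-8.txt is valid UTF-8"
    fi
}

# Usage: convert_all filename.pdf
```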
Thanks Imre,

I tried both methods you suggested, but neither gave the correct result; the characters were still not extracted as expected. I've checked the PDF document properties in Adobe Reader, and the Fonts tab says:

Fonts used in this document:
Helvetica
  Type: Type 1
  Encoding: Ansi
  Actual Font: Arial MT
  Actual Font Type: TrueType

Is this important?

Thanks,
IG

Now every fourth one wins with the codes found in Balatonszelet packaging. http://www.balatonszelet.hu
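One way to see similar font information on Linux is `pdffonts` from poppler (an assumption: on some systems it is packaged separately from pdftotext). It prints one row per font, including its encoding and a `uni` column saying whether the font carries a ToUnicode map, which text extractors such as pdftotext can use to map glyphs back to characters. The sketch below filters that output for fonts where `uni` is `no`. The helper name `fonts_without_unicode` is hypothetical, and it assumes poppler's column layout, where `uni` is the third field from the end of each row.

```shell
#!/bin/sh
# Sketch: list fonts that lack a ToUnicode map, from `pdffonts` output.
# Assumes poppler's pdffonts layout: two header lines, then one row per
# font ending in "... emb sub uni <obj> <gen>", so "uni" is field NF-2.
# Counting from the end copes with multi-word types such as "Type 1".

# Hypothetical helper: reads pdffonts output on stdin,
# prints the name of every font whose "uni" column is "no".
fonts_without_unicode() {
    awk 'NR > 2 && $(NF-2) == "no" { print $1 }'
}

# Usage (assumes pdffonts is installed):
#   pdffonts filename.pdf | fonts_without_unicode
```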
Hello!

I tried to find another tool, but I think the most reliable option is online conversion. You can convert a PDF to text online with Adobe's tool at the following link:
http://www.adobe.com/products/acrobat/access_onlinetools.html

Alternatively, you can convert the PDF to PostScript with pdftops, and then convert that to ASCII text with ps2ascii.

I also tried Scribus; it's very powerful.

Bye.

On Sunday, 2007-09-16 at 11:44, Istvan Gabor wrote:
Borsos Imre Attila
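The PostScript route Imre mentions (pdftops, then ps2ascii) can be scripted as below. This is a minimal sketch, assuming pdftops (from poppler or xpdf) and ps2ascii (from Ghostscript) are installed; `need` and `pdf_via_ps` are hypothetical helper names. As its name suggests, ps2ascii targets ASCII text, so it may not preserve the Hungarian accented letters either; check the output before relying on it.

```shell
#!/bin/sh
# Sketch of the two-step route: PDF -> PostScript (pdftops) -> text (ps2ascii).
# Assumes pdftops and ps2ascii are installed.

# Hypothetical helper: returns non-zero with a message if a command is missing.
need() {
    command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

# Hypothetical helper: filename.pdf -> filename.ps -> filename.txt
pdf_via_ps() {
    pdf=$1
    base=${pdf%.pdf}
    need pdftops && need ps2ascii || return 1
    pdftops "$pdf" "$base.ps" || return 1
    ps2ascii "$base.ps" "$base.txt"
}

# Usage: pdf_via_ps filename.pdf
```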
Participants (2):
- Borsos Imre Attila
- Istvan Gabor