[opensuse] OCR PDF
Hi, is there any working tool that can add a text layer to a scanned PDF? I tried YAGF (a front end for cuneiform and/or tesseract), but it seems to have only the option to save the text as a separate TXT file. Cuneiform also lacks this ability, and I wasn't able to get tesseract to work (the OCRmyPDF script kept complaining that tesseract was missing even though it was installed). Scantailor seems to lack this functionality. Ocrad wouldn't start (and produced no error message), and gocr can't work with PDF... An old demo version of Vuescan I have requires libgtk-X11, which is unavailable. And it is not the cheapest software... Tragedy. Any other suggestions? ;-) Yours, Vojtěch -- Vojtěch Zeisek Komunita openSUSE GNU/Linuxu Community of the openSUSE GNU/Linux http://www.opensuse.org/ http://trapa.cz/
On December 1, 2014 5:14:25 AM EST, "Vojtěch Zeisek" <vojtech.zeisek@opensuse.org> wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
My Canon printer came with software to do that during the scan operation. It's Windows software but it might run in wine. Just brainstorming. Greg -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
On Mon, 1 December 2014 07:45:04, Greg Freemyer wrote:
On December 1, 2014 5:14:25 AM EST, "Vojtěch Zeisek" <vojtech.zeisek@opensuse.org> wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
My Canon printer came with software to do that during the scan operation. It's Windows software but it might run in wine.
I also have a Canon printer. Surprisingly, they offer a Linux driver for download. I wouldn't expect any useful software on the Canon Windows CD. :-) And even if there is some, it would be my last resort... -- Vojtěch Zeisek
On Mon, 01 Dec 2014 11:14:25 +0100 Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Some old demo version of Vuescan I have requires libgtk-X11 which is unavailable. And it is not the cheapest software... Tragedy.
I have a fresh installation of SUSE-13.2 and Vuescan works perfectly. The *lifetime* license is $79, which I consider very cheap; for comparison, the boxed version of SUSE is 49.95€. For that plus the price of a drink you get a Vuescan license for life: it supports tons of scanners, is constantly upgraded, can scan in RAW format, etc. As for "it is not the cheapest software": it's a bargain, even for someone from Croatia. ;) Sincerely, Gour
On Mon, 1 December 2014 14:39:45, Gour wrote:
On Mon, 01 Dec 2014 11:14:25 +0100 Vojtěch Zeisek wrote:
Some old demo version of Vuescan I have requires libgtk-X11 which is unavailable. And it is not the cheapest software... Tragedy.
I've fresh installation of SUSE-13.2 and Vuescan works perfectly.
Can it also OCR existing PDFs, or does it only process text scanned directly from the scanner?
The *lifetime* license is $79 which I consider very cheap, e.g boxed version of SUSE is 49.95€, you buy some drink and have license for Vuescan for the life which supports tons of scanner, constantly upgraded, ability to scan in RAW format etc.
Does the license include upgrades? Well, maybe... Christmas is approaching... :-D
I'd agree with you that "it is not the cheapest software" - it's bargain - even for someone from Croatia. ;)
One would hope for some nice working OSS alternative... ;-)
Sincerely, Gour
Thank You for the info, Vojtěch
On Mon, 01 Dec 2014 14:54:50 +0100 Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Is it able to OCR also existing PDFs or to process only texts scanned right now by the scanner?
Yes, check: https://www.hamrick.com/reg.html
Does the license contain upgrades? Well, might be... Christmas are approaching... :-D
Today there was a sale (now expired), so the lifetime license is now $89.95 and you get all future upgrades for *free*!! The best thing is to download the latest version and try it for yourself. The unregistered version will just produce a watermark until you register.
One would hope for some nice working OSS alternative... ;-)
Yes, but since I needed quality scanning (many old 35mm slides), X(sane) was/is far from good enough. It's simply astonishing how practically one man can support so many scanners compared with the whole OSS community. Sincerely, Gour
On 2014-12-01 15:55, Gour wrote:
Yes, but since I had need for quality scanning (many old 35mm slides),
Interesting. I'm searching for a scanner for this task. So far, the "Reflecta x7-Scan" is winning: fast and not expensive. It is not really a scanner, rather a specialized 14-megapixel camera. I considered the "Reflecta Crystal Scan 7200": wonderful but horribly slow, 5 minutes per shot at the best quality. And it needs Windows. If you can point me to other suggestions, I'll have a look.
<http://www.scandig.com/filmscanner/reflecta/reflecta-crystal-scan-7200-mit-silverfast-se.html> <http://www.photographyblog.com/news/reflecta_x7/>
-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Mon, 1 December 2014 16:20:29, Carlos E. R. wrote:
On 2014-12-01 15:55, Gour wrote:
Yes, but since I had need for quality scanning (many old 35mm slides),
Interesting. I'm searching for a scanner for this task.
So far, the "Reflecta x7-Scan" is wining, fast and not expensive. It is not really a scanner, rather a specialized 14-megapixel camera.
I considered the "Reflecta Crystal Scan 7200", wonderful but horribly slow, 5 minutes per shot at the best quality. And needs Windows.
If you can point me to other suggestions, I'll have a look.
My colleague has a dedicated Canon CanoScan film scanner (I don't remember the exact model number, and that model isn't available any more anyway). It was very slow at the highest quality, but the output was very good. I digitised my collection on it several years ago. His computer was running Ubuntu and the scanner was driven by Vuescan. So I can recommend Vuescan for that purpose... Vojtěch
On 2014-12-01 16:34, Vojtěch Zeisek wrote:
On Mon, 1 December 2014 16:20:29, Carlos E. R. wrote:
If you can point me to other suggestions, I'll have a look.
My colleague has some Canon CanoScan special film scanner (I don't remember exact model number, and that model isn't available any more, anyway). It was very slow when using highest quality, but the output was very good. I digitised my collection on it several years ago. And his computer was running Ubuntu and scanner was managed with Vuescan. So I can recommend Vuescan for that purpose...
I'll certainly consider it, after I get suitable hardware... that's my current stumbling block. The hardware. The 7200 does two scans, the second one in infrared. Combining both removes dust and scratches automatically, and this requires specialized software (CyberView/Photoshop or SilverFast SE 8; Windows only). Maybe Vuescan supports it, I don't know; but it adds to the price. This processing takes 5 or 8 minutes, depending on the reviewer (this includes manual positioning of the slides or film). Way too long. Slow processing would be acceptable if it were fully automatic, or for very few slides. The x7-Scan instead just takes a photo onto a memory card, without a computer, and does it in seconds each. Then you import the photos into any operating system. So it is a more likely candidate for a home user (and Linux!). But... the thing about removing scratches is certainly interesting. And I would like more resolution than 14 megapixels, if possible. But it absolutely needs to be cheap! So... I'm undecided. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 12/01/2014 07:53 AM, Carlos E. R. wrote:
The 7200 does two scans: the second one in infrared. Combining both removes dust and scratches, automatically; and this requires specialized software (CyberView/Photoshop or SilverFast SE 8 (Windows only). Maybe vuescan
Dust and scratches removed by infrared? Sounds like junk science to me. A scratch is a scratch. The information is gone.
On 2014-12-01 20:26, John Andersen wrote:
On 12/01/2014 07:53 AM, Carlos E. R. wrote:
The 7200 does two scans: the second one in infrared. Combining both removes dust and scratches, automatically; and this requires specialized software (CyberView/Photoshop or SilverFast SE 8 (Windows only). Maybe vuescan
Dust and scratches removed by infrared? Sounds like junk science to me. A scratch is a scratch. The information is gone.
Of course! But read the links, they explain the idea. That's about all I know. The 7200 page links to a full report, or something along those lines. It is long. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Mon, 01 Dec 2014 16:20:29 +0100 "Carlos E. R." <robin.listas@telefonica.net> wrote:
Interesting. I'm searching for a scanner for this task.
Well, I bought an Epson V700, which is not a specialized 35mm scanner but a flatbed with helpers to speed up 35mm/film scanning, and it can be used for documents as well. What I like is Vuescan's ability to use the so-called *raw* format, which can be post-processed later to remove dust etc. Otherwise, it depends on how many slides you have to scan; going to a specialized studio might be the better option. Sincerely, Gour
On 2014-12-01 17:43, Gour wrote:
On Mon, 01 Dec 2014 16:20:29 +0100 "Carlos E. R." <> wrote:
Interesting. I'm searching for a scanner for this task.
Well, I bought Epson V700 scanner which is not specialized 35mm scanner, but the flatbed having helpers to speed up 35mm/film scanning and can be used for documents as well.
A possibility, but for mine I don't think they still sell the attachment. And the process would be slow, anyway.
What I like is ability of Vuescan to use so called *raw* format which can be later post-processed for removing dust etc.
Yes, somehow. But the technique of a second scan in infrared looks very enticing. It is automatic.
Otherwise, it depends how much slides you have to scan and it might be that going to specialized studio might be better option.
If I go for the hardware, I intend to digitize my entire collection and that of my parents. A few hundred, I expect. I even have some B/W glass negatives, made around 1900, which I don't know how to handle. I tried my flatbed scanner, but it doesn't work right with the normal light: it needs a backlight and I don't have one. The negative attachment, even if they still sell it, is too small. I would need a way to switch off the internal light, then add my own light somehow. Similarly for some film of non-standard size; I think it is 6 cm, quite old. Some of it is in colour, so in that case the backlight needs calibration (B/W is more forgiving). But archiving my 35mm negatives would be enough to justify a dedicated scanner such as the X7. It is almost good enough. -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Mon, 01 Dec 2014 18:11:58 +0100 "Carlos E. R." <robin.listas@telefonica.net> wrote:
If I go for the hardware, I intend to digitize my entire collection and that of my parents. A few hundreds, I expect.
Mine is a few thousand, and every scan was huge in raw format. Sincerely, Gour -- http://www.atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810
On 2014-12-01 19:46, Gour wrote:
On Mon, 01 Dec 2014 18:11:58 +0100 "Carlos E. R." <robin.listas@telefonica.net> wrote:
If I go for the hardware, I intend to digitize my entire collection and that of my parents. A few hundreds, I expect.
Mine is few thousands and every scan was huge in raw format.
I'd be very happy with plain png (lossless compression). Raw would be a nicety, a "plus" :-) I have not looked at Canon, though, because they are not Linux friendly. But if it must be... -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On 1 December 2014 at 10:14, Vojtěch Zeisek <vojtech.zeisek@opensuse.org> wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
I looked into this a while back: https://plus.google.com/104051197821989601827/posts/82G9dMaSvGs Try to find gscan2pdf or OCRFeeder, which are supposed to do it. John.
On Mon, 1 December 2014 14:04:24, John Layt wrote:
On 1 December 2014 at 10:14, Vojtěch Zeisek wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
I looked into this a while back:
https://plus.google.com/104051197821989601827/posts/82G9dMaSvGs
Try find gscan2pdf or OCRFeeder, which are supposed to do it.
Thank You! gscan2pdf looks fine; it's working. Well, the result is far from perfect, but at least it's something... All the best, Vojtěch
On 2014-12-01 11:14, Vojtěch Zeisek wrote:
[...] Some old demo version of Vuescan I have requires libgtk-X11 which is unavailable.
You could set up a virtualized guest with an older openSUSE that has the required libraries.
And it is not the cheapest software... Tragedy. Any other suggestions? ;-)
If you ask for ideas... ;-) Personally, I consider PDF a very bad format for scanned documents; I prefer DjVu ("déjà vu"), which is designed for that very purpose. It is, however, not popular. There is open-source software to create the files, and text can be added, although I've never tried. The available open-source software is, let's say, fully functional but clumsy. There is proprietary software that is, they claim, much easier to use. However, the open-source tools can be easily scripted... Some samples:

djvusmooth - Graphical Text Editor for DjVu
pdf2djvu - PDF to DjVu Converter
djvu2pdf - Converting Djvu Files to PDF Files
djvulibre-doc - Documentation for the DjVu - djvulibre
djvulibre-djview4 - Portable DjVu Qt4 Based Viewer and Browser Plugin
djvutxt - Extract the hidden text from DjVu documents.
djvused - Multi-purpose DjVu document editor.
djvulibre - An Open Source Implementation of DjVu

DjVu is a Web-centric format and software platform for distributing documents and images. DjVuLibre is an open source (GPL) implementation of DjVu, including viewers, browser plug-ins, decoders, simple encoders, and utilities. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consumes less client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy rerendering. DjVu is used by hundreds of academic, commercial, governmental, and noncommercial Web sites around the world.

DjVuDocument

DjVuDocument is a compression technique specifically designed for color digital document images containing both pictures and text, such as a page of a magazine. DjVuDocument represents images as separately compressed layers. The foreground layer is usually compressed with DjVu Bitonal and contains the text and drawings. The background layer is usually compressed with DjVuPhoto and contains the background texture and the pictures at lower resolution.

-- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
On Mon, 1 December 2014 15:48:17, Carlos E. R. wrote:
On 2014-12-01 11:14, Vojtěch Zeisek wrote:
[...] Some old demo version of Vuescan I have requires libgtk-X11 which is unavailable.
You could setup a virtualized guest with an older openSUSE that has the required libraries.
And it is not the cheapest software... Tragedy. Any other suggestions? ;-)
If you ask for ideas... ;-)
Thanks :-P
Personally, I consider PDF a very bad format for scanned documents; I prefer "dejavu", which is designed for that very purpose. It is, however, not popular. There is open software to create the files, and text can be added although I've never tried. However, the available opensource is, let's say, fully functional but clumsy. There is proprietary software that is, they claim, much easier to use.
I'm not the author of those PDFs. I use scanned old books from sources like http://biodiversitylibrary.org/ or http://bibdigital.rjb.csic.es/ or even Google Groups or so. Often they have already been through OCR; sometimes not, and then adding a text layer is very, very useful...
However, OS can be easily scripted...
Yes, convert all my thousands of PDFs into DjVu and carry on... ;-)
some samples:
[...] -- Vojtěch Zeisek
On 2014-12-01 16:04, Vojtěch Zeisek wrote:
On Mon, 1 December 2014 15:48:17, Carlos E. R. wrote:
And it is not the cheapest software... Tragedy. Any other suggestions? ;-)
If you ask for ideas... ;-)
Thanks :-P
O:-)
Personally, I consider PDF a very bad format for scanned documents; I prefer "dejavu", which is designed for that very purpose. It is, however, not popular. There is open software to create the files, and text can be added although I've never tried. However, the available opensource is, let's say, fully functional but clumsy. There is proprietary software that is, they claim, much easier to use.
I'm not author of those PDFs. I use scanned old books from sources like http://biodiversitylibrary.org/ or http://bibdigital.rjb.csic.es/ or even Google Groups or so. Often they already passed through OCR. Sometimes not. And then it is very very useful...
Yep. Some libraries also provide djvu files, a few only djvu. Most png/jpg, or pdf. Some use flash!
However, OS can be easily scripted...
Yes, convert all my thousands PDFs into dejavu and go on... ;-)
LOL. It could take weeks of cpu. Djvu creation is cpu intensive, display is very fast. I would probably do it, though :-p -- Cheers / Saludos, Carlos E. R. (from 13.1 x86_64 "Bottle" at Telcontar)
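The "easily scripted" point above can be sketched as a small batch conversion. This is only a minimal sketch, assuming pdf2djvu is installed and accepts `-o OUTPUT INPUT`; the function names djvu_name and convert_all are made up for this example and are not part of any package mentioned in the thread.

```shell
#!/bin/sh
# Hypothetical helper: map "book.pdf" to "book.djvu".
djvu_name() {
    printf '%s\n' "${1%.pdf}.djvu"
}

# Hypothetical helper: convert every PDF in a directory, skipping
# files that already have a DjVu counterpart.
convert_all() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # the glob matched nothing
        out=$(djvu_name "$pdf")
        [ -e "$out" ] && continue      # already converted earlier
        # Assumption: pdf2djvu is on PATH; failures are reported, not fatal.
        pdf2djvu -o "$out" "$pdf" || echo "failed: $pdf" >&2
    done
}
```

Something like `convert_all ~/scans` could then be left running in the background; since existing outputs are skipped, an interrupted run can simply be restarted.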
On Monday, 1 December 2014, 11:14:25, Vojtěch Zeisek wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
So you want to create sandwich PDFs? There is a tool called, surprise, pdfsandwich that "is a wrapper script which calls the following binaries: unpaper (since version 0.0.9), convert, gs, hocr2pdf (for tesseract prior to version 3.03), and tesseract." http://www.tobias-elze.de/pdfsandwich/ However, I did not find any OBS repository with a recent and working package. :-/ Regards, Jan
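For readers wondering what such a wrapper roughly does, here is a hedged sketch of a "sandwich" pipeline. It is not pdfsandwich's actual implementation: it assumes tesseract 3.03 or later (so that the `pdf` output config is available and hocr2pdf is not needed) plus pdftoppm and pdfunite from poppler, and the function name ocr_pdf is invented for this example.

```shell
#!/bin/sh
# Sketch: add a text layer to a scanned PDF, one page at a time.
# Assumptions: tesseract >= 3.03, pdftoppm and pdfunite on PATH.
ocr_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: ocr_pdf INPUT.pdf OUTPUT.pdf [LANG]" >&2
        return 2
    fi
    in=$1 out=$2 lang=${3:-eng}
    tmp=$(mktemp -d) || return 1
    # 1. Render each page to a 300 dpi PNG (page-1.png, page-2.png, ...).
    pdftoppm -r 300 -png "$in" "$tmp/page" || { rm -rf "$tmp"; return 1; }
    # 2. OCR each page; the "pdf" config writes the image plus hidden text.
    for img in "$tmp"/page-*.png; do
        tesseract "$img" "${img%.png}" -l "$lang" pdf || { rm -rf "$tmp"; return 1; }
    done
    # 3. Join the per-page PDFs; pdftoppm zero-pads the page numbers,
    #    so the glob expands in page order.
    pdfunite "$tmp"/page-*.pdf "$out"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

A call would look like `ocr_pdf scan.pdf scan-ocr.pdf eng`. The real tool additionally cleans up pages with unpaper and handles resolution and layout options, which this sketch leaves out.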
On Mon 01 Dec 2014 07:05:37 PM CST, Jan Ritzerfeld wrote:
On Monday, 1 December 2014, 11:14:25, Vojtěch Zeisek wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
So you want to create sandwich pdfs? There is a tool called, surprise, pdfsandwich that "is a wrapper script which calls the following binaries: unpaper (since version 0.0.9), convert, gs, hocr2pdf (for tesseract prior to version 3.03), and tesseract." http://www.tobias-elze.de/pdfsandwich/ However, I did not find any OBS repository with a recent and working package. :-/
Regards, Jan
Hi, here you go: https://build.opensuse.org/package/show/home:malcolmlewis:TESTING/pdfsandwic...
-- Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890) SUSE Linux Enterprise Desktop 12 GNOME 3.10.1 Kernel 3.12.28-4-default up 4 days 15:32, 5 users, load average: 0.21, 0.26, 0.24 CPU Intel® B840@1.9GHz | GPU Intel® Sandybridge Mobile
On Mon, 1 December 2014 12:42:50, Malcolm wrote:
On Mon 01 Dec 2014 07:05:37 PM CST, Jan Ritzerfeld wrote:
On Monday, 1 December 2014, 11:14:25, Vojtěch Zeisek wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
So you want to create sandwich pdfs? There is a tool called, surprise,
Yes
pdfsandwich that "is a wrapper script which calls the following binaries: unpaper (since version 0.0.9), convert, gs, hocr2pdf (for tesseract prior to version 3.03), and tesseract." http://www.tobias-elze.de/pdfsandwich/ However, I did not find any OBS repository with a recent and working package. :-/
It looks good.
Hi Here you go; https://build.opensuse.org/package/show/home:malcolmlewis:TESTING/pdfsandwich
Nice! :-) Thank You! I'll test it tomorrow. Yours, Vojtěch
On Mon, 1 December 2014 12:42:50, Malcolm wrote:
On Mon 01 Dec 2014 07:05:37 PM CST, Jan Ritzerfeld wrote:
On Monday, 1 December 2014, 11:14:25, Vojtěch Zeisek wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
So you want to create sandwich pdfs? There is a tool called, surprise, pdfsandwich that "is a wrapper script which calls the following binaries: unpaper (since version 0.0.9), convert, gs, hocr2pdf (for tesseract prior to version 3.03), and tesseract."
Hi Here you go; https://build.opensuse.org/package/show/home:malcolmlewis:TESTING/pdfsandwich
OK, so I installed the 13.1 build on 13.2, but that shouldn't matter. I also installed tesseract-traineddata-english. But when I try to convert a PDF:

pdfsandwich some.pdf

I get an error:

pdfsandwich version 0.1.3
Fatal error: exception Failure("Language eng not supported by tesseract. Make sure that the respective tesseract language package is installed.")

But it is installed. And I don't know why, but when I launch the command, tesseract (some demo of a shooting game! WTF?) is launched and I have to quit it... So I think I'll stay with gscan2pdf, as it produces acceptable output, although it can't be scripted... But thank You for building it. Someone else may find it useful. :-) Yours, Vojtěch
On Tue 02 Dec 2014 09:50:12 AM CST, Vojtěch Zeisek wrote:
On Mon, 1 December 2014 12:42:50, Malcolm wrote:
On Mon 01 Dec 2014 07:05:37 PM CST, Jan Ritzerfeld wrote:
On Monday, 1 December 2014, 11:14:25, Vojtěch Zeisek wrote:
Hi, is there any working tool which is able to add text layer into scanned PDF? [...]
So you want to create sandwich pdfs? There is a tool called, surprise, pdfsandwich that "is a wrapper script which calls the following binaries: unpaper (since version 0.0.9), convert, gs, hocr2pdf (for tesseract prior to version 3.03), and tesseract."
Hi Here you go; https://build.opensuse.org/package/show/home:malcolmlewis:TESTING/pdfsandwich
[...] when I launch the command, tesseract (Some demo of shooting game! WTF?) is launched and I have to quit it... [...]
Hi, sounds like you installed this? https://build.opensuse.org/package/show?project=games&package=tesseract Do you have the games repo enabled? -- Cheers Malcolm
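A name clash like this one can be checked from the command line. The following is a minimal sketch, assuming an RPM-based system; cmd_owner is a hypothetical helper name, and it only uses `command -v` and `rpm -qf`.

```shell
#!/bin/sh
# Hypothetical helper: print the path a command resolves to and, on
# RPM-based systems, the package that owns that file.
cmd_owner() {
    path=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
    echo "path: $path"
    if command -v rpm >/dev/null 2>&1; then
        # rpm -qf reports which installed package a file belongs to.
        if pkg=$(rpm -qf "$path" 2>/dev/null); then
            echo "package: $pkg"
        else
            echo "package: unknown"
        fi
    fi
}
```

Running `cmd_owner tesseract` and then looking up the reported package (for example with `zypper info`) should show whether the binary comes from the games repository or from the OCR package.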
On Tue, 2 December 2014 06:20:31, Malcolm wrote:
[...]
Hi Sounds like you installed this? https://build.opensuse.org/package/show?project=games&package=tesseract
Do you have the games repo enabled?
Yes, that is it. -- Vojtěch Zeisek
On Tue 02 Dec 2014 02:10:43 PM CST, Vojtěch Zeisek wrote: <snip>
Hi Sounds like you installed this? https://build.opensuse.org/package/show?project=games&package=tesseract
Do you have the games repo enabled?
Yes, that is it.
Hi, you need to install tesseract from the OSS repository, either via YaST or with something like:
zypper in -f tesseract --from repo-oss
-- Cheers Malcolm
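After reinstalling, it may be worth confirming that the OCR engine actually sees the English data before retrying pdfsandwich. A small sketch, assuming the installed tesseract supports `--list-langs`; has_lang is a made-up helper that only filters text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if language code $1 appears as a whole
# line in the text read from stdin.
has_lang() {
    grep -qxF -- "$1"
}
```

Usage would be along the lines of `tesseract --list-langs 2>&1 | has_lang eng && echo "eng is available"`; the `2>&1` is there in case the installed version prints the list on stderr.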
On Tue, 2 December 2014 07:42:55, Malcolm wrote:
On Tue 02 Dec 2014 02:10:43 PM CST, Vojtěch Zeisek wrote:
Sounds like you installed this? https://build.opensuse.org/package/show?project=games&package=tesseract
Do you have the games repo enabled?
Yes, that is it.
Hi You need to install tesseract from the OSS repository, either via YaST of something like;
zypper in -f tesseract --from repo-oss
Ah, OK. Well, it still produces empty output. It is apparently missing the identify command. So I'll stay with gscan2pdf, as it works and I don't want to spend ages on this... Thank You for Your help, Vojtěch
On Tue, 2 December 2014 16:37:21, Carlos E. R. wrote:
On 2014-12-02 16:22, Vojtěch Zeisek wrote:
It is apparently missing identify command.
cer@Telcontar:~> wrpm identify
ImageMagick-6.8.6.9-2.16.1.x86_64
cer@Telcontar:~>
Argh, I misread the output. identify is present, but hocr2pdf is missing... Sorry... -- Vojtěch Zeisek
On Tue 02 Dec 2014 05:43:53 PM CST, Vojtěch Zeisek wrote:
On Tue, 2 December 2014 16:37:21, Carlos E. R. wrote:
On 2014-12-02 16:22, Vojtěch Zeisek wrote:
It is apparently missing identify command.
cer@Telcontar:~> wrpm identify
ImageMagick-6.8.6.9-2.16.1.x86_64
cer@Telcontar:~>
Argh, I mislooked to the output. Identify is present, but hocr2pdf is missing... Sorry...
Hi, available here: https://build.opensuse.org/package/show?project=home%3ALazy_Kent&package=exact-image -- Cheers Malcolm
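Since the failures in this thread came down to helper binaries being absent or shadowed, a quick pre-flight check can save some guessing. This is a minimal sketch; missing_tools is a hypothetical name, and the tool list in the usage note is taken from the pdfsandwich description and the messages above.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools unpaper convert identify gs hocr2pdf tesseract` prints one line per missing helper and nothing if all are found. It checks presence only; it cannot tell the OCR tesseract from the game of the same name.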
participants (9)
- Carlos E. R.
- Carlos E. R.
- Gour
- Greg Freemyer
- Jan Ritzerfeld
- John Andersen
- John Layt
- Malcolm
- Vojtěch Zeisek