Re: [opensuse] PDF OCR

13 Dec 2007

On Thursday 13 December 2007, StephenW wrote:
...
--- Roger Oberholtzer wrote:
...
Hello
We have a network printer that will scan docs and send them as pdf docs
to an e-mail address in the company. Is there any software with OpenSUSE
10.3 that can do OCR from a PDF doc? I am guessing that the doc contains
tiff images of the scanned documents. Any and all pointers are welcome.
I had to do much the same in the past - a quick bash script seemed like the 
best way to solve it:

1. use pdftoppm to convert the pages of the pdf to ppm images in a new directory
2. use ppm2tiff on all the resulting ppm files
3. use tesseract (or whatever it's called these days) on the tiff files
4. append the text files to a single text file (or leave them separate, 
whatever)
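
The four steps above could be sketched roughly as follows. This is not the original script, only a minimal illustration: it assumes pdftoppm, ppm2tiff and tesseract are installed and on the PATH, the function name ocr_pdf and the 300 dpi resolution are made up for the example, and the page-*.ppm pattern relies on pdftoppm's usual "prefix-number.ppm" output naming.

```shell
#!/bin/bash
# Sketch only: assumes pdftoppm, ppm2tiff and tesseract are on the PATH.
# ocr_pdf is a hypothetical helper name, not taken from the original script.
ocr_pdf() {
    local pdf="$1" out="$2" workdir ppm base
    workdir=$(mktemp -d) || return 1

    # 1. Convert every page of the PDF to a PPM file (page-1.ppm, ...).
    #    300 dpi is an assumed resolution; adjust to the scan quality.
    pdftoppm -r 300 "$pdf" "$workdir/page" || return 1

    : > "$out"    # start with an empty result file
    for ppm in "$workdir"/page-*.ppm; do
        base="${ppm%.ppm}"
        # 2. Convert the PPM page to TIFF.
        ppm2tiff "$ppm" "$base.tif" || return 1
        # 3. Run OCR; tesseract writes its result to "$base.txt".
        tesseract "$base.tif" "$base" || return 1
        # 4. Append this page's text to the combined file.
        cat "$base.txt" >> "$out"
    done

    rm -rf "$workdir"
}
```

With the function sourced into a shell, something like `ocr_pdf scan.pdf scan.txt` (both file names are placeholders) would leave the recognised text of all pages in scan.txt. To keep the per-page text files separate instead, drop the rm line and skip step 4.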

There's probably a much more sensible way of doing this :-) but this worked 
consistently for me for quite a number of documents scanned and sent as pdf.

Ciaran

-- 
SUSE LINUX Products GmbH
GF: Markus Rex
HRB 16746 (AG Nuremberg)
Maxfeldstrasse 5
90409, Nuremberg
Tel: +49 911 74053 262

Ciaran Farrell