Re: [SLE] PDF to TXT (ascii) ... helppppppp

28 Aug 2005

Pelibali,

On Sunday 28 August 2005 03:19, pelibali wrote:
...
Hi,
On Sun, 28 Aug 2005 00:22:46 -0400
Maura Edelweiss Monville <.> wrote:
...
I need to transform PDF to text (ascii)  ...
How can I do that ?
Just a remark. We have plenty of PDFs that can be converted to
plain text only through optical character recognition (OCR)!
At first we also tried to extract just the text, but the catch is that
every page in these PDFs is embedded as a TIFF image!
Only human eyes recognize the content as _text_; to a computer it
remains just _images_...
ACM digitized its library this way, but they included the OCR-ed text 
_and_ the scanned page images in the PDF files they distribute. When 
you read or print the document, you see the scanned page images. When 
you copy or search, the OCR-ed text is used. The OCR is predictably 
flawed, but the scheme is about the best you can hope for with fully 
automated digitization of a very large library.
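
To tell the two kinds of PDF apart before reaching for OCR, something
like the following rough Python sketch might help. It assumes the
pdftotext utility (from Xpdf or Poppler) is installed and on the PATH;
the function names and the min_chars threshold are made up for
illustration and are not from any existing tool.

```python
import subprocess


def extract_text(pdf_path):
    """Run pdftotext and return everything it extracts as one string.

    Assumes the pdftotext command-line tool is installed. The "-"
    output argument sends the text to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def pages_needing_ocr(text, min_chars=20):
    """Return 1-based numbers of pages that yielded almost no text.

    pdftotext ends each page with a form feed, so splitting on "\f"
    gives one piece per page plus an empty trailing piece.
    min_chars is an arbitrary illustrative threshold.
    """
    pages = text.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()  # drop the empty piece after the final form feed
    return [
        number
        for number, page in enumerate(pages, start=1)
        if len("".join(page.split())) < min_chars
    ]
```

A scanned-images-only PDF of the kind Pelibali describes should come
back with every page flagged, and those pages would then have to go
through an OCR program. A PDF that carries OCR-ed text behind the page
images, as the ACM ones do, should yield text directly, flaws included.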
...
Pelibali
Randall Schulz

Randall R Schulz