On 19-Nov-02 Carlos E. R. wrote:
What can I use for OCR?
I have tried gocr, which seems usable.
Also clara, which seems horrible to me: it only accepts files in pbm format, and so far it has produced nothing. It's been working in another window for 15 minutes or more, using 90% CPU of a Pentium IV, and still no text output. Apparently I have to "train" it, but it took 5 minutes to learn the letter "A", because I had to wait minutes for it to redraw after each click. There is something very wrong there...
I hate to say this, but a friend showed me MS Word on his machine scanning a page, and it even got the formatting (typesetting) almost right!
I can't believe clara or gocr are the best I can find in Linux, there must be something better :-?
-- Cheers, Carlos Robinson
OCR has been the Cinderella application of the Linux world. Until
relatively recently there was nothing at all which worked well
enough for use in the real world. Xocr was hopeless. Gocr, as
Carlos says, is (sort of) usable, but lacks (or lacked, when I
tried it) the sort of sophistication needed for real use. I never
tried clara.
And, as Carlos says, MS Windows OCR applications really do work.
Even the OCR software given away on the CD that comes with a
scanner works well -- for some time I ran one of these on Linux
using WABI, and got very good results (the software dated from
about 1995, and came on a floppy ... ).
Now, however, there is at least one good commercial OCR package
available in a Linux version: OCRshop. See
http://www.vividata.com
It has (or had, in the oldish version I have used) a few rough
edges. Nevertheless, it is powerful, capable and reliable, and can
produce output in a variety of formats.
By the way, a file format which lends itself very well to OCR is
the TIFF format used in faxes. For some reason, character recognition
in this format is particularly reliable.
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)