On 06-Oct-05 Charles Philip Chan wrote:
On 4 Oct 2005, dmcgarrett@optonline.net wrote:
Is there a really good OCR for SuSE? What's it called and how do you get it and run it? Will it work for a UMAX 2200 SCSI scanner? Will it read and/or output .pdf files?
Xsane and Kooka both support external command-line OCR programs. Two common ones are ocrad and gocr; both are included with SUSE.
Charles
Thanks for these pointers -- I hadn't met ocrad before.
I just compiled the latest ocrad and gocr from source.
It looks as though ocrad may be slightly the better of the two,
in two respects.
1. It's more compact (compiles to 300K as opposed to 535K for gocr)
and is quicker.
2. It seems to do marginally better recognition.
I fed ocrad an A4 page of double-spaced 11-point Times Roman, scanned
at 300dpi to a PBM file. There were errors in 6 places out of 2400
characters, so 1 in 400.
Gocr 0.4 scored 12 errors on the same file.
Also, ocrad picked up ligatures (e.g. "fi") correctly, while gocr
failed on these.
Interestingly, an old version of gocr (0.31 from 2001 which I already
had) also scored 6 errors like ocrad on the same file (but still failed
on the ligatures).
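
For a side-by-side view of the counts quoted above, here is a small Python sketch that turns them into error rates. The error counts and the 2400-character page size are the figures from this test; the dictionary and function names are only illustrative.

```python
# Error counts reported for the same 2400-character test page.
TOTAL_CHARS = 2400
errors = {
    "ocrad": 6,
    "gocr 0.4": 12,
    "gocr 0.31": 6,
}

def error_rate(n_errors, n_chars=TOTAL_CHARS):
    """Return the fraction of characters that were misrecognised."""
    return n_errors / n_chars

for engine, n in errors.items():
    # Show the rate both as a percentage and as "1 in N".
    one_in = TOTAL_CHARS // n
    print(f"{engine:10s} {n:2d} errors  {error_rate(n):.2%}  (1 in {one_in})")
```

On these figures gocr 0.4 makes about twice as many errors per character as ocrad or gocr 0.31, though a single page is a small sample.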
A 1/400 error rate isn't bad. You're going to have to check any OCR
results anyway. Running the output past ispell is a quick way of
cleaning up.
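
As a rough sketch of the ispell step, the Python below pipes OCR output through `ispell -l`, which lists the words it does not recognise, and collects them for review. The file name `page.txt` and the function names are assumptions for illustration, and it presumes ispell is installed with a suitable dictionary.

```python
import subprocess

def unique_words(ispell_output):
    """Collapse a one-word-per-line listing into a sorted, de-duplicated list."""
    return sorted({w.strip() for w in ispell_output.splitlines() if w.strip()})

def suspect_words(text):
    """Run text through `ispell -l` and return the words it flags.

    Assumes ispell is on the PATH; `-l` makes it list misspelled
    words from standard input instead of running interactively.
    """
    result = subprocess.run(
        ["ispell", "-l"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return unique_words(result.stdout)

# Example use (page.txt is a hypothetical file holding the OCR output):
#     with open("page.txt") as f:
#         for word in suspect_words(f.read()):
#             print(word)
```

The flagged words are only candidates: a misread that happens to form a real word will not show up, so the output still needs reading through.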
On the other hand, maybe gocr offers more flexibility.
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)