On Tue, 19 Nov 2002 01:14:06 +0100 (CET), "Carlos E. R." wrote:
What can I use for OCR?
I have tried gocr, which seems usable.
I hate to say this, but a friend showed me MS Word scanning a page, and it even got the formatting (typesetting) almost correct!
I can't believe clara or gocr is the best I can find on Linux; there must be something better :-?
(some ramblings)

I think the luck your friend has with Windows OCR comes down to fonts. OCR software works well when it can match the fonts in its libraries against the fonts in the document. It was probably a Windows-made document, so the Windows OCR worked well. You scan it on Linux, run OCR, and get bad results because the Linux program doesn't know about the Windows font that was used. Matching the document font to the fonts available to the OCR program is the key. So it's no surprise that Windows OCR works better than Linux OCR, since most of the documents out there were made with Windows. Try making a document with a Linux font and giving it to your friend to OCR.

As a related aside, I was listening to a report about Microsoft's new "tablet" computer (I forget the name). It is a laptop with a screen that also acts as a writing tablet. You use a stylus to write messages in longhand instead of typing. The drawback is that these "messages" are not editable, because they are a graphic. The reporter said that Microsoft was working on an OCR program for handwriting, but I wouldn't count on them getting it working anytime soon.

-- 
use Perl; #powerful programmable prestidigitation