On 02.11.22 at 08:50, John Pettigrew wrote:
Does gocr require a specific bit depth, or resolution/font size?
I don't know.
It's *really* worth finding this out! Otherwise, you suffer badly from GIGO.
What's GIGO? :-? I think I will need an "English email acronym dictionary" O:-)
I tried to scan as B/W only and the result was disappointing: too many "dots".
That's just a problem of the contrast setting when you scanned. You need to tweak the scanning controls until you get a clean scan. This will usually not change much for a given scanner (unless you have very yellow paper or a strong background colour) so saving the setting is worth it.
That's why I tried grays or color first, thinking that the OCR software would sort that out for itself. But the "clara" program wants B/W instead (.pbm format only).
If I scan a page at high quality, the software should be able to find that threshold on its own.
That's it - the higher the quality of the source image, the better the OCR software will do at character recognition. This is one area where consumer OCR applications have the advantage - someone's spent time putting serious image manipulation algorithms in there.
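For illustration, one common way software can pick a black/white threshold on its own is Otsu's method, which chooses the cut that best separates dark pixels from light ones. This is only a sketch: nothing in this thread confirms that gocr or clara work this way, and the function names `otsu_threshold` and `to_plain_pbm` are made up here. It assumes 8-bit grayscale values (0 = black, 255 = white) in a flat list.

```python
def otsu_threshold(pixels):
    """Pick a threshold t (0..255) by Otsu's method; values <= t count as ink."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_score = 0, -1.0
    n_dark = 0      # number of pixels with value <= t
    sum_dark = 0    # sum of those pixel values
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        n_light = total - n_dark
        if n_dark == 0:
            continue            # no dark class yet
        if n_light == 0:
            break               # no light class left
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        # Between-class variance, up to a constant factor.
        score = n_dark * n_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def to_plain_pbm(pixels, width, height, threshold):
    """Return a plain (P1) PBM string; in PBM, 1 is black and 0 is white."""
    bits = ["1" if p <= threshold else "0" for p in pixels[:width * height]]
    lines = ["P1", f"{width} {height}"]
    # Plain PBM lines should stay under 70 characters, so wrap the bit stream.
    for i in range(0, len(bits), 35):
        lines.append(" ".join(bits[i:i + 35]))
    return "\n".join(lines) + "\n"


gray = [250, 245, 30, 20, 240, 25]   # hypothetical 3x2 grayscale scan
t = otsu_threshold(gray)
print(to_plain_pbm(gray, 3, 2, t), end="")
```

A real scan with uneven lighting or a coloured background may need more than a single global threshold, which is probably where the "serious image manipulation algorithms" come in.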
I'm afraid that must be the problem.
specified resolution/font size of the program.
That resolution should be clearly stated by the program!
Absolutely. I've not got gocr installed, so can't easily check, but if there is such a requirement, it should (as you say) be in the man page or other documentation.
I think it should be somewhere on the program menu itself.
FWIW, most general OCR apps seem to consider 12 pt text at 300dpi to be a good starting point.
Yes, that's the resolution I tried.
I think image files specify the resolution used (or the real size, from which the resolution can be calculated), so the program can give a warning if it is not appropriate.
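As a rough sketch of what such a check could look like: PNG is one format that can carry the resolution, in an optional `pHYs` chunk that stores pixels per metre. The function name `png_dpi` is hypothetical, and the code assumes the whole file has been read into memory as bytes; it does not verify chunk CRCs. The plain .pbm format mentioned above has no resolution field at all, so this would not help there.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if unavailable."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"pHYs" and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:       # 1 = pixels per metre; 0 = aspect ratio only
                return None
            # One inch is 0.0254 m, so pixels/metre * 0.0254 = pixels/inch.
            return (round(x_ppu * 0.0254), round(y_ppu * 0.0254))
        if kind in (b"IDAT", b"IEND"):
            return None         # pHYs, if present, comes before the image data
        pos += 8 + length + 4   # skip header, data and CRC
    return None
```

A program could call this with the contents of the scanned file and print a warning when the result is missing or far from the resolution it expects.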
Some image formats do contain resolution information, but for OCR it's not actually crucial. The important point is the relationship between the physical size of the text (e.g. 12pt) and the resolution (e.g. 300dpi). That is, for smaller text you need to increase the resolution, and vice versa.
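To put numbers on that relationship, here is a small sketch. It assumes the usual 72 points per inch and takes the 12pt at 300dpi starting point quoted above as the reference; the function names are invented for the example. Note that the point size is the nominal font height, so the actual glyphs come out somewhat smaller in pixels.

```python
POINTS_PER_INCH = 72  # assumption: the usual DTP point


def text_height_px(font_pt, dpi):
    """Nominal height of the text in pixels at a given scan resolution."""
    return font_pt * dpi / POINTS_PER_INCH


def dpi_for_same_height(font_pt, ref_pt=12, ref_dpi=300):
    """Resolution at which font_pt text is as tall in pixels as the reference."""
    return ref_pt * ref_dpi / font_pt


print(text_height_px(12, 300))   # 50.0
print(dpi_for_same_height(8))    # 450.0
```

So 8pt text would need about 450dpi to give the OCR software the same number of pixels per character as 12pt text at 300dpi.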
That's understandable :-)

--
Cheers,
Carlos Robinson