On Wednesday 20 November 2002 04:51 am, John Pettigrew wrote:
In a previous message, Carlos E. R. wrote:
And now, gocr in linux (I have only removed some empty lines to save space):
+++ (PICTURE) d_T j Uly, _' (lr Illy _l C_t-d Ilnl Vtr- sary c_nlumn, l urEURed comput-
[snip]
Hmmm. I've not tried OCR in Linux, but from my experience with OCR programs on other platforms (no, not Windows :-) that looks like it's caused by the input bitmap being wrong in some way. Does gocr require a specific bit depth, or a particular resolution/font size? If it was a greyscale image, was the contrast between the letters and the background high enough? If 1-bit, was there any background noise?
The thing I've found is that the bitmap you feed the OCR program needs to be as high quality as possible, and to match the resolution/font size the program expects. I never auto-OCR, because I often get better results by checking the bitmap before feeding it to the OCR engine, and it saves wasted time when there's something wrong.
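The pre-check and binarisation step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not anything gocr itself does: the function name `binarize_pgm` and the threshold value of 128 are my own assumptions. It turns an ASCII greyscale PGM (P2) into the kind of clean 1-bit ASCII PBM (P1) an OCR engine prefers:

```python
# Minimal sketch: threshold a greyscale ASCII PGM (P2) into a 1-bit
# ASCII PBM (P1) before handing it to an OCR engine such as gocr.
# The function name and threshold value are illustrative assumptions.

def binarize_pgm(pgm_text: str, threshold: int = 128) -> str:
    """Convert an ASCII PGM (P2) image to an ASCII PBM (P1).

    Pixels darker than `threshold` become 1 (black), the rest 0
    (white), matching PBM's convention that 1 means black.
    """
    # Tokenise, dropping PGM '#' comments.
    tokens = [t for line in pgm_text.splitlines()
              for t in line.split('#', 1)[0].split()]
    if tokens[0] != 'P2':
        raise ValueError('expected an ASCII PGM (P2) image')
    width, height = int(tokens[1]), int(tokens[2])
    # tokens[3] is the maxval; the pixel values follow it.
    pixels = [int(t) for t in tokens[4:4 + width * height]]
    bits = ['1' if p < threshold else '0' for p in pixels]
    rows = [' '.join(bits[r * width:(r + 1) * width])
            for r in range(height)]
    return 'P1\n{} {}\n{}\n'.format(width, height, '\n'.join(rows))
```

One could then write the P1 output to a file and inspect it by eye before running it through the OCR engine, which is exactly the manual check being advocated above.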
John
--------------------------

Another good point: when using OCR, scan in binary (1-bit) mode, not greyscale. I have used OCR in Kooka with very good results, scanning in binary mode; the higher the resolution, the better the character recognition seems to be. I know xsane uses gocr, and I did some scans there last evening. The page looked good, although I am not experienced with xsane or gocr, so I'm not sure I was even doing OCR scans at the time. Kooka's OCR has worked well for me, with good results. It may also be using gocr, but I couldn't find anything that indicated that.

Patrick
--- KMail v1.4.3 --- SuSE Linux Pro v8.1 --- Registered Linux User #225206
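For what it's worth, the contrast question raised earlier in the thread can be turned into a quick automated check before committing to a binary scan. A minimal sketch in Python; the function name `contrast_ok` and the 0.5 spread threshold are my own assumptions, not values taken from gocr or any scanner tool:

```python
# Rough pre-OCR contrast check, along the lines asked about above:
# was the contrast between the letters and the background high enough?
# The 0.5 spread threshold is an illustrative assumption.

def contrast_ok(pixels, maxval=255, min_spread=0.5):
    """Return True if the spread between the darkest and lightest
    pixel values covers at least `min_spread` of the full range."""
    spread = (max(pixels) - min(pixels)) / maxval
    return spread >= min_spread
```

If this check fails on the greyscale scan, rescanning with better lighting or adjusting the scanner's contrast setting is likely to help more than fiddling with the OCR engine afterwards.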