Is there a really good OCR for SuSE? What's it called and how do you get it and run it? Will it work for a UMAX 2200 SCSI scanner? Will it read and/or output .pdf files?
(I never could get the scanner to work in older versions of SuSE.) It works fine in various Windows versions, but I don't have a really good OCR for Windows. And I went out of my way to buy a SCSI scanner so that it might work on Linux. No such luck.
I will shortly (I hope) have SuSE 10.0.
--doug
On 05/10/05, Doug McGarrett wrote:
Is there a really good OCR for SuSE? What's it called and how do you get it and run it? Will it work for a UMAX 2200 SCSI scanner? Will it read and/or output .pdf files?
(I never could get the scanner to work in older versions of SuSE.) It works fine in various Windows versions, but I don't have a really good OCR for Windows. And I went out of my way to buy a SCSI scanner so that it might work on Linux. No such luck.
I will shortly (I hope) have SuSE 10.0.
--doug
I may be wrong, but surely it is the version of SANE you are using, not the version of SuSE, that determines whether your scanner works. Or am I wrong?
--
==============================================
I am only human, please forgive me if I make a mistake it is not deliberate.
==============================================
Take care.
Kevan Farmer
34 Hill Street Cheslyn Hay Staffordshire WS6 7HR
On Wed, 5 Oct 2005 09:36:40 +0100, Kevanf1 wrote:
I may be wrong, but surely it is the version of SANE you are using, not the version of SuSE, that determines whether your scanner works. Or am I wrong?
I've had two different SCSI scanners working with SANE on my old RH 9.0 box. This was a really long time ago, so I don't remember how I did it, but I do remember that it took about five minutes-- literally. Try typing "sane" at a prompt. hth, ken
-- Check the headers for your unsubscription address For additional commands send e-mail to suse-linux-e-help@suse.com Also check the archives at http://lists.suse.com Please read the FAQs: suse-linux-e-faq@suse.com
-- A lot of us are working harder than we want, at things we don't like to do. Why? ...In order to afford the sort of existence we don't care to live. -- Bradford Angier
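Kevan's point can be checked directly: whether the scanner works is SANE's business, and SANE's command-line frontend `scanimage -L` lists the devices it detects (SANE's `sane-find-scanner` probes the SCSI and USB buses at a lower level). Below is a small Python sketch that runs `scanimage -L` and parses its output. It assumes the usual one-line-per-device format shown in the code comment; the device name used in the example is made up for illustration.

```python
import re
import shutil
import subprocess

# Assumed line format printed by `scanimage -L` for each detected device:
#   device `NAME' is a VENDOR MODEL TYPE
DEVICE_LINE = re.compile(r"^device `(?P<name>[^']+)' is a (?P<desc>.+)$")


def parse_scanimage_list(text):
    """Return (device_name, description) pairs found in `scanimage -L` output."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match:
            devices.append((match.group("name"), match.group("desc")))
    return devices


def list_scanners():
    """Run `scanimage -L` and return the parsed device list.

    An empty list means SANE did not report any scanner."""
    if shutil.which("scanimage") is None:
        raise FileNotFoundError("scanimage not found; is SANE installed?")
    result = subprocess.run(["scanimage", "-L"], capture_output=True, text=True)
    return parse_scanimage_list(result.stdout)
```

On a machine with SANE installed, `list_scanners()` returns something like `[("backend:/dev/sg0", "Vendor Model flatbed scanner")]` (hypothetical values). If the list is empty, the problem is in the SANE setup, not in the OCR program.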
On 4 Oct 2005, dmcgarrett@optonline.net wrote:
Is there a really good OCR for SuSE? What's it called and how do you get it and run it? Will it work for a UMAX 2200 SCSI scanner? Will it read and/or output .pdf files?
Xsane and Kooka both support external command-line OCR programs. Two common ones are ocrad and gocr; both are included with SUSE.
Charles
--
"...you could spend *all day* customizing the title bar. Believe me. I speak from experience." (By Matt Welsh)
On 06-Oct-05 Charles philip Chan wrote:
On 4 Oct 2005, dmcgarrett@optonline.net wrote:
Is there a really good OCR for SuSE? What's it called and how do you get it and run it? Will it work for a UMAX 2200 SCSI scanner? Will it read and/or output .pdf files?
Xsane and Kooka both support external command-line OCR programs. Two common ones are ocrad and gocr; both are included with SUSE.
Charles
Thanks for these pointers -- I hadn't met ocrad before.
I just compiled the latest ocrad and gocr from source.
It looks as though ocrad may be slightly the better of the two,
in two respects.
1. It's more compact (compiles to 300K as opposed to 535K for gocr)
and is quicker.
2. It seems to do marginally better recognition.
I fed ocrad an A4 page of double-spaced 11-point Times Roman, scanned
at 300dpi to a PBM file. There were errors in 6 places out of 2400
characters, so 1 in 400.
Gocr 0.4 scored 12 errors on the same file.
Also, ocrad picked up ligatures (e.g. "fi") correctly, while gocr
failed on these.
Interestingly, an old version of gocr (0.31 from 2001 which I already
had) also scored 6 errors like ocrad on the same file (but still failed
on the ligatures).
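Ted's figures, checked: on the same 2400-character page, 6 errors is 1 in 400 (0.25%) and gocr 0.4's 12 errors is 1 in 200 (0.50%). A trivial Python sketch of that arithmetic, using only the numbers reported in this message:

```python
from fractions import Fraction

PAGE_CHARS = 2400  # characters on Ted's test page, as reported in his post

# Error counts reported in the thread for the same 300 dpi scan.
REPORTED_ERRORS = {"ocrad": 6, "gocr 0.4": 12, "gocr 0.31": 6}


def error_rate(n_errors, n_chars=PAGE_CHARS):
    """Exact error rate as a fraction of the characters read."""
    return Fraction(n_errors, n_chars)


for program, n in REPORTED_ERRORS.items():
    rate = error_rate(n)
    # 1 / rate is the "one error every N characters" figure.
    print(f"{program}: {n}/{PAGE_CHARS} = {float(rate):.2%}, 1 in {1 / rate}")
```

It prints one line per program, for example `ocrad: 6/2400 = 0.25%, 1 in 400`.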
A 1/400 error rate isn't bad. You're going to have to check any OCR
results anyway. Running the output past ispell is a quick way of
cleaning up.
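The clean-up pass Ted describes amounts to flagging words that are not in a dictionary. A rough Python sketch, shown two ways: a plain word-list lookup, and a call to ispell's list mode (`ispell -l` reads text on standard input and prints the misspelled words). The word-list path and the "rn"-for-"m" example are illustrative assumptions, not something from the thread.

```python
import re
import subprocess


def load_word_list(path="/usr/share/dict/words"):
    """Load a plain word list, one word per line (the path is an assumption)."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def suspicious_words(text, dictionary):
    """Words in `text` missing from `dictionary`, first occurrence only."""
    seen = set()
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        key = word.lower()
        if key not in dictionary and key not in seen:
            seen.add(key)
            flagged.append(word)
    return flagged


def ispell_misspellings(text):
    """Let ispell do the check instead: `ispell -l` reads standard input
    and prints the misspelled words it finds, one per line."""
    result = subprocess.run(["ispell", "-l"], input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout.split()
```

A typical OCR slip such as "rnodern" for "modern" would be flagged by either approach; the fix itself still needs a human eye.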
On the other hand, maybe gocr offers more flexibility.
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)
How do I make ocrad work from xsane? When I try to configure this, ocrad complains:
bad magic number - not a pbm file.
I can't see how to get xsane to output pbm (what is that??), nor can I find a tool that I could use to create a pipe to do the conversion. Any suggestions?
Cheers,
Simon
--- Ted.Harding@nessie.mcc.ac.uk wrote:
Fax-to-email: +44 (0)870 094 0861
Date: 06-Oct-05 Time: 20:19:44
"You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions." Naguib Mahfouz
On 6 Oct 2005, thorpflyer@yahoo.com wrote:
How do I make ocrad work from xsane? I find when I try to configure this that ocrad complains:
bad magic number - not a pbm file.
Scan in lineart mode. Here is some info about pbm:
http://netpbm.sourceforge.net/doc/pbm.html
Charles
--
die_if_kernel("Penguin instruction from Penguin mode??!?!", regs); linux-2.2.16/arch/sparc/kernel/traps.c
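The error message itself is the clue: a PBM file starts with the magic number `P1` (plain) or `P4` (raw), while a greyscale scan saved as PGM starts with `P2` or `P5`, so the ocrad used here rejected anything that was not a bitmap. As a rough illustration of the conversion Simon asked about, here is a Python sketch that recognises PBM headers and turns an 8-bit binary PGM (`P5`) into a raw PBM using a fixed threshold. It was written for this note and is not how xsane or netpbm do it; netpbm's own converters (e.g. `pgmtopbm`) are the usual tool.

```python
import sys


def pbm_kind(data):
    """Classify bytes by their magic number: 'plain' (P1), 'raw' (P4) or None."""
    return {b"P1": "plain", b"P4": "raw"}.get(bytes(data[:2]))


def _header_tokens(data, count):
    """Return `count` whitespace-separated header tokens (skipping # comments)
    and the offset of the first raster byte, which follows the single
    whitespace character that ends a Netpbm header."""
    tokens, i = [], 0
    while len(tokens) < count:
        while data[i:i + 1].isspace():
            i += 1
        if data[i:i + 1] == b"#":  # a comment runs to the end of the line
            while data[i:i + 1] not in (b"\n", b"\r", b""):
                i += 1
            continue
        start = i
        while i < len(data) and not data[i:i + 1].isspace():
            i += 1
        tokens.append(data[start:i])
    return tokens, i + 1


def pgm_to_pbm(pgm, threshold=None):
    """Convert an 8-bit binary PGM (P5) to a raw PBM (P4) with a fixed threshold.

    In PGM 0 is black; in PBM a 1 bit is black. Pixels darker than
    `threshold` (default: half of maxval) become black."""
    if pgm[:2] != b"P5":
        raise ValueError("not a binary PGM (P5) image")
    (_, w, h, mx), offset = _header_tokens(pgm, 4)
    width, height, maxval = int(w), int(h), int(mx)
    if maxval > 255:
        raise ValueError("16-bit PGM is not handled by this sketch")
    if threshold is None:
        threshold = (maxval + 1) // 2
    pixels = pgm[offset:offset + width * height]
    out = bytearray(b"P4\n%d %d\n" % (width, height))
    for row in range(height):
        line = pixels[row * width:(row + 1) * width]
        # Each PBM row is packed 8 pixels per byte, most significant bit
        # first, and padded to a whole byte.
        for start in range(0, width, 8):
            byte = 0
            for bit, value in enumerate(line[start:start + 8]):
                if value < threshold:
                    byte |= 0x80 >> bit
            out.append(byte)
    return bytes(out)


def main():
    """Pipe filter: read a PGM on stdin and write a PBM on stdout."""
    sys.stdout.buffer.write(pgm_to_pbm(sys.stdin.buffer.read()))
```

Called from a small script via `main()`, it could sit in a pipe in front of ocrad; whether xsane's OCR menu can be configured to use such a pipe is not something this thread establishes.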
Strange. Lineart mode, of itself, doesn't fix the behavior. However, if
I scan in lineart mode, then save the file using the pbm extension and
"determine file type by extension", then I can use ocrad on the
resulting file manually. Good enough; I don't do this often, but I'm a
little curious why I can't configure the menu to do it directly.
Anyway, I'm happy, thanks for the help :)
Cheers,
Simon
On 7 Oct 2005, thorpflyer@yahoo.com wrote:
Strange. Lineart mode, of itself, doesn't fix the behavior. However, if I scan in lineart mode, then save the file using the pbm extension and "determine file type by extension", then I can use ocrad on the resulting file manually. Good enough; I don't do this often, but I'm a little curious why I can't configure the menu to do it directly.
I usually just scan it in "Viewer" mode in XSane and invoke OCR from the "File" menu of the viewer.
Charles
--
The nice thing about Windows is - It does not just crash, it displays a dialog box and lets you press 'OK' first. (Arno Schaefer's .sig)
On 6 Oct 2005, Ted.Harding@nessie.mcc.ac.uk wrote:
2. It seems to do marginally better recognition.
If you really want accuracy, this one is even better: http://freshmeat.net/projects/claraocr/ However, you need to train it.
Also, ocrad picked up ligatures (e.g. "fi") correctly, while gocr failed on these.
Interesting.
On the other hand, maybe gocr offers more flexibility.
Yes, gocr is a lot more flexible. Charles -- /* Host controller interrupts must not be running while calling this * function or the penguins will get angry. */ linux-2.2.16/drivers/usb/ohci.c
On Thu, Oct 06, 2005 at 06:58:57PM -0400, Charles philip Chan wrote: [...]
If you really want accuracy, this one is even better:
http://freshmeat.net/projects/claraocr/
However, you need to train it.
I can second that: especially on bad scans (I had to deal with, e.g., typewritten pages on grey paper), ClaraOCR produced much better results than the other two. Unfortunately, it is quite unstable (so save often!), and development seems to have all but stalled for the moment.
Cheerio,
Thomas
I found this interesting-looking program:
,----[ Unpaper ]
| unpaper is a post-processing tool for scanned sheets of paper,
| especially for book pages that have been scanned from previously
| created photocopies. It can make scanned pages more readable on a
| screen and more acceptable for OCR. unpaper removes dark edges from
| the image and also tries to unskew or rotate pages to make the text
| horizontal.
`----
You can find it at: http://unpaper.berlios.de/
Charles
--
We are Pentium of Borg. Division is futile. You will be approximated. (seen in someone's .signature)
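Putting the thread's suggestions together (scan in lineart, clean up with unpaper, OCR with ocrad), here is a hedged Python sketch that drives the three command-line tools in turn. It assumes `scanimage`, `unpaper` and `ocrad` are on the PATH; the file names are hypothetical, and `--mode Lineart` / `--resolution` are SANE backend options whose exact names and values depend on the scanner's backend.

```python
import shutil
import subprocess


def scan_command(resolution=300):
    """scanimage in lineart mode; it writes the image to stdout (normally PBM
    for lineart). The option names here are backend-dependent assumptions."""
    return ["scanimage", "--mode", "Lineart", "--resolution", str(resolution)]


def cleanup_and_ocr_commands(raw_pbm, clean_pbm, text_out):
    """unpaper takes an input and an output image; ocrad -o names the text file."""
    return [
        ["unpaper", raw_pbm, clean_pbm],
        ["ocrad", "-o", text_out, clean_pbm],
    ]


def run_pipeline(raw_pbm="page.pbm", clean_pbm="page-clean.pbm",
                 text_out="page.txt"):
    """Scan one page, clean it up and OCR it. File names are hypothetical."""
    for tool in ("scanimage", "unpaper", "ocrad"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} is not installed")
    with open(raw_pbm, "wb") as out:
        subprocess.run(scan_command(), stdout=out, check=True)
    for cmd in cleanup_and_ocr_commands(raw_pbm, clean_pbm, text_out):
        subprocess.run(cmd, check=True)
    return text_out
```

Keeping the command lists in separate functions makes it easy to print or adjust them before running anything against a real scanner.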
participants (7)
- Charles philip Chan
- Doug McGarrett
- ken
- Kevanf1
- Simon Roberts
- T. Ribbrock
- Ted.Harding@nessie.mcc.ac.uk