OCR problems

steve-ss

23 Feb 2004

17:39

Hi. I'm trying to ocr some old typewritten documents using xsane and gocr, but there are many errors. I've experimented with different brightness, gamma and contrast settings but can't seem to get acceptable quality. I've also tried kooka, but it won't scan or preview the text, and a good-quality scan I adjusted in gimp still doesn't give good results. Any advice, anyone?

Thanks, Steve.
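
For anyone who would rather script the brightness, gamma and contrast experiments than repeat them by hand in xsane or gimp, here is a minimal sketch in plain Python. It is not what xsane, kooka or gocr do internally. The function names (`apply_gamma`, `otsu_threshold`, `binarize`) and the choice of Otsu's method for picking the black/white cut-off are assumptions made for illustration. A scan is treated as a flat list of 8-bit grey values.

```python
def apply_gamma(pixels, gamma):
    """Gamma-correct 8-bit grey values (0 = black, 255 = white).

    gamma > 1 lightens mid-tones, gamma < 1 darkens them.
    """
    return [round(255 * (p / 255) ** (1.0 / gamma)) for p in pixels]


def otsu_threshold(pixels):
    """Pick the grey level that best separates dark ink from light paper.

    This is Otsu's method: try every cut-off and keep the one that
    maximises the variance between the two resulting groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark = 0   # number of pixels at or below the candidate level
    sum_dark = 0      # sum of grey values of those pixels
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels, threshold):
    """Turn every pixel into pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]
```

A page would be processed as `binarize(px, otsu_threshold(px))`, optionally after `apply_gamma`. Whether this helps with bled typewriter ink is something to test on a real scan; it only removes the guesswork from choosing a threshold.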

Martin Mielke

23 Feb

17:15

New subject: [SLE] OCR problems

Hi,

same problem here...

A text like, for example:

-- I'm trying to ocr some old typewritten ... --

would turn into something like:

--- -_- -__| ocr s0me old _|||_ ... ---

Regards, Martin
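
To put a number on how far output like the example above is from the original, one option is to compare it with a hand-typed reference. The sketch below uses Python's standard `difflib` module. The helper name `char_accuracy` and the definition of accuracy (share of reference characters that reappear, in order, in the OCR output) are assumptions for illustration, not a standard OCR metric.

```python
from difflib import SequenceMatcher


def char_accuracy(reference, ocr_output):
    """Rough share of reference characters reproduced, in order, by the OCR."""
    if not reference:
        # Nothing to recognise: perfect only if the OCR also produced nothing.
        return 1.0 if not ocr_output else 0.0
    # autojunk=False keeps difflib from ignoring frequent characters
    # (such as spaces) on long pages.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)
```

Calling `char_accuracy(typed_text, ocr_text)` before and after changing a scan setting gives a quick way to tell whether the change helped.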

steve-ss

20:01

It seems to be down to the fuzziness of old typed texts where the ink has bled. A printed sheet in Times New Roman, e.g. from my local newspaper, OCRs almost perfectly under kooka. My old text actually looks better to me in the kooka preview than the modern text. (BTW, kooka seems to work better if your scanner is switched on... sorry about ranting in the original message!) The nearest I've got so far is around 50% correct, using gimp's sharpening tool. That takes around 10 minutes per page. We *must* be able to do better than this.

Cheers, Steve.

On Monday 23 February 2004 17:15, Martin Mielke wrote:

...

Hi,

same problem here...

A text like, for example:

-- I'm trying to ocr some old typewritten ... --

would turn into something like:

--- -_- -__| ocr s0me old _|||_ ... ---

Regards, Martin

Carlos E. R.

24 Feb

00:49

The Monday 2004-02-23 at 17:39 -0000, steve-ss wrote:

...

I'm trying to ocr some old typewritten documents using xsane and gocr. There are many errors however. I've experimented with different brightness, gamma and contrast settings but can't seem to get acceptable quality. I've also tried kooka but it won't scan or preview the text, and a good quality scan I adjusted from gimp still doesn't give good results. Any advice anyone?

The best results I got were with kooka (from books or magazines), but they are very bad compared to what I can get in Windows with the software that came bundled with my Epson scanner. I have to confess that the Windows software makes almost no errors... Most unfortunate.

--
Cheers,
Carlos Robinson

Charles Philip Chan

02:44

On Tue, 24 Feb 2004 01:49:14 +0100 (CET) "Carlos E. R." wrote:

...

The best results I got were with kooka (from books or magazines), but they are very bad compared to what I can get in Windows with the software that came bundled with my Epson scanner.

Yes, OCR software under Linux is in no way as mature as the OCR software on Windows. However, I presume you are still using gocr (no longer in development); you can try using ocrad instead.

Charles

--
"Are [Linux users] lemmings collectively jumping off of the cliff of reliable, well-engineered commercial software?"
(By Matt Welsh)

Silviu Marin-Caea

08:51

Charles Philip Chan wrote:

...

On Tue, 24 Feb 2004 01:49:14 +0100 (CET) "Carlos E. R." wrote:

...
The best results I got were with kooka (from books or magazines), but they are very bad compared to what I can get in Windows with the software that came bundled with my Epson scanner.

Yes, OCR software under Linux is in no way as mature as the OCR software on Windows. However, I presume you are still using gocr (no longer in development); you can try using ocrad instead.

Windows software that gives good results usually uses a dictionary to correct incompletely recognized words. Any OCR without this can't really work well enough. Does ocrad have it? And of course, you'd need a dictionary for the language of the scanned text.
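
The dictionary idea can be approximated as a separate pass over the OCR output, whichever engine produced it. The sketch below is only an illustration built on Python's standard `difflib.get_close_matches`; it is not how any commercial package or ocrad works. The helper name `correct_words` and the default similarity cut-off of 0.7 are assumptions. The word list is whatever dictionary you have for the language of the scanned text.

```python
from difflib import get_close_matches


def correct_words(text, dictionary, cutoff=0.7):
    """Replace each unknown word with its closest dictionary word, if any.

    Words are split on whitespace only, so attached punctuation is not
    handled, and replaced words come back in lower case.
    """
    vocabulary = sorted({word.lower() for word in dictionary})
    known = set(vocabulary)

    corrected = []
    for word in text.split():
        key = word.lower()
        if key in known:
            corrected.append(word)
            continue
        # Ask for the single best candidate above the similarity cut-off.
        candidates = get_close_matches(key, vocabulary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else word)
    return " ".join(corrected)
```

A higher `cutoff` makes fewer, safer substitutions; a lower one repairs more words but also risks turning noise into real words. Tokens that resemble nothing in the word list are left untouched, so badly garbled output still needs a better scan or a better engine.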

Charles Philip Chan

10:53

On Tue, 24 Feb 2004 10:51:55 +0200 Silviu Marin-Caea wrote:

...

Windows software that gives good results usually uses a dictionary to correct incompletely recognized words. Any OCR without this can't really work well enough. Does ocrad have it?

Unfortunately no, but it does seem to do a better job than gocr. I can't do any extensive testing right now since I am in the middle of recompiling X, and I also need to recompile SANE to use libusb because I just upgraded to kernel 2.6.3 and the USB scanner module has been taken out.

I just checked the gocr homepage. I was wrong: it is still in development, but very, very slowly.

Anyway, if neither gocr nor ocrad works well for you and you don't mind paying for it, kooka can also be recompiled to use the commercial KADMOS OCR engine (http://www.rerecognition.com/) instead (wow, I just checked their prices: it is very expensive). I have never tried it, so I don't know whether it uses dictionaries, omnifonts, etc.

Charles

--
"I once witnessed a long-winded, month-long flamewar over the use of mice vs. trackballs...It was very silly."
(By Matt Welsh)

John Pettigrew

11:50

In a previous message, Charles Philip Chan wrote:

...

kooka can also be recompiled to use the commercial KADMOS OCR engine

The version of Kooka in 8.2 seems to use kadmos by default (although it can use gocr), and it seems to give good results (between crashes here). I've just been testing it and it gives very good results with either engine. Even gocr isn't *too* bad under it, and you can feed it a greyscale image and tweak the settings from within kooka. But it does seem a little unstable: save frequently :-)

John

--
John Pettigrew
Headstrong Games
john@headstrong-games.co.uk
Fun : Strategy : Price
http://www.headstrong-games.co.uk/
Board games that won't break the bank
Fields of Valour: 2 Norse clans battle on one of 3 different boards

Charles Philip Chan

12:01

New subject: [SLE] Re: OCR problems

On Tue, 24 Feb 2004 11:50:10 +0000 John Pettigrew wrote:

...

The version of Kooka in 8.2 seems to use kadmos by default (although it can use gocr),

If you upgrade to the version included in KDE 3.2, you can also use ocrad: http://www.gnu.org/software/ocrad/ocrad.html

Charles

--
I've run DOOM more in the last few days than I have the last few months. I just love debugging ;-)
(Linus Torvalds)

Carlos E. R.

25 Feb

00:26

The Tuesday 2004-02-24 at 11:50 -0000, John Pettigrew wrote:

...

...
kooka can also be recompiled to use the commercial KADMOS OCR engine

The version of Kooka in 8.2 seems to use kadmos by default (although it can use gocr), and it seems to give good results (between crashes here). I've just

I forgot to mention it: yes, the "good" results I got with kooka were with kadmos, not gocr. I tried the same page on both; I think I reported the results of my test here, but I might be mistaken.

And yes, it was quite unstable. One of the tests used a huge amount of memory, almost crashing the whole system.

--
Cheers,
Carlos Robinson

Carlos E. R.

00:29

The Monday 2004-02-23 at 21:44 -0500, Charles Philip Chan wrote:

...

Yes, OCR software under Linux is in no way as mature as the OCR software on Windows. However, I presume you are still using gocr

Actually, no: I confess I boot into Windows in those cases. :-(

...

(no longer in development), you can try using ocrad instead.

Ah, I didn't know about it... It is not included in SuSE 8.2. Do you know where its home page is? I'll try to compile it.

--
Cheers,
Carlos Robinson

Charles Philip Chan

02:12

On Wed, 25 Feb 2004 01:29:31 +0100 (CET) "Carlos E. R." wrote:

...

Ah, I didn't know about it...

Like I said in one of my earlier posts, I found out that gocr is still in development, but it is so slow that it looks like development has stopped. The latest version is 0.3.9.

...

It is not included in SuSE 8.2. Do you know where its home page is? I'll try to compile it.

The ocrad homepage is here:

http://www.gnu.org/software/ocrad/ocrad.html

and here is the tarball:

http://ftp.gnu.org/gnu/ocrad/ocrad-0.7.tar.bz2

The latest version of kooka has support for it.

Charles

--
"The IETF motto is 'rough consensus and running code'"
-- Scott Bradner (Open Sources, 1999 O'Reilly and Associates)
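
Once both engines are installed, running the same page through each is the easiest way to compare them, as several people in this thread did by hand. The sketch below is one way to script that from Python. The helper names (`build_command`, `run_engine`) are made up for illustration. The invocations assume that `gocr -i FILE` and `ocrad FILE` read a PNM image and print the recognised text to standard output; check the man pages for the versions you have, since supported input formats and options differ between releases.

```python
import shutil
import subprocess

# Assumed command lines: each tool reads a PNM image and writes text to stdout.
ENGINE_COMMANDS = {
    "gocr": lambda path: ["gocr", "-i", path],
    "ocrad": lambda path: ["ocrad", path],
}


def build_command(engine, image_path):
    """Return the argument list for the chosen engine."""
    if engine not in ENGINE_COMMANDS:
        raise ValueError(f"unknown engine: {engine}")
    return ENGINE_COMMANDS[engine](image_path)


def run_engine(engine, image_path):
    """Run one engine on one image; return its text, or None if not installed."""
    command = build_command(engine, image_path)
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Looping over `("gocr", "ocrad")` with the same scanned page then gives two text outputs to compare side by side.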

Örn Hansen

04:08

On Wednesday 25 February 2004 03:12, Charles Philip Chan wrote:

...

Like I said in one of my earlier posts, I found out that gocr is still in development, but it is so slow that it looks like development has stopped. The latest version is 0.3.9.

slow?

--- gocr website ---
February 13, 2004 GOCR 0.39 has been released. (try barcode!)
December 22, 2003 GOCR 0.38 has been released. Please test it!
October 14, 2003 GOCR compiled for Zeta by Bernd T. Korz (btk@yellowtab.com). Do you remember on BeOS?
August 11, 2002 GOCR 0.37 has been released. Please test it!
February 03, 2002 GOCR 0.3.5 has been released. Get it now! DOS/WIN-binary GOCR.EXE 0.3.5p1 available

Charles Philip Chan

04:49

On Wed, 25 Feb 2004 05:08:54 +0100 Örn Hansen wrote:

...

slow?

Yes, slow. I keep track of a lot of projects and this is by far the slowest. As you can see from the website:

February 13, 2004 GOCR 0.39 has been released. (try barcode!)
...
February 03, 2002 GOCR 0.3.5 has been released. Get it now! DOS/WIN-binary GOCR.EXE 0.3.5p1 available

a year later and we are still in the 0.3.x release cycle. Coupled with the fact that the project started in 1998, what would you call it? Speedy development? When the project first started I was really excited, but I have since moved on to ocrad.

Charles

--
"We all know Linux is great...it does infinite loops in 5 seconds."
(Linus Torvalds about the superiority of Linux on the Amsterdam Linux Symposium)

Örn Hansen

07:10

On Wednesday 25 February 2004 05:49, Charles Philip Chan wrote:

...

a year later and we are still in the 0.3.x release cycle. Coupled with the fact that the project started in 1998, what would you call it? Speedy development? When the project first started I was really excited, but I have since moved on to ocrad.

Ah, that tends to happen often with open source projects. They start up, but something blocks progress ... sometimes it's the malicious intent of competitors, other times it's the poor belly that demands attention. Life.

Charles Philip Chan

08:03

On Wed, 25 Feb 2004 08:10:21 +0100 Örn Hansen wrote:

...

Ah, that tends to happen often with open source projects. They start up, but something blocks progress ... sometimes it's the malicious intent of competitors, other times it's the poor belly that demands attention.

I understand. The project seems to be a two-man show. If only they had opened up development, we would have seen 1.0 a long time ago.

Charles

--
"It's God. No, not Richard Stallman, or Linus Torvalds, but God."
(By Matt Welsh)

Örn Hansen

14:43

On Wednesday 25 February 2004 09:03, Charles Philip Chan wrote:

...

I understand. The project seems to be a two-man show. If only they had opened up development, we would have seen 1.0 a long time ago.

It appears to be a sourceforge project ... unfortunately, the open source environment isn't as glamorous as it appears to be. Getting people to help out isn't that easy. I've got two projects there: one was blocked by linuxthreads, so the two couldn't coexist in the same program :-) and the other, well, it's not easy to get others to come along. Most people are looking for projects that'll pay, and if the project doesn't have an income source of some kind (direct or indirect) ... it's doomed, except as a hobby for one or two enthusiasts. It may be called Open whatever, but in the end it's still about money.

Örn Hansen

04:05

On Wednesday 25 February 2004 01:29, Carlos E. R. wrote:

...

...
(no longer in development), you can try using ocrad instead.

Ah, I didn't know about it... It is not included in SuSE 8.2. Do you know where its home page is? I'll try to compile it.

No longer in development ... not in SuSE 8.2? What are you guys talking about?

--- From gocr sourceforge site ---
February 13, 2004 GOCR 0.39 has been released. (try barcode!)
---

And I certainly do have gocr installed on my SuSE 9.0 box, as well as ocrad, and both are supplied on the SuSE DVD.
