Hi. I'm trying to ocr some old typewritten documents using xsane and gocr. There are many errors, however. I've experimented with different brightness, gamma and contrast settings but can't seem to get acceptable quality. I've also tried kooka, but it won't scan or preview the text, and a good-quality scan I adjusted in gimp still doesn't give good results. Any advice, anyone?
Thanks, Steve.
Hi,
same problem here...
A text like, for example:
-- I'm trying to ocr some old typewritten ... --
would turn into something like:
--- -_- -__| ocr s0me old _|||_ ... ---
Regards, Martin
It seems to be down to the fuzziness of old typed texts, where the ink has bled. A sheet printed in Times New Roman, e.g. from my local newspaper, OCRs almost perfectly under kooka. My old text actually looks better to me in the kooka preview than the modern text does. (BTW, kooka seems to work better if your scanner is switched on... sorry about ranting in my original message!) The nearest I've got so far is around 50% correct, by using gimp's sharpening tool. That takes around 10 minutes per page. We *must* be able to do better than this. Cheers, Steve.
On Monday 23 February 2004 17:15, Martin Mielke wrote:
Hi,
same problem here...
A text like, for example:
-- I'm trying to ocr some old typewritten ... --
would turn into something like:
--- -_- -__| ocr s0me old _|||_ ... ---
Regards, Martin
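Steve's gimp sharpening pass (about 10 minutes per page) is the kind of step that can be scripted for a whole batch of pages. The sketch below is a minimal illustration, not gimp's own algorithm: it assumes each page has already been loaded as a 2-D list of greyscale values (0 = black, 255 = white) and applies a common generic 3x3 sharpening kernel. The function name `sharpen` is hypothetical.

```python
def sharpen(pixels):
    """Apply a 3x3 sharpening kernel to a 2-D list of greyscale values (0-255).

    Kernel (a common generic choice, assumed here, not gimp's filter):
         0 -1  0
        -1  5 -1
         0 -1  0
    Border pixels are copied unchanged to keep the sketch short.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = [row[:] for row in pixels]  # copy, so the input is not modified
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            value = (5 * pixels[y][x]
                     - pixels[y - 1][x] - pixels[y + 1][x]
                     - pixels[y][x - 1] - pixels[y][x + 1])
            # Clamp back into the valid greyscale range.
            out[y][x] = max(0, min(255, value))
    return out
```

The kernel weights sum to 1, so flat areas of paper keep their brightness while the edges of bled ink get extra contrast; whether that actually helps gocr on a given typewriter face is something to check page by page.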
On Monday 2004-02-23 at 17:39 -0000, steve-ss wrote:
I'm trying to ocr some old typewritten documents using xsane and gocr. There are many errors, however. I've experimented with different brightness, gamma and contrast settings but can't seem to get acceptable quality. I've also tried kooka, but it won't scan or preview the text, and a good-quality scan I adjusted in gimp still doesn't give good results. Any advice, anyone?
The best results I got were with kooka (from books or magazines), but they are very bad compared to what I can get in Windows with the software that came bundled with my Epson scanner. I have to confess that the Windows software makes almost no errors... Most unfortunate. -- Cheers, Carlos Robinson
On Tue, 24 Feb 2004 01:49:14 +0100 (CET), "Carlos E. R." wrote:
The best results I got were with kooka (from books or magazines), but they are very bad compared to what I can get in Windows with the software that came bundled with my Epson scanner.
Yes, OCR software under Linux is in no way as mature as the OCR software in Windows. However, I presume you are still using gocr (no longer in development); you can try using ocrad instead. Charles -- "Are [Linux users] lemmings collectively jumping off of the cliff of reliable, well-engineered commercial software?" (By Matt Welsh)
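If ocrad is installed next to gocr, running both on the same scan makes the comparison quick. A minimal sketch, assuming both programs accept the image file as their last argument and print the recognised text on stdout (check each program's documentation on your system); `run_engine`, `compare_engines` and `main` are hypothetical names.

```python
import subprocess
import sys

# Assumption: each engine takes the image path as its last argument and
# writes the recognised text to stdout.
ENGINES = {
    "gocr": ["gocr"],
    "ocrad": ["ocrad"],
}


def run_engine(command, image_path):
    """Run one OCR command on an image file and return what it printed."""
    result = subprocess.run(
        command + [image_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the engine exits non-zero
    )
    return result.stdout


def compare_engines(image_path, engines=ENGINES):
    """Return {engine name: recognised text} for the same page."""
    return {name: run_engine(cmd, image_path) for name, cmd in engines.items()}


def main(argv=None):
    """Print each engine's output for the image named in argv[0]."""
    args = sys.argv[1:] if argv is None else argv
    for name, text in compare_engines(args[0]).items():
        print(f"===== {name} =====")
        print(text)
```

Calling `main(["page.pbm"])` prints both results one after the other. A program that is not installed raises `FileNotFoundError`, and with `check=True` a run that exits with an error raises `CalledProcessError` rather than quietly returning empty text.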
Charles Philip Chan wrote:
On Tue, 24 Feb 2004 01:49:14 +0100 (CET), "Carlos E. R." wrote:
The best results I got were with kooka (from books or magazines), but they are very bad compared to what I can get in Windows with the software that came bundled with my Epson scanner.
Yes, OCR software under Linux is in no way as mature as the OCR software in Windows. However, I presume you are still using gocr (no longer in development); you can try using ocrad instead.
Software under Windows with good results usually uses a dictionary to correct incompletely recognized words. Any OCR without this can't really work well enough. Does ocrad have it? And of course, you'd need a dictionary for the language of the scanned text.
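The dictionary pass Silviu describes can be approximated after the fact on plain OCR output. A minimal sketch using Python's standard `difflib` similarity matching as a stand-in for whatever the commercial packages actually do (their methods are not described here); the word list is a toy placeholder, and a real run would need a full word list for the language of the scanned text. `correct_word` and `correct_text` are hypothetical names.

```python
import difflib


def correct_word(word, dictionary, cutoff=0.7):
    """Replace an OCR'd word with the closest dictionary word, if one is close enough.

    difflib's similarity ratio is a stand-in here, not the method used by
    any particular OCR package.
    """
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, dictionary, cutoff=0.7):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_word(w, dictionary, cutoff) for w in text.split())


# Toy word list (a placeholder for a real dictionary of the page's language).
WORDS = ["trying", "to", "ocr", "some", "old", "typewritten", "documents"]
```

With this list, `correct_text("ocr s0me old typewr1tten documents", WORDS)` gives back `"ocr some old typewritten documents"`. The `cutoff` trades coverage against wrong guesses: set it too low, and names or technical terms missing from the word list get "corrected" into unrelated words.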
On Tue, 24 Feb 2004 10:51:55 +0200, Silviu Marin-Caea wrote:
Software under Windows with good results usually uses a dictionary to correct incompletely recognized words. Any OCR without this can't really work well enough. Does ocrad have it?
Unfortunately no, but it does seem to do a better job than gocr. I can't do any extensive testing right now since I am in the middle of recompiling X, and I also need to recompile SANE to use libusb because I just upgraded to kernel 2.6.3 and the USB scanner module has been removed. I just checked the gocr homepage. I was wrong: it is still in development, but very, very slowly. Anyway, if neither gocr nor ocrad works well for you and you don't mind paying for it, kooka can also be recompiled to use the commercial KADMOS OCR engine (http://www.rerecognition.com/) instead (wow, just checked their prices, it is very expensive). I have never tried it, so I don't know whether it uses dictionaries, omnifonts, etc. Charles -- "I once witnessed a long-winded, month-long flamewar over the use of mice vs. trackballs...It was very silly." (By Matt Welsh)
In a previous message, Charles Philip Chan wrote:
kooka can also be recompiled to use the commercial KADMOS OCR engine
The version of Kooka in 8.2 seems to use kadmos by default (although it can use gocr), and it seems to give good results (between crashes here). I've just been testing it, and it gives very good results with either engine; even gocr isn't *too* bad when used through it, and you can feed it a greyscale image and tweak the settings in kooka. But it does seem a little unstable, so save frequently :-) John -- John Pettigrew Headstrong Games john@headstrong-games.co.uk Fun : Strategy : Price http://www.headstrong-games.co.uk/ Board games that won't break the bank Fields of Valour: 2 Norse clans battle on one of 3 different boards
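Feeding kooka a greyscale image and tweaking the settings mostly comes down to choosing where grey becomes black. One standard way to pick that level automatically is Otsu's method, sketched below as an alternative to manual tweaking (it is not necessarily what kooka or gocr do internally). It assumes greyscale values 0 to 255 with dark ink on light paper; `otsu_threshold` and `binarize` are hypothetical names.

```python
def otsu_threshold(values):
    """Return the grey level t (0-255) that best separates ink from paper.

    Otsu's method: choose t to maximise the between-class variance of the
    two groups {v <= t} (dark) and {v > t} (light).
    """
    histogram = [0] * 256
    for v in values:
        histogram[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_between = 0, -1.0
    weight_dark = 0  # number of pixels with value <= t
    sum_dark = 0     # sum of the values <= t
    for t in range(256):
        weight_dark += histogram[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * histogram[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels, threshold):
    """Map a 2-D greyscale image to 0/1, with 1 meaning black (ink)."""
    return [[1 if v <= threshold else 0 for v in row] for row in pixels]
```

On pages where the bleed is uneven, a single global threshold may still leave some letters broken or filled in, so it is worth trying on a few pages before a whole batch.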
On Tue, 24 Feb 2004 11:50:10 +0000, John Pettigrew wrote:
The version of Kooka in 8.2 seems to use kadmos by default (although it can use gocr),
If you upgrade to the version included in KDE 3.2, you can also use ocrad: http://www.gnu.org/software/ocrad/ocrad.html. Charles -- I've run DOOM more in the last few days than I have the last few months. I just love debugging ;-) (Linus Torvalds)
On Tuesday 2004-02-24 at 11:50 -0000, John Pettigrew wrote:
kooka can also be recompiled to use the commercial KADMOS OCR engine
The version of Kooka in 8.2 seems to use kadmos by default (although it can use gocr), and it seems to give good results (between crashes here). I've just
I forgot to mention it: yes, the "good" results I got with kooka were with kadmos, not gocr. I tried the same page with both; I think I reported the results of my test here, but I might be mistaken. And yes, it was quite unstable. One of the tests used a huge amount of memory, almost crashing the whole system. -- Cheers, Carlos Robinson
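Comparing engines on the same page, as Carlos did, is easier with a number attached. A minimal sketch, assuming you have typed a reference transcription of one test page by hand: it scores each engine's output by character accuracy, defined here as 1 minus the edit (Levenshtein) distance divided by the length of the reference. That definition is one common choice, not something used in this thread; `levenshtein` and `char_accuracy` are hypothetical names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,              # delete ca
                current[j - 1] + 1,           # insert cb
                previous[j - 1] + (ca != cb)  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def char_accuracy(reference, ocr_output):
    """Fraction of the reference recognised correctly (1.0 = perfect, floor 0.0)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

Scored this way, a figure like Steve's "around 50% correct" becomes something that can be compared directly across engines, preprocessing settings and pages.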
On Monday 2004-02-23 at 21:44 -0500, Charles Philip Chan wrote:
Yes, OCR software under Linux is in no way as mature as the OCR software in Windows. However, I presume you are still using gocr
Actually, no: I confess I boot into Windows in those cases. :-(
(no longer in development), you can try using ocrad instead.
Ah, I didn't know about it... it is not included in SuSE 8.2. Do you know where its home page is? I'll try to compile it. -- Cheers, Carlos Robinson
On Wed, 25 Feb 2004 01:29:31 +0100 (CET), "Carlos E. R." wrote:
Ah, I didn't know about it...
Like I said in one of my earlier posts, I found out that gocr is still in development, but so slowly that it looks as if development has stopped. The latest version is 0.3.9.
it is not included in SuSE 8.2. Do you know where its home page is? I'll try to compile it.
The ocrad homepage is here: http://www.gnu.org/software/ocrad/ocrad.html and here is the tarball: http://ftp.gnu.org/gnu/ocrad/ocrad-0.7.tar.bz2. The latest version of kooka has support for it. Charles -- "The IETF motto is 'rough consensus and running code'" -- Scott Bradner (Open Sources, 1999 O'Reilly and Associates)
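ocrad, like gocr, reads images in the PNM family, so a page that has been cleaned up and binarised in a script needs saving in one of those formats before either engine sees it. A minimal sketch that writes the plain (ASCII) PBM variant, assuming the page is a 2-D list of 0/1 values with 1 meaning black; `to_plain_pbm` and `write_pbm` are hypothetical names.

```python
def to_plain_pbm(bitmap):
    """Encode a 2-D list of 0/1 values (1 = black) as a plain PBM ("P1") image."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    lines = ["P1", f"{width} {height}"]
    for row in bitmap:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        # Plain PBM lines should not exceed 70 characters, so long rows are
        # wrapped; readers treat the raster as one continuous run of bits.
        for start in range(0, width, 35):
            lines.append(" ".join(str(bit) for bit in row[start:start + 35]))
    return "\n".join(lines) + "\n"


def write_pbm(path, bitmap):
    """Save the bitmap to a .pbm file that gocr or ocrad can be pointed at."""
    with open(path, "w", encoding="ascii") as f:
        f.write(to_plain_pbm(bitmap))
```

Plain PBM is bulky (two bytes per pixel), but it is easy to inspect by eye when checking that a threshold did what was expected.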
On Wednesday 25 February 2004 03:12, Charles Philip Chan wrote:
Like I said in one of my earlier posts, I found out that gocr is still in development, but so slowly that it looks as if development has stopped. The latest version is 0.3.9.
slow?
--- gocr website ---
February 13, 2004 GOCR 0.39 has been released. (try barcode!)
December 22, 2003 GOCR 0.38 has been released. Please test it!
October 14, 2003 GOCR compiled for Zeta by Bernd T. Korz (btk@yellowtab.com). Do you remember on BeOS?
August 11, 2002 GOCR 0.37 has been released. Please test it!
February 03, 2002 GOCR 0.3.5 has been released. Get it now! DOS/WIN-binary GOCR.EXE 0.3.5p1 available
On Wed, 25 Feb 2004 05:08:54 +0100, Örn Hansen wrote:
slow?
Yes, slow. I keep track of a lot of projects, and this is by far the slowest. As you can see from the website:
February 13, 2004 GOCR 0.39 has been released. (try barcode!)
...
February 03, 2002 GOCR 0.3.5 has been released. Get it now! DOS/WIN-binary GOCR.EXE 0.3.5p1 available
A year later, we are still in the 0.3.x release cycle. Coupled with the fact that the project started in 1998, what would you call it? Speedy development? When the project first started I was really excited, but I have since moved on to ocrad. Charles -- "We all know Linux is great...it does infinite loops in 5 seconds." (Linus Torvalds about the superiority of Linux on the Amsterdam Linux Symposium)
On Wednesday 25 February 2004 05:49, Charles Philip Chan wrote:
a year later and we are still in the 0.3.x release cycle. Coupled with the fact that the project started in 1998, what would you call it? Speedy development? When the project first started I was really excited, but I have since moved on to ocrad.
Ah, that tends to happen often with open source projects. They start up, but something blocks progress... sometimes it's malicious intent from competitors, other times it's the poor belly that demands attention. Life.
On Wed, 25 Feb 2004 08:10:21 +0100, Örn Hansen wrote:
Ah, that tends to happen often with open source projects. They start up, but something blocks progress... sometimes it's malicious intent from competitors, other times it's the poor belly that demands attention.
I understand. The project seems to be a two-man show; if only they had opened up development, we would have seen 1.0 a long time ago. Charles -- "It's God. No, not Richard Stallman, or Linus Torvalds, but God." (By Matt Welsh)
On Wednesday 25 February 2004 09:03, Charles Philip Chan wrote:
I understand. The project seems to be a two-man show; if only they had opened up development, we would have seen 1.0 a long time ago.
It appears to be a SourceForge project... unfortunately, the open source environment isn't as glamorous as it appears to be. Getting people to help out isn't that easy. I have two projects there: one was blocked by LinuxThreads, so the two couldn't coexist in the same program :-) and with the other, well, it's not easy to get others to come along. Most people are looking for projects that'll pay, and if a project doesn't have an income source of some kind (direct or indirect)... it's doomed, except as a hobby for one or two enthusiasts. It may be called Open whatever, but in the end it's still about money.
On Wednesday 25 February 2004 01:29, Carlos E. R. wrote:
(no longer in development), you can try using ocrad instead.
Ah, I didn't know about it... it is not included in SuSE 8.2. Do you know where its home page is? I'll try to compile it.
No longer in development... not in SuSE 8.2? What are you guys talking about?
--- From gocr SourceForge site ---
February 13, 2004 GOCR 0.39 has been released. (try barcode!)
---
And I certainly do have gocr installed on my SuSE 9.0 box, as well as ocrad, and both are supplied on the SuSE DVD.
participants (7)
- Carlos E. R.
- Charles Philip Chan
- John Pettigrew
- Martin Mielke
- Silviu Marin-Caea
- steve-ss
- Örn Hansen