[opensuse] Graphical OCR for Linux/openSUSE - a report
Hello:

This is only a report about an OCR (optical character recognition) program for Linux/openSUSE.
For a long time it has been a problem for me to find a reliable, well-working OCR program for Linux that can recognize Hungarian accented characters.
Recently I found the 'cuneiform' OCR program and a graphical frontend for it called 'yagf'. These two together work very well, and their usage is straightforward. Cuneiform has several language modules and can reliably recognize "normal" and accented characters. It can read several image types, including JPG, TIFF, PNG and BMP images. The recognition options are easily configurable in yagf. If HTML output is chosen, it can even distinguish between smaller and larger fonts and identify section titles and bold-face fonts. Of course it is not error-free, but the results are acceptable. yagf can also invoke xsane directly and use the scanned image from it.

Both cuneiform and yagf are available in the openSUSE Build Service (OBS) repositories (cuneiform 0.9.0 and yagf 0.8.1).

Cheers,
Istvan

-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
Istvan Gabor wrote:
> Hello:
> This is only a report about an OCR (optical character recognition) program for Linux/openSUSE.
> For a long time it has been a problem for me to find a reliable, well-working OCR program for Linux that can recognize Hungarian accented characters.
Hmm, did you check tesseract (http://code.google.com/p/tesseract-ocr/)?
I had a look at it some time ago and was impressed by its speed and quality. It doesn't support Hungarian, but it is said to be trainable...

Pit

-- 
Dr. Peter "Pit" Suetterlin          http://www.astro.su.se/~pit
Institute for Solar Physics
Tel.: +34 922 405 590 (Spain)       P.Suetterlin@royac.iac.es
      +46 8 5537 8507 (Sweden)      Peter.Suetterlin@astro.su.se
On 11 March 2010 at 16:27, Peter Suetterlin <P.Suetterlin@royac.iac.es> wrote:
> Istvan Gabor wrote:
>> Hello:
>> This is only a report about an OCR (optical character recognition) program for Linux/openSUSE.
>> For a long time it has been a problem for me to find a reliable, well-working OCR program for Linux that can recognize Hungarian accented characters.
> Hmm, did you check tesseract (http://code.google.com/p/tesseract-ocr/)?
> I had a look at it some time ago and was impressed by its speed and quality. It doesn't support Hungarian, but it is said to be trainable...
Yes, it was the first one I checked. It is really accurate, but it does not support as many image formats (only TIFF, I guess), has no graphical frontend, and cannot recognize blocks. The training process seemed quite difficult to me, and I gave up on it.

Cheers,
Istvan
Istvan Gabor wrote:
> Hello:
> This is only a report about an OCR (optical character recognition) program for Linux/openSUSE.
> For a long time it has been a problem for me to find a reliable, well-working OCR program for Linux that can recognize Hungarian accented characters.
> Recently I found the 'cuneiform' OCR program and a graphical frontend for it called 'yagf'. These two together work very well, and their usage is straightforward. Cuneiform has several language modules and can reliably recognize "normal" and accented characters. It can read several image types, including JPG, TIFF, PNG and BMP images. The recognition options are easily configurable in yagf. If HTML output is chosen, it can even distinguish between smaller and larger fonts and identify section titles and bold-face fonts. Of course it is not error-free, but the results are acceptable. yagf can also invoke xsane directly and use the scanned image from it.
>
> Both cuneiform and yagf are available in the openSUSE Build Service (OBS) repositories (cuneiform 0.9.0 and yagf 0.8.1).
There is, or used to be, a GPL-ed conversion app that can produce .bmp files from .pdf documents. I thought it might be worth mentioning, even though I don't have a link handy.

BR,
Gudmund

-- 
This message and any replies to it are scanned by http://www.fra.se. Please direct any complaints about this to them.
On 11 March 2010 at 17:14, Gudmund Areskoug <gudmundpublic@gmail.com> wrote: [snip]
> There is, or used to be, a GPL-ed conversion app that can produce .bmp files from .pdf documents.
> I thought it might be worth mentioning, even though I don't have a link handy.
I guess you mean the pdfimages program, which is part of xpdf.

man pdfimages:

NAME
       pdfimages - Portable Document Format (PDF) image extractor (version 3.00)

DESCRIPTION
       Pdfimages saves images from a Portable Document Format (PDF) file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.

Cheers,
Istvan
participants (3)
- Gudmund Areskoug
- Istvan Gabor
- Peter Suetterlin