Mailinglist Archive: opensuse (1473 mails)

[opensuse] OCR [Was: Re: multi-page continuous scanner anyone?]
  • From: "Carlos E. R." <robin.listas@xxxxxxxxxxxxxx>
  • Date: Sun, 28 Jun 2009 11:19:30 +0200 (CEST)
  • Message-id: <alpine.LSU.2.00.0906281107580.5499@xxxxxxxxxxxxxxxx>

On Saturday, 2009-06-27 at 22:32 -0500, David C. Rankin wrote:


> Just to add to the OCR discussion, I have had good luck with tesseract. I use
> it as part of our HylaFAX/AvantFAX fax server, which automatically runs OCR on
> incoming faxes at our office...
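The fax setup described above boils down to running tesseract on every newly received fax. A minimal sketch of that loop follows. It assumes tesseract is on $PATH and that faxes arrive as *.tif files; the default directory is only a hypothetical guess (HylaFAX/AvantFAX installations differ), so pass the real one as the first argument. Fax TIFFs are usually G3/G4-compressed, and depending on how tesseract was built they may first need converting to uncompressed TIFF.

```shell
#!/bin/sh
# Sketch: OCR every received fax in a spool directory, once per fax.
# ASSUMPTIONS: tesseract is on $PATH, faxes are stored as *.tif files, and
# the default directory below is only a guess; pass your own as $1.
ocr_incoming() {
    dir="${1:-/var/spool/hylafax/recvq}"   # hypothetical default location
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue          # the glob matched no files
        base="${img%.tif}"
        [ -e "$base.txt" ] && continue     # already transcribed on an earlier run
        # tesseract appends ".txt" to the output base name itself.
        tesseract "$img" "$base" || echo "OCR failed: $img" >&2
    done
}
```

Running it periodically (for example from cron) only OCRs faxes that do not yet have a matching .txt file next to them.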


From Wikipedia, the free encyclopedia

In computer software, Tesseract is a free optical character recognition
engine. It was originally developed as proprietary software at
Hewlett-Packard between 1985 and 1995. After ten years without any
development taking place, Hewlett-Packard and UNLV released it as open
source in 2005. Tesseract is currently developed by Google and released
under the Apache License, Version 2.0.[2][3][1]

Tesseract is considered one of the most accurate free software OCR
engines currently available.[3][4]

The current version of Tesseract is 2.03, released April 22, 2008.[5]


Tesseract is an OCR engine, and it does not have a graphical user
interface. It runs from the command line, and may be called with the
command

tesseract image.tif output [options]
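In that command, "output" is a base name: tesseract appends ".txt" itself, so the text of image.tif ends up in output.txt. A small wrapper that hides this and prints the recognized text might look like the sketch below. It assumes tesseract is on $PATH; the function name ocr_tiff is made up for illustration, and any extra options (for instance a language switch such as -l deu, if that language data is installed) are simply passed through.

```shell
#!/bin/sh
# Sketch: run tesseract on one TIFF and print the recognized text.
# The second tesseract argument is an output *base name*: tesseract adds
# ".txt" itself, so "tesseract scan.tif scan" writes scan.txt.
# ASSUMPTION: tesseract is on $PATH; ocr_tiff is a hypothetical helper name.
ocr_tiff() {
    [ $# -ge 1 ] || { echo "usage: ocr_tiff IMAGE.tif [tesseract options]" >&2; return 2; }
    img="$1"; shift
    base="${img%.*}"                    # scan.tif -> scan
    # Any further arguments are passed to tesseract unchanged.
    tesseract "$img" "$base" "$@" || return 1
    cat "$base.txt"
}
```

Usage: `ocr_tiff scan.tif > scan-copy.txt` leaves scan.txt in place and prints the same text.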

Tesseract handles image files in TIFF format (with filename extension
.tif);[7] other file formats need to be converted to TIFF before being
submitted to Tesseract.
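Converting other formats to TIFF first can be scripted. The sketch below assumes ImageMagick's convert is installed (it is not mentioned in the original text). It writes uncompressed TIFF as a cautious choice, because some tesseract builds may not read compressed TIFF. It prints the name of the file to hand to tesseract; to_tiff is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn any image ImageMagick can read into an uncompressed TIFF
# and print the name of the file that should be passed to tesseract.
# ASSUMPTIONS: ImageMagick's "convert" is installed; to_tiff is hypothetical.
to_tiff() {
    src="$1"
    case "$src" in
        *.tif|*.tiff|*.TIF|*.TIFF) printf '%s\n' "$src"; return 0 ;;  # already TIFF
    esac
    case "${src##*/}" in
        *.*) dst="${src%.*}.tif" ;;     # page.png -> page.tif
        *)   dst="$src.tif" ;;          # no extension: just append one
    esac
    convert "$src" -compress None "$dst" || return 1
    printf '%s\n' "$dst"
}
```

Usage: `img=$(to_tiff page.png) && tesseract "$img" page` produces page.txt.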

Tesseract does not support layout analysis, which means that it cannot
interpret multi-column text, images, or equations; in these cases it
produces garbled output.[3]
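For simple two-column pages, one workaround is to do the "layout analysis" by hand: cut the page into a left and a right half and OCR each half separately. This is only a sketch under strong assumptions: ImageMagick (identify, convert) and tesseract are installed, the input is a single-page image, and the gap between the columns sits near the horizontal middle. It will not help with images or equations; ocr_two_columns is a hypothetical name.

```shell
#!/bin/sh
# Sketch: OCR a two-column page by cutting it into a left and a right half.
# ASSUMPTIONS: ImageMagick (identify, convert) and tesseract are installed,
# the input is a single-page image, and the column gap is near the middle.
ocr_two_columns() {
    img="$1"
    base="${img%.*}"
    w=$(identify -format '%w' "$img") || return 1
    h=$(identify -format '%h' "$img") || return 1
    half=$((w / 2))
    # Crop geometry is WIDTHxHEIGHT+X+Y; +repage discards the old page offset.
    convert "$img" -crop "${half}x${h}+0+0" +repage "${base}_left.tif" || return 1
    convert "$img" -crop "$((w - half))x${h}+${half}+0" +repage "${base}_right.tif" || return 1
    # OCR the halves in reading order and print both texts, left column first.
    for side in left right; do
        tesseract "${base}_$side.tif" "${base}_$side" || return 1
        cat "${base}_$side.txt"
    done
}
```

Computing the crop geometry from the real width (rather than cropping by percentage) keeps the two halves exactly covering the page even when the width is odd.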

Could you tell us how you installed it on SUSE? Searching with webpin, I only see unofficial packages:

cer@nimrodel:~> webpin tesseract
2 results (2 packages) found for "tesseract" in openSUSE_110
* tesseract-ocr: An OCR engine
- 20080718svn178 [BS::home:/jnweiger]
* tesseract-ocr-devel: Libraries and Header Files to Develop with Tesseract
- 20080718svn178 [BS::home:/jnweiger]
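Installing those unofficial packages would mean adding the home:/jnweiger Build Service repository with zypper. A sketch follows, to be run as root. The repository URL is an assumption derived from the usual Build Service layout (project path plus distribution name) and should be checked in a browser before use; the alias is arbitrary.

```shell
#!/bin/sh
# Sketch: add the home:/jnweiger Build Service repository and install
# the tesseract-ocr package that webpin lists. Run as root.
# ASSUMPTION: the URL below is guessed from the usual Build Service layout
# (project path + distribution name); check that it exists before use.
REPO_URL="http://download.opensuse.org/repositories/home:/jnweiger/openSUSE_11.0/"
REPO_ALIAS="home_jnweiger"   # any local name works; this one is arbitrary

install_tesseract() {
    zypper addrepo "$REPO_URL" "$REPO_ALIAS" || return 1
    zypper refresh "$REPO_ALIAS" || return 1
    zypper install tesseract-ocr
}
```

tesseract-ocr-devel is only needed for building other software against the Tesseract libraries.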

Wikipedia also mentions OCRopus, which is used by Google Book Search and uses Tesseract as a plugin:

OCRopus is a free document analysis and OCR system released under the
Apache License, Version 2.0 with a very modular design through the use of
plugins. These plugins allow OCRopus to swap out components easily.

OCRopus is currently developed under the lead of Thomas Breuel from the
German Research Centre for Artificial Intelligence in Kaiserslautern,
Germany and is sponsored by Google. OCRopus is developed for Linux;
however, users have reported success with OCRopus on Mac OS X and an
application called TakOCR[1] has been developed that installs OCRopus on
Mac OS X and provides a simple droplet interface.

It is also CLI only.

-- 
Cheers,
Carlos E. R.

