[opensuse] Re: multi-page continuous scanner anyone?
On Wed, Jun 17, 2009 at 6:51 PM, James Hatridge<James.Hatridge@gmx.de> wrote:
Hi Boris,
On Thursday 18 June 2009 00:02:51 Boris Epstein wrote:
Hi all,
I am wondering if anybody can recommend a duplex-capable multi-sheet (automatic) scanner to be used under OpenSuSE Linux. By multi-page I mean something with a feeder that can scan in a whole stack of paper.
Thanks.
Boris.
Just last week a weekly news email I get talked about getting a new office scanner. Below is what they wrote. I looked it up on the 'net and it seems to work with Linux. Check it out,
Hope this helps,
JIM
################## A PRODUCT RECOMMENDATION. I needed a new scanner. I asked a few people, including a guy I know who owns a business employing 75 people but has no filing cabinets, what is a good scanner to computerize receipts and such? The response was unanimous: the Fujitsu ScanSnap. It's amazing: a contract and a check? Scans the differing page sizes without a hitch, scans fronts and backs at the same time, discards any blank pages, creates a PDF file, and then does an OCR (optical character recognition) pass on the file so that you can search within it. And it comes with all the software you need, too. The thing is so amazing that if you put a page in upside down, it will usually detect that and flip it around for you! And it detects the rare occasions when it misfeeds (e.g., pulls more than one page through at a time.) A 5-page scan, front AND back, only takes about 15 seconds (plus OCR time; that process runs in the background). I'm completely blown away by it, and my bookkeeper just loves it, too, since I now know where everything is, and can e-mail stuff to her easily. It's a tad pricey ($470), but cheaper at Amazon ($420 as of this writing). This thing gets my highest recommendation. http://ThisIsTrue.com/d-scansnap
Ugh: I just clicked the link to check it before sending this out, and Amazon has raised their price to $465 since Monday's Premium edition! Sheesh. Check Newegg.com, then: it's $409 there right now, though that's plus shipping (free on Amazon). Still, that's the best price I can find right now.
################# -- Jim Hatridge Linux User #88484 Ebay ID: WartHogBulletin
Thanks Jim! Looks like it only goes up to 600x600 dpi optical, though. Boris. -- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
recognition) pass on the file so that you can search within it. And it comes with all the software you need, too. The thing is so amazing that if you put a page in upside down, it will usually detect that and flip
I see the phrase "all the software you need" and red flags go up. How much of all this is done by 'the device' and how much is done by software on a [Windows?] host? Does this device work with Linux/openSUSE? I'm curious because I've looked at several such things and they are always tethered to a Windows 200x server, usually with M$-SQL as well. So the cost of the device is almost irrelevant.
Looks like it only goes up to 600x600 dpi optical, though.
For document archive 600x600 is overkill.
Looks like it only goes up to 600x600 dpi optical, though.
For document archive 600x600 is overkill.
Typically 200x200 is used and 300x300 is used for high quality. Assuming you're coming from normal paper docs. Greg
-- Greg Freemyer Head of EDD Tape Extraction and Processing team Litigation Triage Solutions Specialist http://www.linkedin.com/in/gregfreemyer First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf The Norcross Group The Intersection of Evidence & Technology http://www.norcrossgroup.com
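For a sense of scale, the resolutions discussed here can be turned into pixel counts with a little arithmetic. The sketch below is only illustrative: it assumes a US Letter page (8.5 x 11 inches) and 8-bit grayscale, neither of which comes from the thread, and the function names are made up for the example.

```python
# Rough arithmetic for the scan resolutions mentioned in the thread.
# Page size and bit depth are illustrative assumptions, not thread data.

PAGE_WIDTH_IN = 8.5    # assumed US Letter width, inches
PAGE_HEIGHT_IN = 11.0  # assumed US Letter height, inches


def pixel_dimensions(dpi):
    """Return (width, height) in pixels for one page side at the given dpi."""
    return round(PAGE_WIDTH_IN * dpi), round(PAGE_HEIGHT_IN * dpi)


def raw_size_bytes(dpi, bits_per_pixel):
    """Uncompressed size of one page side, before any compression."""
    width, height = pixel_dimensions(dpi)
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (200, 300, 600):
        width, height = pixel_dimensions(dpi)
        gray_mb = raw_size_bytes(dpi, 8) / 1e6
        print(f"{dpi} dpi: {width} x {height} px, ~{gray_mb:.1f} MB raw 8-bit gray")
```

Under those assumptions, a 600 dpi page holds four times the pixels of a 300 dpi page, which is the main cost of scanning plain text documents at the higher setting.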
On Thursday, 2009-06-18 at 16:36 -0400, Greg Freemyer wrote:
Looks like it only goes up to 600x600 dpi optical, though.
For document archive 600x600 is overkill.
Typically 200x200 is used and 300x300 is used for high quality. Assuming you're coming from normal paper docs.
If I were scanning my magazine collections, with photos, I would use 600dpi minimum, so that I could print a page later as good as the original. Which makes me wonder if it could be possible to scan a page with different resolutions for text and images, automatically. Maybe in the future. Or at least store it differently. Perhaps DjVu... but the available open tools for creating djvu files are far from optimal.
-- Cheers, Carlos E. R.
On Thursday June 18 2009, Carlos E. R. wrote:
If I were scanning my magazine collections, with photos, I would use 600dpi minimum, so that I could print a page later as good as the original.
I agree, and 600 dpi won't get you a particularly faithful reproduction. Phototypesetting equipment realizes 2400 DPI, typically.
Which makes me wonder if it could be possible to scan a page with different resolutions for text and images, automatically.
Maybe in the future.
Or at least store it differently. Perhaps DjVu... but the available open tools for creating djvu files are far from optimal.
I'm a little curious what Google and ACM (to name only two) use to digitize print collections. The results render well and, what's much more impressive, are OCR-ed quite well, too. ACM's entire digital library (most of which predates digital originals) is searchable even when the original had to be scanned and OCR-ed.
-- Cheers, Carlos E. R.
Randall Schulz
On Thursday, 2009-06-18 at 15:51 -0700, Randall R Schulz wrote:
I agree, and 600 dpi won't get you a particularly faithful reproduction. Phototypesetting equipment realizes 2400 DPI, typically.
600 dpi happens to be my printer resolution, so going further would be pointless ;-)
Which makes me wonder if it could be possible to scan a page with different resolutions for text and images, automatically.
Maybe in the future.
Or at least store it differently. Perhaps DjVu... but the available open tools for creating djvu files are far from optimal.
I'm a little curious what Google and ACM (to name only two) use to digitize print collections. The results render well and, what's much more impressive, are OCR-ed quite well, too. ACM's entire digital library (most of which predates digital originals) is searchable even when the original had to be scanned and OCR-ed.
Yep. Good OCR for me is almost impossible to achieve, but these big chaps seem to have it solved.
DjVu format, by the way, can store B/W for text, color for photos, and text for the OCR, all in the same file and for each page. In theory, at least: with the open tools we have that's almost impossible to get. The better tools are not open.
It is a very good format for scanned material, but it doesn't seem to catch on :-?
-- Cheers, Carlos E. R.
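The appeal of keeping text and pictures in separate layers can be seen with a back-of-the-envelope estimate. The following is a hypothetical sketch, not a description of how DjVu actually encodes pages: it ignores compression entirely, and the page size, resolutions, bit depths, and function names are all assumptions chosen for illustration.

```python
# Flat vs. layered raw storage for one scanned page.
# Every number passed in is an illustrative assumption; compression is ignored.

def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bytes for an image of the given physical size."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def flat_color(width_in, height_in, dpi):
    """One 24-bit color image covering the whole page."""
    return raw_bytes(width_in, height_in, dpi, 24)


def layered(width_in, height_in, text_dpi, picture_dpi):
    """A 1-bit text mask at text_dpi plus a 24-bit picture layer at picture_dpi."""
    mask = raw_bytes(width_in, height_in, text_dpi, 1)
    picture = raw_bytes(width_in, height_in, picture_dpi, 24)
    return mask + picture


if __name__ == "__main__":
    flat = flat_color(8.5, 11, 600)
    split = layered(8.5, 11, text_dpi=600, picture_dpi=200)
    print(f"flat 600 dpi color: {flat / 1e6:.1f} MB raw")
    print(f"600 dpi mask + 200 dpi color: {split / 1e6:.1f} MB raw")
```

Even before compression, a bilevel text layer costs a small fraction of a full-color layer at the same resolution, so lowering only the picture resolution shrinks the page considerably while the text stays sharp.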
On Saturday 20 June 2009 03:28:07 am Carlos E. R. wrote:
Yep. Good OCR for me is almost impossible to achieve, but these big chaps seem to have it solved.
DjVu format, by the way, can store B/W for text, color for photos, and text for the OCR, all in the same file and for each page. In theory, at least: with the open tools we have that's almost impossible to get. The better tools are not open.
It is a very good format for scanned material, but it doesn't seem to catch on :-?
Just to add to the OCR discussion, I have had good luck with tesseract. I use it as part of our hylafax/avantfax fax server that automatically does OCR on incoming faxes at our office....
-- David C. Rankin, J.D.,P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
On Saturday, 2009-06-27 at 22:32 -0500, David C. Rankin wrote: ...
Just to add to the OCR discussion, I have had good luck with tesseract. I use it as part of our hylafax/avantfax fax server that automatically does OCR on incoming faxes at our office....
Interesting.
From Wikipedia, the free encyclopedia:
In computer software, Tesseract is a free optical character recognition engine. It was originally developed as proprietary software at Hewlett-Packard between 1985 and 1995. After ten years without any development taking place, Hewlett-Packard and UNLV released it as open source in 2005. Tesseract is currently developed by Google and released under the Apache License, Version 2.0.[2][3][1] Tesseract is considered one of the most accurate free software OCR engines currently available.[3][4] The current version of Tesseract is 2.03, released April 22, 2008.[5]
...
Tesseract is an OCR engine, and it does not have a graphical user interface. It runs from the command line, and may be called with the command:[7]
    tesseract image.tif output [options]
Tesseract handles image files in TIFF format (with filename extension .tif);[7] other file formats need to be converted to TIFF before being submitted to Tesseract. Tesseract does not support layout analysis, which means that it cannot interpret multi-column text, images, or equations, and in these cases will produce a garbled text output.[3]
http://en.wikipedia.org/wiki/Tesseract_(software)
You could add how you installed it on SUSE. Looking on webpin, I see only unofficial packages:
    cer@nimrodel:~> webpin tesseract
    2 results (2 packages) found for "tesseract" in openSUSE_110
    * tesseract-ocr: An OCR engine - 20080718svn178 [BS::home:/jnweiger]
    * tesseract-ocr-devel: Libraries and Header Files to Develop with Tesseract - 20080718svn178 [BS::home:/jnweiger]
    cer@nimrodel:~>
Wikipedia also mentions OCRopus, used by Google Book Search, which uses Tesseract as a plugin:
OCRopus is a free document analysis and OCR system released under the Apache License, Version 2.0 with a very modular design through the use of plugins. These plugins allow OCRopus to swap out components easily. OCRopus is currently developed under the lead of Thomas Breuel from the German Research Centre for Artificial Intelligence in Kaiserslautern, Germany and is sponsored by Google. OCRopus is developed for Linux; however, users have reported success with OCRopus on Mac OS X and an application called TakOCR[1] has been developed that installs OCRopus on Mac OS X and provides a simple droplet interface.
It is also CLI only.
-- Cheers, Carlos E. R.
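The command-line invocation quoted above is easy to wrap in a script. This is a minimal sketch under stated assumptions: it uses only the `tesseract image.tif output` form from the quoted text, enforces the TIFF-only restriction described there, and the function names `build_tesseract_command` and `ocr_page` are hypothetical helpers, not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Build the argv for `tesseract image.tif output`, as quoted in the thread."""
    image = Path(image_path)
    if image.suffix.lower() not in (".tif", ".tiff"):
        # The quoted text says other formats must be converted to TIFF first.
        raise ValueError(f"expected a TIFF file, got {image.name!r}")
    return ["tesseract", str(image), str(output_base)]


def ocr_page(image_path, output_base):
    """Run Tesseract if it is installed and return the path of the text file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt")
```

Calling `ocr_page("page.tif", "page")` would leave the recognized text in `page.txt`, provided a `tesseract` binary is installed. Keeping the command construction separate from the subprocess call makes it possible to check the arguments without having Tesseract available.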
On Sat, 2009-06-27 at 22:32 -0500, David C. Rankin wrote:
Just to add to the OCR discussion, I have had good luck with tesseract. I use it as part of our hylafax/avantfax fax server that automatically does OCR on incoming faxes at our office....
How about posting your Hylafax faxrcvd script so others can use it as a template? Or a link, if you used some site/howto for setting it up.
-- OpenGroupware developer: awilliam@whitemice.org <http://whitemiceconsulting.blogspot.com/> OpenGroupware & Cyrus IMAPd documentation @ <http://docs.opengroupware.org/Members/whitemice/wmogag/file_view>
On Sunday 28 June 2009 09:26:16 am Adam Tauno Williams wrote:
On Sat, 2009-06-27 at 22:32 -0500, David C. Rankin wrote:
Just to add to the OCR discussion, I have had good luck with tesseract. I use it as part of our hylafax/avantfax fax server that automatically does OCR on incoming faxes at our office....
How about posting your Hylafax faxrcvd script so others can use it as a template? Or a link, if you used some site/howto for setting it up.
Sure. The package I used was Avantfax. I set up a page with a short howto: http://www.3111skyline.com/linux/avantfax.php
-- David C. Rankin, J.D.,P.E. Rankin Law Firm, PLLC 510 Ochiltree Street Nacogdoches, Texas 75961 Telephone: (936) 715-9333 Facsimile: (936) 715-9339 www.rankinlawfirm.com
participants (6)
- Adam Tauno Williams
- Boris Epstein
- Carlos E. R.
- David C. Rankin
- Greg Freemyer
- Randall R Schulz