[opensuse] Text-to-speech from scanned documents (xsane)
Hi folks,
I would like to have TTS output from PDF files (created by xsane from paper documents) using okular. I already have TTS output set up on Suse 12.1 using Jovie.
I can already select text in PDF files using okular's text selection tool and then get speech output of the selected text. I can also use the select tool to select part of the image, and okular will still pop up a menu with the option to speak out the selection.
I get the PDF files from medical journals and other publications.
However, when I create my own PDF files using an HP Scanjet 3110 (via xsane), I can't select the text. I can select parts of the image, but I don't get speech output. Is there any way to create PDF files that can be read out by okular?
Thanks,
Gustav
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
Hi,
On 06.02.2013 06:02, Gustav Degreef wrote:
> Is there any way to create PDF files that can be read out by okular?
If I understand you right, you want speech output from scanned documents. That's basically not possible because the text is not "text" but "image", and the text-to-speech tools can't "read" the text from an image. The only way to make this happen would be to run text recognition (OCR) software to turn the image into plain text and then use the text-to-speech tools on that.
Karl
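The text recognition step Karl describes could be sketched roughly as follows. This is only an illustration, not something anyone in the thread posted: it assumes the scan was saved as an image file (the file names are hypothetical), that the `tesseract` command-line tool is installed, and that the installed version is recent enough to support the `pdf` output config, which writes a PDF with a selectable text layer that a viewer such as okular can then pass to the speech tools.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, lang="eng", searchable_pdf=True):
    """Build the argument list for one Tesseract run.

    Tesseract appends the file extension to output_base itself.
    """
    cmd = ["tesseract", str(image), str(output_base), "-l", lang]
    if searchable_pdf:
        # "pdf" selects searchable-PDF output (assumes a recent Tesseract);
        # without it Tesseract writes plain text to output_base + ".txt".
        cmd.append("pdf")
    return cmd


def ocr_page(image, output_base, lang="eng", searchable_pdf=True):
    """Run Tesseract on one scanned page and return the output path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(
        tesseract_command(image, output_base, lang, searchable_pdf),
        check=True,
    )
    suffix = ".pdf" if searchable_pdf else ".txt"
    return Path(str(output_base) + suffix)
```

For example, `ocr_page("page1.png", "page1")` would be expected to leave a `page1.pdf` next to the scan. Recognition quality depends heavily on scan resolution and contrast, so the result still needs checking.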
On 02/06/2013 09:21 PM, Karl Sinn wrote:
> If I understand you right, you want speech output from scanned documents. [...]
Yes, you understood correctly. But why can okular read out other PDF documents? Are they not images? Excuse my ignorance.
Thanks,
Gustav
Hi,
> But why can okular read out other PDF documents? Are they not images?
Yes, they are not images, they are text. It's like an OpenOffice document: you can write text "as text", or you can include text as a scanned image. If you include text as a scanned image, you won't be able to alter the text, because it's not "text".
Karl
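One way to see the difference Karl describes is to ask a text extraction tool what it finds in each file. The sketch below assumes the `pdftotext` utility from poppler is installed; the 20-character threshold and the file names are arbitrary choices made for illustration. A journal PDF would normally yield plenty of text, while an image-only scan from xsane would be expected to yield essentially nothing.

```python
import subprocess


def has_text_layer(extracted_text, min_chars=20):
    """Return True if the extracted text holds at least min_chars
    non-whitespace characters (page-break form feeds are ignored)."""
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def pdf_text(pdf_path):
    """Extract the text layer of a PDF with pdftotext ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage would look like `has_text_layer(pdf_text("journal.pdf"))` versus `has_text_layer(pdf_text("scan.pdf"))`.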
On 02/06/2013 09:46 PM, Karl Sinn wrote:
> Yes, they are not images, they are text. [...]
Thanks very much, I understand.
Gustav
On 02/06/2013 01:08 AM, Gustav Degreef pecked at the keyboard and wrote:
> Thanks very much, I understand.
You can try to run an OCR programme on the image to extract text.
--
Ken Schneider
SuSe since Version 5.2, June 1998
On 02/06/2013 10:12 PM, Ken Schneider - openSUSE wrote:
> You can try to run an OCR programme on the image to extract text.
Yes, I used tesseract about three years ago. It was a command-line program but produced very good results. However, I want to scan several books (200-400 pages each) and have them read out to me. For smaller jobs OCR is great, but for documents this long, having to OCR and verify the text takes a prohibitive amount of time.
I have a proprietary program (and hardware) that can scan books and produce a document which is eventually converted to speech. I was trying to avoid using it because it stores documents in a proprietary format, requires proprietary hardware and also requires a Windows OS.
I am now using FoxVox (a TTS plugin for Firefox), okular for PDFs (previously I had to use Acrobat Reader) and Jovie (with KMouth) to read out text documents. I'm trying to move even further towards open-source TTS tools.
Thanks for the comments,
Gustav
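For book-length jobs like the one Gustav describes, the OCR runs themselves can at least be scripted, even though checking the result remains manual work. The following is a minimal sketch under several assumptions: each page was scanned to its own PNG file with a page number in the name, `tesseract` is on the PATH, and the special output base `stdout` is used so the recognised text comes back on standard output. The function names and directory layout are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name):
    """Sort key that orders embedded numbers numerically,
    so 'page2.png' sorts before 'page10.png'."""
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ordered_pages(names):
    """Return the page file names in reading order."""
    return sorted(names, key=natural_key)


def ocr_book(image_dir, out_txt, lang="eng"):
    """OCR every PNG in image_dir in page order into one text file.

    Returns the number of pages processed.
    """
    image_dir = Path(image_dir)
    pages = ordered_pages(p.name for p in image_dir.glob("*.png"))
    with open(out_txt, "w", encoding="utf-8") as out:
        for name in pages:
            result = subprocess.run(
                ["tesseract", str(image_dir / name), "stdout", "-l", lang],
                capture_output=True,
                text=True,
                check=True,
            )
            out.write(result.stdout)
            out.write("\n")
    return len(pages)
```

A call such as `ocr_book("book1_scans", "book1.txt")` would produce a single plain-text file that Jovie can read out. This does nothing about the verification time Gustav mentions; it only removes the need to start each page by hand.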
participants (3)
- Gustav Degreef
- Karl Sinn
- Ken Schneider - openSUSE