Does anyone know of an open source application that will convert PDF into something more useful. Any leads appreciated. -- Don Henson
On Wed, Aug 27, 2008 at 6:20 PM, Donald D Henson <wepin-list@wepin.com> wrote:
Does anyone know of an open source application that will convert PDF into something more useful. Any leads appreciated.
-- Don Henson
pdftops
pdftotext
There's probably other utilities too.
Boris.
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
Donald D Henson wrote:
Does anyone know of an open source application that will convert PDF into something more useful. Any leads appreciated.
Hi Donald, But that's exactly why PDF is useful! It's difficult to tamper with. That being said, I've used a utility called "pdftohtml" to publish pdfs on web sites. It does a fair job and allows the search engines to index the content. There are some other pdfto*** programs out there, google is your friend. Regards, Lew
On Wednesday 27 August 2008 15:20, Donald D Henson wrote:
Does anyone know of an open source application that will convert PDF into something more useful.
Can you characterize "more useful?" PDF can be converted to PostScript, obviously, which could be considered to have more utility. And it can clearly be converted to, say, HP PCL, since that happens whenever printing to a printer that uses that language. Anyway, I don't think there are many other formats that really do what PDF does, so I can't think of a useful conversion. Why do you want to convert from PDF to something else (something non-specific)?
Any leads appreciated.
Randall Schulz
Randall R Schulz wrote:
Why do you want to convert from PDF to something else (something non-specific)?
I have a client who publishes a weekly newspaper. He wants to put all but the current edition online. The software he uses to publish the paper has only pdf output. He's been posting the pdf files directly but the paper is growing and the delay between when a user clicks the button and something shows up on the screen is becoming bothersome to his readers. I figure that if we can convert the pdf to html, we can play all sorts of games to make the display show up faster. -- Don Henson
On Wednesday 27 August 2008 16:59, Donald D Henson wrote:
...
I have a client who publishes a weekly newspaper. He wants to put all but the current edition online. The software he uses to publish the paper has only pdf output. He's been posting the pdf files directly but the paper is growing and the delay between when a user clicks the button and something shows up on the screen is becoming bothersome to his readers. I figure that if we can convert the pdf to html, we can play all sorts of games to make the display show up faster.
I don't understand. Are you saying that each edition includes all the previous ones? Why would he do that? If the problem is just the size of each individual edition, which is growing over time, then you should be producing Web-optimized versions of the PDF. They will display the first page as soon as possible, not requiring the entire document to be retrieved before anything is displayed. Now, if there is no open-source software that can produce this format of PDF file, then I suggest you simply accept that fact and buy a copy of Acrobat so you can produce properly optimized PDF files. Converting PDF to HTML is guaranteed to produce inferior results. I don't recommend it. Playing the kind of games you hint at is unlikely to benefit your end users. Randall Schulz
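The "Web-optimized" files described above are what the PDF specification calls linearized PDFs. As a rough sketch, not a full validator, the Python below guesses whether a file is linearized by looking for the /Linearized key in the first 1024 bytes, which is where the specification places the linearization dictionary. The function names are illustrative assumptions. One open-source way to produce such files, offered here as a suggestion that nobody in the thread mentioned, is `qpdf --linearize in.pdf out.pdf`.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic check: a linearized PDF carries a /Linearized
    dictionary within the first 1024 bytes of the file."""
    head = pdf_bytes[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def file_looks_linearized(path: str) -> bool:
    """Read only the start of the file at `path` and apply the heuristic."""
    with open(path, "rb") as fh:
        return looks_linearized(fh.read(1024))
```

A file that fails this check may still display quickly in some viewers; the check only says whether the file was written in the layout that allows the first page to be shown before the rest has downloaded.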
Randall R Schulz wrote:
Converting PDF to HTML is guaranteed to produce inferior results. I don't recommend it. Playing the kind of games you hint at is unlikely to benefit your end users.
Hi Randall, I agree about the results, but there are times when it's useful. I'm with a non-profit that sends out a monthly newsletter to about 700 members. This is printed in b/w and sent via snail mail; it's usually about 5 double-sided pages. But I also place the source pdfs on the org's web site and keep the older versions there for archival purposes. This works well since the source is in color and there are frequently color photos that we can't afford to distribute in paper form. It's nice to be able to index the archived newsletters for historical reference (it's a museum), but pdf doesn't work well for indexing text. So I use pdftohtml and offer it right next to the link pointing at each pdf edition. The text is indexable and the high quality pdf is right there with it. pdftohtml does a fairly good job; about the only time I've seen it mess up is when the source pdf has photo credits written at a 90-degree angle along the vertical sides of photos. They come out as horizontal lines. Regards, Lew
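The arrangement Lew describes, an HTML conversion offered right next to the link to each PDF edition, can be sketched as a small generated listing. This is only an illustration: the function name, the tuple layout, and the file names are assumptions, not anything taken from his site.

```python
from html import escape


def archive_listing(editions):
    """Build an HTML list where each edition links to both its PDF
    and its HTML conversion.

    `editions` is an iterable of (title, pdf_href, html_href) tuples
    (a hypothetical layout chosen for this sketch).
    """
    items = []
    for title, pdf_href, html_href in editions:
        items.append(
            '<li>{t}: <a href="{p}">PDF</a> | '
            '<a href="{h}">text (HTML)</a></li>'.format(
                t=escape(title),
                p=escape(pdf_href, quote=True),
                h=escape(html_href, quote=True),
            )
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(archive_listing([("June 2008", "2008-06.pdf", "2008-06.html")]))
```

Search engine spiders can follow the HTML link for the text, while readers who want the original layout and color photos use the PDF link.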
On Wednesday 27 August 2008 17:55, Lew Wolfgang wrote:
Randall R Schulz wrote:
Converting PDF to HTML is guaranteed to produce inferior results. I don't recommend it. Playing the kind of games you hint at is unlikely to benefit your end users.
Hi Randall,
I agree about the results, but there are times when it's useful.
I'm with a non-profit that sends out a monthly newsletter to about 700 members. This is printed in b/w and sent via snail mail; it's usually about 5 double-sided pages.
If you can afford direct mail, you can afford Acrobat. Do what I did and haunt a local used bookstore that also resells software until a copy of Acrobat turns up.
...
Regards, Lew
Randall Schulz
Another factor no one has touched on yet is the density / size of graphics in the PDF. In an html page you can dynamically resize a picture: for example, the picture source is 8 x 10 and it is displayed as a thumbnail, which means the rendering engine must resize it on the fly. So, in this case, the optimal approach would be to resize the picture to thumbnail size first. I would assume this would apply to PDF rendering as well. -- Duaine Hechler Piano, Player Piano, Pump Organ Tuning, Servicing & Rebuilding Associate Member of the Piano Technicians Guild Reed Organ Society Member St. Louis, MO 63034 (314) 838-5587 dahechler@charter.net www.hechlerpianoandorgan.com -- Home & Business user of Linux - 9+ years
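To put rough numbers on the resizing point, the sketch below computes the dimensions of a pre-scaled thumbnail and the fraction of pixels that remain. The 300 dpi scan resolution and the 150-pixel thumbnail size are assumptions chosen for illustration; the thread gives only the 8 x 10 size.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side equals max_side,
    keeping the aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def pixel_ratio(original, resized):
    """Fraction of the original pixel count that remains after resizing."""
    return (resized[0] * resized[1]) / (original[0] * original[1])


if __name__ == "__main__":
    # Assumed: an 8 x 10 inch photo scanned at 300 dpi is 2400 x 3000 pixels.
    source = (2400, 3000)
    thumb = fit_within(*source, max_side=150)
    print(thumb, pixel_ratio(source, thumb))
```

Under those assumptions the thumbnail holds a small fraction of a percent of the original pixels, which is why shipping the full-size image and letting the viewer scale it is wasteful.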
On Wed, 27 Aug 2008 20:32:47 -0700, Randall R Schulz wrote:
I'm with a non-profit that sends out a monthly newsletter to about 700 members. This is printed in b/w and sent via snail mail; it's usually about 5 double-sided pages.
If you can afford direct mail, you can afford Acrobat. Do what I did and haunt a local used bookstore that also resells software until a copy of Acrobat turns up.
That's not necessarily the case, depending on the non-profit organization. I do some work for my local community organization, and the city covers sending a quarterly mailing to residents in the community. The organization doesn't have to pay for that mailing. But the city won't subsidize the purchase of the software to create the mailing. Jim -- Jim Henderson Please keep on-topic replies on the list so everyone benefits
Hi On 8/28/08, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
Randall R Schulz wrote:
Converting PDF to HTML is guaranteed to produce inferior results. I don't recommend it. Playing the kind of games you hint at is unlikely to benefit your end users.
Hi Randall,
I agree about the results, but there are times when it's useful.
I'm with a non-profit that sends out a monthly newsletter to about 700 members. This is printed in b/w and sent via snail mail; it's usually about 5 double-sided pages.
Unless you do some very strange things 5 pages isn't going to load unacceptably slow in PDF.
But I also place the source pdfs on the org's web site and keep the older versions there for archival purposes. This works well since the source is in color and there are frequently color photos that we can't afford to distribute in paper form. But it's nice to be able to index the archived newsletters for historical reference (it's a museum), but pdf doesn't work well for indexing text.
What do you mean exactly? If you'd like to search the PDF (or a set of PDFs) for the occurrence of words (like Google does), Beagle might be able to help. I cannot do more there than pointing. If you mean the reader would be able to search within the file and jump to the correct section with a useful index, then that is already available in PDF. For example, if you use the header types correctly in Oo then the generated PDF has an index by default. It uses the headers to determine what are chapters and paragraphs and inserts those into the PDF. Neil -- There are two kinds of people: 1. People who start their arrays with 1. 1. People who start their arrays with 0. ----------------------------------------------------------------------- ** Hi! I'm a signature virus! Copy me into your signature, please! ** -----------------------------------------------------------------------
Neil wrote:
Unless you do some very strange things 5 pages isn't going to load unacceptably slow in PDF.
Agreed, I'm not concerned with load speed here. The author does like to use hi-res graphics, but that's not an issue.
But I also place the source pdfs on the org's web site and keep the older versions there for archival purposes. This works well since the source is in color and there are frequently color photos that we can't afford to distribute in paper form. But it's nice to be able to index the archived newsletters for historical reference (it's a museum), but pdf doesn't work well for indexing text.
What do you mean exactly?
If you'd like to search the PDF (or a set of PDFs) for the occurrence of words (like Google does), Beagle might be able to help. I cannot do more there than pointing.
Yes, like google does. I want the pdf's available so that search engine spiders will index the text content.
If you mean the reader would be able to search within the file and jump to the correct section with a useful index, then that is already available in PDF. For example, if you use the header types correctly in Oo then the generated PDF has an index by default. It uses the headers to determine what are chapters and paragraphs and inserts those into the PDF.
That's useful to know, but it's not what I'm talking about. I also don't have the source documents for these pdfs. Regards, Lew
Randall R Schulz wrote: <snip>
Converting PDF to HTML is guaranteed to produce inferior results. I don't recommend it. Playing the kind of games you hint at is unlikely to benefit your end users.
Randall Schulz
Especially when one considers that the layout of the newspaper invariably includes photographs, advertisements, text in various fonts that may not display well in a browser that is missing that font, graphics from other sources. Also consider that a complicated page of html may change depending on the browser. To get an idea of what a mess this could be, simply pull up a page of any good-sized newspaper (e.g. the New York Times) and View Page Source and see if you're motivated to reproduce this for your paper. We had a school newspaper that we posted on-line in .pdf and they had hundreds of old editions, each of which appeared instantaneously. One person had the task of writing copy in a word processor of an evil software company, another person took that copy and "composed" pages and created .pdf files and passed them off to the webmaster who posted them to the server and edited the links on the school webpage. No problemo. -- Tony Alfrey tonyalfrey@earthlink.net "I'd Rather Be Sailing"
Randall R Schulz wrote:
If the problem is just the size of each individual edition, which is growing over time, then you should be producing Web-optimized versions of the PDF. They will display the first page as soon as possible, not requiring the entire document to be retrieved before anything is displayed.
Now, if there is no open-source software that can produce this format of PDF file, then I suggest you simply accept that fact and buy a copy of Acrobat so you can produce properly optimized PDF files.
Converting PDF to HTML is guaranteed to produce inferior results. I don't recommend it. Playing the kind of games you hint at is unlikely to benefit your end users.
Randall Schulz
Interesting. I discovered recently, within the past couple of days, that InDesign CS3 is an Adobe product that can export XHTML and my client already owns it. If we decide to go that route, my client's problem is resolved. -- Don Henson
On Friday 29 August 2008 15:46, Donald D Henson wrote:
Interesting. I discovered recently, within the past couple of days, that InDesign CS3 is an Adobe product that can export XHTML and my client already owns it. If we decide to go that route, my client's problem is resolved.
Despite the fact that I once worked for Adobe and hated it, I'd expect their software to do a better job of (X)HTML export than most others. But none of their design software comes cheap, so: 1) I hope it does what you need; 1a) You should confirm that it does; 2) I hope you can afford it. Randall Schulz
Randall R Schulz wrote:
On Friday 29 August 2008 15:46, Donald D Henson wrote:
Interesting. I discovered recently, within the past couple of days, that InDesign CS3 is an Adobe product that can export XHTML and my client already owns it. If we decide to go that route, my client's problem is resolved.
Despite the fact that I once worked for Adobe and hated it, I'd expect their software to do a better job of (X)HTML export than most others. But none of their design software comes cheap, so: 1) I hope it does what you need; 1a) You should confirm that it does; 2) I hope you can afford it.
Randall Schulz
I hear that. -- Don Henson
Donald D Henson pecked at the keyboard and wrote:
I have a client who publishes a weekly newspaper. He wants to put all but the current edition online. The software he uses to publish the paper has only pdf output. He's been posting the pdf files directly but the paper is growing and the delay between when a user clicks the button and something shows up on the screen is becoming bothersome to his readers. I figure that if we can convert the pdf to html, we can play all sorts of games to make the display show up faster.
I use pdftohtml to convert a pdf file into html that appears to load faster to the readers. It will be faster as long as no graphics are involved. Also they cannot resize the text either.
pdftohtml -c input_pdf
It's not the greatest but it works. And if he knows html he can clean up the output to his liking. -- Ken Schneider SuSe since Version 5.2, June 1998
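For anyone scripting this conversion across a folder of back issues, here is a minimal Python sketch around the same command. It assumes pdftohtml is installed and on the PATH; the function names are illustrative. The -c switch is the one Ken used, which the pdftohtml manual describes as generating "complex" output.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path, complex_layout=True):
    """Return the argument list for one pdftohtml run.

    With complex_layout=True this is the same invocation as above:
    pdftohtml -c <file>
    """
    cmd = ["pdftohtml"]
    if complex_layout:
        cmd.append("-c")
    cmd.append(pdf_path)
    return cmd


def convert(pdf_path):
    """Run pdftohtml on one file; raises if the tool is missing or fails."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    subprocess.run(pdftohtml_command(pdf_path), check=True)
```

Passing the arguments as a list instead of a shell string avoids quoting problems with file names that contain spaces.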
On Wed, 27 Aug 2008 17:59:02 -0600, Donald D Henson wrote:
I have a client who publishes a weekly newspaper. He wants to put all but the current edition online. The software he uses to publish the paper has only pdf output. He's been posting the pdf files directly but the paper is growing and the delay between when a user clicks the button and something shows up on the screen is becoming bothersome to his readers. I figure that if we can convert the pdf to html, we can play all sorts of games to make the display show up faster.
I seem to recall there was an experimental plugin for OpenOffice that would import PDFs so they could be edited. Last I heard, I don't know that it was complete, but it might be worth a look since OO can produce HTML output. Jim -- Jim Henderson Please keep on-topic replies on the list so everyone benefits
Donald D Henson wrote:
Does anyone know of an open source application that will convert PDF into something more useful.
A friend of mine publishes a newsletter using a Mac, and he would very much like an open source pdfto... so he can edit the text. I know this is OT, but there are experts out there! --doug
D. McGarrett wrote:
A friend of mine publishes a newsletter using a Mac, and he would very much like an open source pdfto... so he can edit the text. I know this is OT, but there are experts out there!
I don't understand why your friend would not compose/edit such a newsletter using a conventional word processor or even Macintosh Page and only at the end, when posting to a server (where a link is called from an html page or whatever), would one convert to .pdf? In other words, edit in the word processor or composer. Think of .pdf as a printer: print to paper or print to .pdf file.
--doug
-- Tony Alfrey tonyalfrey@earthlink.net "I'd Rather Be Sailing"
Tony Alfrey wrote:
D. McGarrett wrote:
A friend of mine publishes a newsletter using a Mac, and he would very much like an open source pdfto... so he can edit the text. I know this is OT, but there are experts out there!
I don't understand why your friend would not compose/edit such a newsletter using a conventional word processor or even Macintosh Page and only at the end, when posting to a server (where a link is called from an html page or whatever), would one convert to .pdf? In other words, edit in the word processor or composer. Think of .pdf as a printer: print to paper or print to .pdf file.
--doug
He's been doing it his way for years and is afraid of change. -- Don Henson
Donald D Henson wrote:
Tony Alfrey wrote:
D. McGarrett wrote:
A friend of mine publishes a newsletter using a Mac, and he would very much like an open source pdfto... so he can edit the text. I know this is OT, but there are experts out there!
I don't understand why your friend would not compose/edit such a newsletter using a conventional word processor or even Macintosh Page and only at the end, when posting to a server (where a link is called from an html page or whatever), would one convert to .pdf? In other words, edit in the word processor or composer. Think of .pdf as a printer: print to paper or print to .pdf file.
--doug
He's been doing it his way for years and is afraid of change.
I understand. -- Tony Alfrey tonyalfrey@earthlink.net "I'd Rather Be Sailing"
A friend of mine publishes a newsletter using a Mac, and he would very much like an open source pdfto... so he can edit the text. I know this is OT, but there are experts out there!
OpenOffice.org 3.0..... it can open PDFs in Draw. It's not perfect though... some minor issues (or major depending on how you look at it). It does allow you to open and edit a PDF though. Future versions should be able to open PDFs into Writer. Also... OOo will be supporting a new hybrid PDF format that can have an embedded ODF in the PDF. So... distribute the PDF, and it works as usual. Open that hybrid PDF in OOo and you have the original ODF used to create the document. OK, this doesn't work with existing PDFs, but is an option for new PDFs if they are created as hybrid PDF in OOo. C.
Donald D Henson wrote:
Does anyone know of an open source application that will convert PDF into something more useful. Any leads appreciated.
Something more useful? PDF is one of the more useful formats ever! That said, if you have some particular need in mind you should really tell what you need, instead of making such erroneous generalizations as you did. Regards
participants (12)
- Boris Epstein
- Clayton
- D. McGarrett
- Donald D Henson
- Duaine & Laura Hechler
- Jim Henderson
- Ken Schneider
- Lew Wolfgang
- Miguel Medalha
- Neil
- Randall R Schulz
- Tony Alfrey