Hello,

On Sat, 30 Aug 2014, Anton Aylward wrote:
On 08/30/2014 06:47 AM, David Haller wrote:
On Thu, 28 Aug 2014, Anton Aylward wrote:
Ah yes, the style of pdf that goes
<Position> word <position> word <position> word
and so on.
It also makes them ungrepable!
# zypper ar http://download.opensuse.org/repositories/Publishing/openSUSE_13.1/Publishin...
# zypper in pdfgrep
Which proves my point.
Which was not clear.
That program does not grep the file. It renders a page and greps that. It is page oriented. It won't work without poppler.
pdfgrep makes pdfs "greppable". Not _quite_ like grep.
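To make "page oriented" concrete, here is a rough sketch of the idea in Python. It is not pdfgrep's actual code: it assumes the page text has already been extracted by some renderer, and the `grep_pages` helper and the sample pages are made up for illustration.

```python
import re


def grep_pages(pages, pattern):
    """Yield (page_number, line) for every line of page text matching pattern.

    `pages` is a list of strings, one per page, as a renderer might return
    them. Matching runs on this extracted text, never on the PDF source.
    """
    rx = re.compile(pattern)
    for number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if rx.search(line):
                yield number, line


# Hypothetical extracted text, one string per page.
pages = [
    "Introduction\nThis manual covers zypper.",
    "Installing\nRun zypper in pdfgrep as root.",
]

hits = list(grep_pages(pages, r"pdfgrep"))
for number, line in hits:
    print(f"page {number}: {line}")
```

A hit is reported per page rather than per line of the file, which is the "not _quite_ like grep" part.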
It is *NOT* grepping the source, which is the point I was making.
If you want to grep the source, grep the source. If a PDF is built as (position)word or even (position)character (garbled) sequences, that is not the fault of grep, and pdfgrep can get you past that hurdle. In case of images, you'll need an OCR...
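A small Python sketch of why grepping the source fails on such PDFs. The content stream below is a made-up example, not taken from a real file, and `extract_text` is a deliberately naive helper: it ignores escapes, hex strings and font encodings, all of which a real renderer has to handle.

```python
import re
import zlib

# Hypothetical page content stream: one TJ operator that shows "Hello World"
# as string fragments separated by position (kerning) adjustments.
content = b"BT /F1 12 Tf 72 720 Td [(Hel) -20 (lo) 15 ( Wor) -10 (ld)] TJ ET"

# Such streams are commonly stored Flate-compressed inside the file.
stored = zlib.compress(content)


def naive_grep(data: bytes, needle: bytes) -> bool:
    """Byte-level search, roughly what grep does on the raw file."""
    return needle in data


def extract_text(stream: bytes) -> str:
    """Join the literal string fragments, dropping the position numbers."""
    fragments = re.findall(rb"\(([^)]*)\)", stream)
    return b"".join(fragments).decode("latin-1")


raw_hit = naive_grep(stored, b"Hello World")                        # stored bytes
inflated_hit = naive_grep(zlib.decompress(stored), b"Hello World")  # inflated source
text = extract_text(zlib.decompress(stored))                        # fragments joined

print(raw_hit, inflated_hit, repr(text))
```

Even after inflating the stream, the phrase is not there as one run of bytes. It only shows up once the fragments are put back together, which is the job a text extractor does before any matching can happen.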
Nevertheless, thank you. This is a useful tool, especially something to integrate with tools, possibly web based, to index "libraries of documents".
Exactly. Which is why I mentioned it :)

Someone asked me about pdfgrep, IIRC, and I decided to package it :) I guess I'll have to explicitly request inclusion into Factory to get picked up by the normal Distro (just being in the Devel-Project "Publishing" does not suffice, obviously).

-dnh

-- 
My house, my rules. If they ignore the tiny little signs posted outside saying "No arachnids, this means *YOU*, violators will be flattened" it's not my lookout. -- dpm