Mailinglist Archive: opensuse-programming (16 mails)

Re: [opensuse-programming] extracting text from html
  • From: justin finnerty <linuxchem@xxxxxxxxxxxx>
  • Date: Tue, 25 May 2010 15:33:42 +0000 (GMT)
  • Message-id: <820591.6486.qm@xxxxxxxxxxxxxxxxxxxxxxxxxxx>

> I need to extract text from html for purposes of indexing -
> implementation language is C or C++

I would use a SAX parser that handles HTML (libxml2?). Then all you might need
to do is handle the TEXT nodes.

Cheers
Justin



