Mailinglist Archive: opensuse-programming (16 mails)
Re: [opensuse-programming] extracting text from html
- From: justin finnerty <linuxchem@xxxxxxxxxxxx>
- Date: Tue, 25 May 2010 15:33:42 +0000 (GMT)
- Message-id: <820591.6486.qm@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
> I need to extract text from HTML for indexing purposes;
> the implementation language is C or C++.
I would use a SAX parser that can handle HTML (libxml2's HTML parser, perhaps?).
Then all you might need to do is handle the text (character data) events.
Cheers
Justin