Re: [opensuse-programming] extracting text from html

25 May 2010


      * Per Jessen  [05-25-10 10:49]:
...
I need to extract text from html for purposes of indexing -
implementation language is C or C++.  So far I've come across html2text
which is written in C++ - it looks pretty good, but I will need to make
some changes to make it fit my purposes.  Does any other library come to
mind for extracting text from html?

w3m -dump <url>

lynx has a similar function: lynx -dump <url>
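
Since the question asks for C or C++, one way to use the w3m suggestion
from a C program is to run it as a child process and read what it prints.
What follows is only a minimal sketch, assuming a POSIX system with w3m
installed and on the PATH. The names capture_output and html_to_text are
hypothetical helpers made up for illustration, not part of w3m or any
library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments (no shell is
 * involved) and return its standard output as a malloc'ed, NUL-terminated
 * string. Returns NULL on any error or non-zero exit status.
 * The caller frees the result. */
char *capture_output(char *const argv[])
{
    int fds[2];
    if (pipe(fds) != 0)
        return NULL;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0) {                     /* child: stdout -> pipe, then exec */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }

    close(fds[1]);                      /* parent: read until EOF */
    size_t len = 0, cap = 4096;
    char *buf = malloc(cap);
    ssize_t n = 0;
    while (buf && (n = read(fds[0], buf + len, cap - len - 1)) > 0) {
        len += (size_t)n;
        if (cap - len < 2) {            /* keep room for data plus the NUL */
            cap *= 2;
            char *tmp = realloc(buf, cap);
            if (tmp == NULL)
                free(buf);
            buf = tmp;
        }
    }
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    if (!buf || n < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: plain-text rendering of a URL or local HTML file,
 * obtained by running "w3m -dump <target>". */
char *html_to_text(const char *target)
{
    char *const argv[] = { "w3m", "-dump", (char *)target, NULL };
    return capture_output(argv);
}
```

A caller would pass a URL or file name to html_to_text(), hand the
returned string to the indexer and then free() it. Using fork/execvp
rather than popen() keeps the URL away from the shell, so odd characters
in it are not interpreted as shell syntax.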

-- 
Patrick Shanahan         Plainfield, Indiana, USA        HOG # US1244711
http://wahoo.no-ip.org     Photo Album:  http://wahoo.no-ip.org/gallery2
Registered Linux User #207535                    @ http://counter.li.org