...I need to extract text from html for purposes of indexing - ...implementation language is C or C++
I need to extract text from HTML for purposes of indexing - implementation language is C or C++
I would use a SAX parser that handles HTML (libxml2?). Then all you might need to do is handle the TEXT nodes.
Cheers, Justin
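To make the "handle the TEXT nodes" idea concrete, here is a minimal sketch of the callback shape in plain C. It is not libxml2: it is a dependency-free stand-in that scans the markup and fires a callback for each run of text between tags, which is the role the `characters` callback of libxml2's `xmlSAXHandler` would play in a real SAX parse. The names `scan_html_text`, `text_cb`, `text_buf` and `collect` are made up for illustration. The scanner skips `<script>` and `<style>` bodies, but it does not decode entities and will be confused by a `>` inside a comment or attribute value, so treat it as a picture of the control flow rather than a substitute for a real HTML parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Called once per run of text found between tags, analogous to a
 * SAX "characters" event. The text is NOT NUL-terminated. */
typedef void (*text_cb)(void *user, const char *text, size_t len);

/* Case-insensitive test: does the tag name starting at p equal `name`
 * (given in lowercase)? The name must be followed by a delimiter, so
 * "styled" does not match "style". */
static int tag_is(const char *p, const char *end, const char *name)
{
    size_t n = strlen(name);
    if ((size_t)(end - p) < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)p[i]) != name[i])
            return 0;
    return p + n == end || p[n] == '>' || p[n] == '/' ||
           isspace((unsigned char)p[n]);
}

/* Hypothetical helper: walk the buffer and report text outside tags. */
void scan_html_text(const char *html, size_t len, text_cb on_text, void *user)
{
    assert(html != NULL && on_text != NULL);

    const char *p = html, *end = html + len;
    const char *skipping = NULL; /* "script" or "style" while inside one */

    while (p < end) {
        if (*p != '<') {
            /* A run of character data up to the next '<'. */
            const char *start = p;
            while (p < end && *p != '<')
                p++;
            if (!skipping)
                on_text(user, start, (size_t)(p - start));
            continue;
        }

        const char *name = p + 1;
        if (skipping) {
            /* Inside script/style only the matching close tag counts;
             * any other '<' is part of the skipped body. */
            if (name < end && *name == '/' && tag_is(name + 1, end, skipping)) {
                skipping = NULL;
            } else {
                p++;
                continue;
            }
        } else if (tag_is(name, end, "script")) {
            skipping = "script";
        } else if (tag_is(name, end, "style")) {
            skipping = "style";
        }

        /* Advance past the '>' that closes this tag. */
        while (p < end && *p != '>')
            p++;
        if (p < end)
            p++;
    }
}

/* Example consumer: append each text run plus one space to a fixed
 * buffer, truncating silently when the buffer is full. */
struct text_buf {
    char data[256];
    size_t len;
};

void collect(void *user, const char *text, size_t len)
{
    struct text_buf *b = user;
    size_t room = sizeof b->data - 1 - b->len; /* keep space for the NUL */

    if (len > room)
        len = room;
    memcpy(b->data + b->len, text, len);
    b->len += len;
    if (b->len < sizeof b->data - 1)
        b->data[b->len++] = ' ';
    b->data[b->len] = '\0';
}
```

To use it, zero-initialise a `struct text_buf`, call `scan_html_text(html, strlen(html), collect, &buf)`, and hand `buf.data` to the indexer. An indexer would more likely tokenise inside the callback than copy into a fixed buffer; the buffer is only there to keep the sketch short.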