Text mining directly from HTML?

This link helped me begin to understand some methods.
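The linked resource isn't reproduced here, so treat the following only as a rough illustration of the basic idea. It is a minimal sketch that assumes Python's standard-library `html.parser` and `collections.Counter`, which is my choice and not necessarily what the link uses. It pulls the visible text out of raw HTML, skips `<script>` and `<style>` contents, and counts word frequencies. The names `TextExtractor`, `html_to_text` and `word_counts` are hypothetical helpers made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []      # raw text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script/style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and return the visible text, fragments joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_counts(html: str) -> Counter:
    """Very naive tokenizer: lowercase runs of letters and apostrophes."""
    words = re.findall(r"[a-z']+", html_to_text(html).lower())
    return Counter(words)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p {color: red}</style></head>"
        "<body><p>Text mining text</p><script>var x = 1;</script></body></html>"
    )
    print(word_counts(sample).most_common(2))  # [('text', 2), ('mining', 1)]
```

For real pages you would probably want a more forgiving parser and a proper tokenizer, but this shows that "text mining from HTML" mostly comes down to getting the text out first and then analysing it like any other text.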
