Text mining directly from HTML?

This link helped me begin to understand some of the methods.
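
For a concrete starting point, here is a minimal sketch of my own (not taken from the link) that pulls the visible text out of an HTML string and counts word frequencies, using only Python's standard library. The names `TextExtractor`, `html_to_text`, and `word_counts` are made up for illustration, and the tokenizer is a deliberately naive regex; a real pipeline would probably use a dedicated HTML parser and a proper tokenizer.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []        # text fragments in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Fragments are joined with spaces, so text split by inline tags
    # (e.g. <b>) ends up as separate words.
    return " ".join(" ".join(parser.parts).split())


def word_counts(html):
    """Naive bag-of-words: lowercase, keep runs of letters and apostrophes."""
    words = re.findall(r"[a-z']+", html_to_text(html).lower())
    return Counter(words)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>Text mining, text <b>mining</b>!</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(word_counts(sample).most_common(5))
```

The resulting `Counter` can be fed into whatever text-mining step comes next (stop-word removal, TF-IDF, and so on).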