Gumbo.jl seems to do a good job of parsing HTML, but there doesn’t seem to be an easy way to extract the text from the parsed document (akin to Beautiful Soup in Python).
Has anyone encountered this problem? How did you turn a whole HTMLDocument into a text string?
I cobbled together this code, which kind of does what I want:
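using Gumbo, AbstractTrees  # PreOrderDFS comes from AbstractTrees

string_parts = String[]
for elem in PreOrderDFS(aaa.root)  # aaa is the parsed HTMLDocument
    isa(elem, HTMLText) || continue  # skip everything except text nodes
    push!(string_parts, elem.text)  # collect the raw text of each text node
end
join(string_parts, " ")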