I realize this probably isn’t an issue with Gumbo.jl but rather with Gumbo itself (unless there’s some switch that isn’t exposed), or maybe even with the HTML5 parsing algorithm, but I thought I’d see if someone has a not-too-hacky solution. (A hacky solution wouldn’t be so hard.)
The issue is simply that markup like foo<em> </em>bar loses the information about the whitespace, so if I extract the text, I end up with foobar, though browsers render this as foo bar, which makes sense to me (although the original markup does not make sense; but it’s not mine).
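To make this concrete, here is a minimal sketch of the kind of text extraction I mean. The `collect_text` helper is purely illustrative (it is not part of Gumbo.jl), and the sketch assumes Gumbo.jl's `parsehtml`, with `HTMLElement` nodes exposing a `children` field and `HTMLText` nodes a `text` field:

```julia
using Gumbo

# Hypothetical helper: concatenate the contents of every text node below `node`.
collect_text(node::HTMLText) = node.text
collect_text(node::HTMLElement) = join(collect_text(child) for child in node.children)
collect_text(node) = ""  # fallback for any other node type

doc = parsehtml("<p>foo<em> </em>bar</p>")

# With the behaviour described above, the whitespace-only text inside <em>
# is missing from the tree, so this would print foobar rather than foo bar.
println(collect_text(doc.root))
```

Walking the tree by hand like this, rather than going through a higher-level text function, makes it easy to see whether the `<em>` element has any children at all.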
So … is there any way to tell Gumbo to preserve whitespace, for example, so I can check if there actually is space in there or not?