At this point, it looks like you have already managed to extract the relevant content/text from HTML.
Gumbo/Cascadia will not help you turn that text into structured data, since what you have is raw text rather than an HTML table or other elements.
Gumbo.jl conveniently provides the text function, which extracts the text from any HTML element. In your scenario, that would be text(Div[1]).
However, the result is a plain string that is not yet formatted for your needs, and Gumbo.jl has no helper functions for transforming a raw string into structured data.
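To make the extraction step concrete, here is a minimal sketch of getting the raw text out of a div. The HTML fragment is made up for illustration, and the sketch assumes a Gumbo.jl version that provides text for elements, as described above; Div simply mirrors the variable name from your code.

```julia
using Gumbo, Cascadia

# Hypothetical HTML fragment standing in for the downloaded page.
html = """
<html><body>
<div><p>JD (mid) | Telescope | Filter |</p><p>2460115.3875 | OHP-T120 | R |</p></div>
</body></html>
"""

doc = parsehtml(html)
# Select every <div> in the document.
Div = eachmatch(Selector("div"), doc.root)
# Collect the text of the first div and of everything nested inside it.
raw = Gumbo.text(Div[1])
println(raw)
```

The value in raw is an ordinary String, so everything after this point is plain string processing rather than HTML handling.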
A very simple parser for the format above can look like this:
using DataFrames
txt = """
JD (mid) | Telescope | Filter | Exposure (s) | Magnitude (AB) |
----------------------------------------------------------------------
2460115.3875 | OHP-T120 | R | 3900 | 20.70 +/- 0.12 |
2460115.413706 | OHP-T193/MISTRAL | r' | 4560 | 20.84 +/- 0.04 |
2460115.440972 | OHP-T120 | V | 4200 | 20.85 +/- 0.07 |"""
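# Split the text into lines; line 1 is the header, line 2 the dashed separator
lines = split(txt, "\n")
# Split a line on "|", trim whitespace, and drop the empty field after the trailing "|"
parseline(line) = strip.(split(line, "|"))[1:end-1]
header = parseline(lines[1])
rows = parseline.(lines[3:end])
# A vector of pairs (rather than a Dict) keeps the columns in their original order
cols = [k => [row[i] for row in rows] for (i, k) in enumerate(header)]
DataFrame(cols)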
This produces a data frame with one column per header field, where every value is still a string.
If your pages contain a table in this same format somewhere in their content, you can write a matching pattern (for example, a known header label or a regular expression) to locate the start and the end of the table, then use code similar to the above to extract it as a data frame (and finally write it out as CSV).
However, please note that this is beyond Gumbo.jl capabilities.
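As a rough sketch of that last idea, the function below looks for the header line and then takes every following line that still contains a "|". The page text, the "JD (mid)" start marker, and the name extract_table are assumptions made for illustration; adjust them to whatever your real pages contain.

```julia
using DataFrames

# Hypothetical page text: the table sits between unrelated paragraphs.
page = """
Some introductory remarks about the observations.

JD (mid) | Telescope | Filter | Exposure (s) | Magnitude (AB) |
----------------------------------------------------------------------
2460115.3875 | OHP-T120 | R | 3900 | 20.70 +/- 0.12 |
2460115.413706 | OHP-T193/MISTRAL | r' | 4560 | 20.84 +/- 0.04 |

Further discussion that is not part of the table.
"""

# Same line parser as in the simple example above.
parseline(line) = strip.(split(line, "|"))[1:end-1]

function extract_table(text)
    lines = split(text, "\n")
    # Start: the first line that begins with the assumed header label.
    start = findfirst(l -> startswith(l, "JD (mid)"), lines)
    start === nothing && return nothing
    # End: keep extending past the separator while the next line contains "|".
    stop = start + 1
    while stop < length(lines) && occursin("|", lines[stop + 1])
        stop += 1
    end
    header = parseline(lines[start])
    rows = parseline.(lines[start+2:stop])
    # Pairs keep the columns in header order; all values remain strings.
    return DataFrame([k => [row[i] for row in rows] for (i, k) in enumerate(header)])
end

df = extract_table(page)
```

From there, CSV.jl can write the result to disk with CSV.write("observations.csv", df), where the file name is only a placeholder.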
