Reading HTML file for parsing

I want to read an HTML file and then parse it (not sure if I'm using the word “parse” correctly).

I saved example.com to the file example.html.
I am also using the EzXML library.

using Cascadia, Gumbo, HTTP, AbstractTrees
using EzXML

r = EzXML.readhtml("example.html")

print(r)


This prints the following result (HTML):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?> Example Domain
<meta charset="utf-8"/>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>    <meta name="viewport" content="width=device-width, initial-scale=1"/>
<style type="text/css"><![CDATA[
body {
    background-color: #f0f0f2;
    margin: 0;
    padding: 0;
    font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;     

}
div {
    width: 600px;
    margin: 5em auto;
    padding: 2em;
    background-color: #fdfdff;
    border-radius: 0.5em;
    box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
}
a:link, a:visited {
    color: #38488f;
    text-decoration: none;
}
@media (max-width: 700px) {
    div {
        margin: 0 auto;
        width: auto;
    }
}
]]></style>

Example Domain

This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

More information...

How do I work further with this HTML?

h = parsehtml(String(r.body))

This gives an error:
ERROR: type Document has no field body

Cascadia commands do not work either.

EzXML.readhtml() reads simple HTML files, but gives errors on more complex files.

Which library should I use?
Or did I miss some steps?

Thanks

You seem to be importing Gumbo.jl and Cascadia.jl in your code, so I don’t really understand why you are using EzXML’s parser. Can’t you use Gumbo’s parsehtml function directly?

XML-based parsers such as EzXML often have difficulty working with real-world HTML.
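As a rough sketch of that suggestion: the saved file can be read into a String and passed straight to Gumbo's parsehtml, after which Cascadia selectors work on the resulting tree. The helper names load_html and select_text are made up for illustration and are not part of either package. Note that the `r.body` in the question is a field of an HTTP.Response, not of an EzXML Document, which is why that call failed.

```julia
using Gumbo, Cascadia

# Hypothetical helper: read a saved page into a String and let Gumbo
# build the HTML tree. Returns a Gumbo HTMLDocument.
function load_html(path::AbstractString)
    return parsehtml(read(path, String))
end

# Hypothetical helper: collect the text of every element that matches
# a CSS selector, searching from the document root.
function select_text(doc, css::AbstractString)
    return [nodeText(el) for el in eachmatch(Selector(css), doc.root)]
end
```

Assuming example.html sits in the working directory, usage might look like `doc = load_html("example.html")` followed by `select_text(doc, "h1")`, which should return the page heading as a one-element vector of strings.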