I want to read an HTML file and then parse it (I am not sure whether I am using the word "parse" correctly).
I saved example.com to the file example.html.
I am also using the EzXML library:
using Cascadia, Gumbo, HTTP, AbstractTrees
using EzXML
r = EzXML.readhtml("example.html")
print(r)
This prints the following result (HTML):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> Example Domain<meta charset="utf-8"/>
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/> <meta name="viewport" content="width=device-width, initial-scale=1"/>
<style type="text/css"><![CDATA[
body {
background-color: #f0f0f2;
margin: 0;
padding: 0;
font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
}
div {
width: 600px;
margin: 5em auto;
padding: 2em;
background-color: #fdfdff;
border-radius: 0.5em;
box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
}
a:link, a:visited {
color: #38488f;
text-decoration: none;
}
@media (max-width: 700px) {
div {
margin: 0 auto;
width: auto;
}
}
]]></style>
Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
How do I work further with this HTML? I tried:
h = parsehtml(String(r.body))
This gives the error:
ERROR: type Document has no field body
Cascadia commands do not work either.
EzXML.readhtml() reads simple HTML files, but gives errors on more complex files.
Which library should I use?
Or did I miss some steps?
Thanks