Selector syntax in the Cascadia package


#1

I’m doing a bit of web scraping, patterned after the approach in the DataDepsGenerators package, using Gumbo and Cascadia. I don’t quite understand the syntax of the selector expressions in Cascadia, and I haven’t been able to find a description of it, either in Cascadia.jl or in the original Go package. I want to select elements of the form

  <div id="articleTitle">
    <h3>
      Multimodal character viewpoint in quoted dialogue sequences
    </h3>
  </div>
  <div id="authorString">
    <em>
      Kashmiri Stec, Mike Huiskes, Martijn Wieling, Gisela Redeker
    </em>
  </div>

I can use something like

julia> first(eachmatch(Selector("""[id="authorString"]"""), art144.root))
HTMLElement{:div}:
<div id="authorString">
  <em>
    Kashmiri Stec, Mike Huiskes, Martijn Wieling, Gisela Redeker
  </em>
</div>

but that is just groping around in the dark as long as I don’t know the syntax.


#2

Okay, I am beginning to understand that this is CSS selector syntax.
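
For anyone else who lands here, a minimal sketch of what that means in practice, assuming the selector strings follow ordinary CSS rules. The HTML string below is a cut-down stand-in for the real page, and the variable names (`doc`, `by_attr`, `by_id`, `title`, `authors`) are made up for illustration. It relies on `parsehtml` from Gumbo and on `Selector` and `nodeText` from Cascadia; I have not checked how much of the CSS selector grammar Cascadia.jl covers beyond these basic forms.

```julia
using Gumbo, Cascadia

# Cut-down stand-in for the page being scraped (not the real document).
html = """
<html><body>
<div id="articleTitle"><h3>Multimodal character viewpoint in quoted dialogue sequences</h3></div>
<div id="authorString"><em>Kashmiri Stec, Mike Huiskes, Martijn Wieling, Gisela Redeker</em></div>
</body></html>
"""
doc = parsehtml(html)

# Attribute selector, as in the first post.
by_attr = eachmatch(Selector("""[id="authorString"]"""), doc.root)

# "#name" is the usual CSS shorthand for matching on the id attribute.
by_id = eachmatch(Selector("#authorString"), doc.root)

# Child combinator: an <h3> that is a direct child of the title div.
title = eachmatch(Selector("#articleTitle > h3"), doc.root)

# Descendant combinator: any <em> nested inside the author div.
authors = eachmatch(Selector("#authorString em"), doc.root)

# nodeText collects the text under an element; strip trims surrounding whitespace.
println(strip(nodeText(first(title))))
println(strip(nodeText(first(authors))))
```

The attribute form and the `#` form should pick out the same `<div>`, so which one to use is mostly a matter of taste.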


#3

Have you tried https://github.com/bicycle1885/EzXML.jl ?