[ANN] Lexbor.jl - HTML parser wrapping the C library lexbor

Announcing Lexbor.jl, which provides HTML parsing and DOM queries via CSS selector syntax. Docs can be found at https://michaelhatherly.github.io/Lexbor.jl.

This package is mostly just a wrapper around the lexbor C library (lexbor/lexbor on GitHub, https://lexbor.com). I’ve not wrapped the entire C interface, since I didn’t need much of it, but that does not rule out implementing more of its API should there be interest.

The code was originally part of HypertextTemplates.jl (MichaelHatherly/HypertextTemplates.jl on GitHub, a hypertext templating DSL for Julia) but was refactored out of that package once it was no longer needed there. Rather than letting it disappear into git history, it’s now available in the General registry for anyone who might need it.

The API looks pretty straightforward. I wonder how it compares to Cascadia.jl, which was the first library I encountered back when I first tried to do DOM queries with CSS selectors in Julia.

Yes, Cascadia.jl provides a similar set of features, though it relies on Gumbo.jl, and the upstream C library that Gumbo.jl wraps, https://github.com/google/gumbo-parser, states:

This project has been unmaintained since 2016 and should not be used.

So it just depends on whether a user is willing to rely on an unmaintained dependency for their particular use case. Sometimes it doesn’t matter, sometimes it does :slight_smile: This was the main reason I initially wrapped a new library rather than just using what already exists.

Hi @mike, this seems quite nice. I didn’t fully understand whether it can be used to parse CSS stylesheets from a String or a file.

Is this already possible with the current package, or does some functionality need to be added? (It seems the original C library should be capable of doing so.)

It would need to be added to Lexbor.jl. As you say, and from what I understand, it already exists in the C lib. I don’t currently need CSS parsing, but I would be happy to review any PRs that implement it. Ideally, the entire C API will eventually be supported, but it’s a large task :slight_smile:
