[ANN] Announcing XML.jl

Announcing a pure Julia XML reader/writer.

It powers KML.jl and I’m finally ready to encourage some more testing from the community.

Its speed looks decent compared to EzXML.jl, but please open issues if you hit cases of poor performance.

Enjoy!

44 Likes

Well I’ve been telling everyone to use it for like six months :wink:

6 Likes

The first link is not working.

Oops, thanks! Fixed.

1 Like

This looks great!

I am extremely interested in this. Do you happen to have some more detailed examples and could you kindly elaborate a bit on what benefits this has compared to something like EzXML.jl?

I see the mention of it being written purely in Julia etc., perhaps someone could explain to me the benefit of this?

Kind regards

1 Like
  1. I’m greedy. I want Julia to be the best at everything.
  2. Packages that are 100% Julia are easier to debug. I’d much rather have @edit lead me to pure Julia code than e.g. a Base.ccall.
  3. In secure computing environments, it can be a burden to move in external dependencies. If Julia is already installed/on your approved software list, you’re all set.
10 Likes

Oh man, you beat me to it! This looks great. It’s been on my list for a while to have a native-Julia XML parser. Love the lazy functionality. Can’t wait to take it for a spin.

8 Likes

Very cool, it’s great seeing more and more infrastructure packages being added to the Julia ecosystem.

Thanks for the package.

I have already used it on a few occasions, mostly when working with SVGs, and it has worked great so far.

However, in some cases I felt like the package could be shipped with a bit more utility:

  1. For example, I often found myself doing things like
   xml = Document()
   path = Element("path")
   push!(xml.root, path)

which feels repetitive. I then implemented helpers to overload :(>>) so that it could be written as

   xml = Document()
   path = Element("path") >> xml.root
  2. At the moment, setting properties only works with Symbols. However, in SVG you often have property names that you can’t write as plain Symbol literals, e.g.
path.stroke-width = 1 # doesn't work
path.Symbol("stroke-width") = 1 # doesn't work
setproperty!(path, Symbol("stroke-width"), 1) # works

Again, to cut down on syntax, I type-pirated getproperty/setproperty! for AbstractXMLNode to accept AbstractStrings so that I could instead write just

path."stroke-width" = 1
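For illustration, the two helpers described above could be sketched roughly as follows. This is only a minimal sketch on a hypothetical stand-in type (`ToyElement` is made up here and is not XML.jl's node type), so no type piracy is involved:

```julia
# Hypothetical stand-in for an XML element; NOT XML.jl's actual node type.
struct ToyElement
    tag::String
    attributes::Dict{String,String}
    children::Vector{ToyElement}
end
ToyElement(tag::AbstractString) = ToyElement(String(tag), Dict{String,String}(), ToyElement[])

# `child >> parent` appends the child to the parent and returns the child,
# so construction and insertion fit on one line.
Base.:(>>)(child::ToyElement, parent::ToyElement) = (push!(getfield(parent, :children), child); child)

# String-keyed property access: `el."name"` reads and `el."name" = v` writes
# an attribute. Symbol access (`el.tag`) keeps its default field meaning.
Base.getproperty(el::ToyElement, name::AbstractString) = getfield(el, :attributes)[String(name)]
Base.setproperty!(el::ToyElement, name::AbstractString, value) =
    (getfield(el, :attributes)[String(name)] = string(value))

root = ToyElement("svg")
path = ToyElement("path") >> root   # create and attach in one step
path."stroke-width" = 1             # attribute name that is not a valid identifier
```

Doing the same on a type owned by another package would be type piracy, which is why I am asking whether it could live in XML.jl itself.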

Are there any plans on adding such utilities to XML.jl?

PS: I just noticed that XML.jl is now at v0.2.1 and the last version I used was v0.1.3, so perhaps some of the above functionality has already been added?

That’s path.var"stroke-width" = 1 btw.

This is great feedback, thanks!

To your first point: I do have some ideas on making more convenient syntax, but nothing that I’ve settled on. I liked what I did with building HTML in Cobweb so I may borrow some things from there.

To your second point, see @aplavin’s response on using var"stroke-width". You can then use setproperty! if you’re setting attributes programmatically.
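As a rough sketch of both styles, assuming a hypothetical `ToyNode` type (made up for this example, not XML.jl's node type) whose properties are forwarded to an attribute dictionary:

```julia
# Hypothetical stand-in type; NOT XML.jl's node type.
struct ToyNode
    attributes::Dict{Symbol,String}
end
ToyNode() = ToyNode(Dict{Symbol,String}())

# Forward property access to the attribute dictionary.
Base.getproperty(n::ToyNode, name::Symbol) = getfield(n, :attributes)[name]
Base.setproperty!(n::ToyNode, name::Symbol, value) =
    (getfield(n, :attributes)[name] = string(value))

path = ToyNode()

# Literal attribute name: var"..." builds a Symbol that is not a valid identifier.
path.var"stroke-width" = 1

# Programmatic attribute names: call setproperty! with a constructed Symbol.
for (name, value) in ("stroke" => "black", "fill-opacity" => 0.5)
    setproperty!(path, Symbol(name), value)
end
```

`var"..."` is plain Julia syntax, so it needs no support from the package beyond the usual Symbol-based `setproperty!`.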

1 Like

I still prefer the string overload, but I guess it’s a matter of taste.

Thank you for the detailed explanation and making the package.

I hope to be able to use the package for some of my needs and hopefully provide some valuable feedback from a user perspective.

Kind regards

1 Like

I’ve also been using it for SVGs! It would be good to put together SVG.jl at some stage.

I have a bunch of messy code for extracting geometries to GeometryBasics.jl types.

1 Like

Sounds fun … but what would it be supposed to do? :laughing:

I mean a .svg file is just a .xml file with lots and lots of options for the elements.

If that’s your idea of fun :stuck_out_tongue_winking_eye:

I found it pretty painful having to learn how svg geometries are stored just to scrape some map polygons from a pdf. Things like finding and applying transformation matrices, how the geometry syntax works…

Like

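using GeometryBasics # provides Point2

# get_transform and parsepoint are helpers defined elsewhere (not shown here)
function to_geoms(::Type{T}, p) where T
    matrix = get_transform(p)
    geoms = T[]
    parts = split(p.d, " ")
    line = Point2{Float64}[]
    s = nothing
    while true
        i = isnothing(s) ? iterate(parts) : iterate(parts, s)
        if isnothing(i) # end of iteration
            # flush a trailing line that was never closed with "Z"
            if length(line) > 1
                push!(geoms, T(line))
            end
            break
        end
        x, s = i
        if x == "M" # move to: start a new line
            line = Point2{Float64}[]
            a, s = iterate(parts, s)
            b, s = iterate(parts, s)
            push!(line, parsepoint(matrix, a, b))
        elseif x == "L" # line to: append a point
            a, s = iterate(parts, s)
            b, s = iterate(parts, s)
            push!(line, parsepoint(matrix, a, b))
        elseif x == "Z" && !isempty(line) # close path: repeat the first point
            push!(line, first(line))
            push!(geoms, T(line))
            # reset so the closed line is not pushed again at the end of iteration
            line = Point2{Float64}[]
        end
    end
    geoms
end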

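function parse_matrix(s)
    # s looks like "matrix(a,b,c,d,e,f)": strip the "matrix(" prefix and the ")" suffix
    m = parse.(Float64, split(s[8:end-1], ","))
    # homogeneous 2D affine matrix; the last row must be [0 0 1]
    [m[1] m[3] m[5];
     m[2] m[4] m[6];
     0.0  0.0  1.0]
end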
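
Since the snippet above depends on helpers that are not shown, here is a self-contained sketch of the same idea using only Base Julia. All `toy_*` names are made up for this example and are not my actual helpers. It assumes space-separated path data with only `M`, `L` and `Z` commands, and a `matrix(a,b,c,d,e,f)` transform string:

```julia
# Parse "matrix(a,b,c,d,e,f)" into a 3×3 homogeneous affine matrix.
function toy_parse_matrix(s::AbstractString)
    m = parse.(Float64, split(s[8:end-1], ","))
    return [m[1] m[3] m[5];
            m[2] m[4] m[6];
            0.0  0.0  1.0]
end

# Apply the affine matrix to the point (x, y), i.e. M * [x, y, 1].
toy_apply(M, x, y) = (M[1,1]*x + M[1,2]*y + M[1,3],
                      M[2,1]*x + M[2,2]*y + M[2,3])

# Split path data into closed rings of transformed points.
function toy_path_rings(d::AbstractString, M)
    tokens = split(d)
    rings = Vector{Tuple{Float64,Float64}}[]
    ring = Tuple{Float64,Float64}[]
    i = 1
    while i <= length(tokens)
        t = tokens[i]
        if t == "M" || t == "L"
            if t == "M"                      # "move to" starts a new ring
                ring = Tuple{Float64,Float64}[]
            end
            x = parse(Float64, tokens[i+1])
            y = parse(Float64, tokens[i+2])
            push!(ring, toy_apply(M, x, y))
            i += 3
        elseif t == "Z"                      # "close path" repeats the first point
            if !isempty(ring)
                push!(ring, first(ring))
                push!(rings, ring)
            end
            ring = Tuple{Float64,Float64}[]
            i += 1
        else
            i += 1                           # skip anything this sketch does not handle
        end
    end
    return rings
end

M = toy_parse_matrix("matrix(2,0,0,2,10,20)")    # scale by 2, then shift by (10, 20)
rings = toy_path_rings("M 0 0 L 1 0 L 1 1 Z", M)
```

Real SVG path data also has relative commands, curves, arcs and implicit repeated commands, none of which are covered here.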

Oh, I see. Yeah, that’s not so fun.

And for your application you really only care about the geometric aspects of the svg doc, so that you can ignore all the other metadata like color, links etc?

I called it fun because I have only dealt with it the other way round, writing .svg files instead of parsing them, for fatteneder/SVGMakie.jl, an SVG backend for Makie (it’s just a proof of concept), and that’s a bit simpler :slight_smile:

Yeah writing is way simpler! There are a bunch of arcane options you need to cover to actually read them, and I still only covered what was actually in the files I was scraping.

My svgs had a mix of regular polygons and polygons that needed transformation to the projection. I did care about colour as that was the main indicator of what the objects in the maps were. But you could extract all of that data to a FeatureCollection/Tables object that was GeoInterface.jl compatible and it would just work with all the other geometry packages.

And if you can already write from GeometryBasics.jl geoms, it’s very little work to make that write any geometries at all, as in GeoInterface.convert(GeometryBasics, geom) little work. So maybe you could extract that SVG writing code from SVGMakie.jl too?

1 Like

Don’t know if applicable but I once extracted a bunch of coordinates from a PDF using PDFIO.jl and GMT.jl. See this forum post.

There are a bunch of arcane options you need to cover to actually read them

I guess you are referring to things like hrefs etc. I can imagine those being nasty to extract correctly without XML.jl.

But you could extract all of that data to a FeatureCollection/Tables object that was GeoInterface.jl compatible and it would just work with all the other geometry packages.

That sounds like a good way to dump all the metadata that you don’t know how to handle.

So maybe you could extract that SVG writing code from SVGMakie.jl too?

IIRC all I needed to write svgs for Makie were just path and polygon elements and then a ton of color and fill options, because all of Makie’s backends mostly only receive lists of points (there is only one occurrence of GeometryBasics.Mesh in all of SVGMakie.jl :laughing: ). So probably not too much to extract, unfortunately.
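
To give an idea of how little is involved, here is a minimal sketch of turning a list of points into a closed path element. This is not the code from SVGMakie.jl; `toy_svg_path` and its keyword arguments are made up for illustration:

```julia
# Build a closed SVG <path> element from a list of (x, y) points.
function toy_svg_path(points; stroke="black", fill="none")
    isempty(points) && return ""
    cmds = String[]
    for (i, (x, y)) in enumerate(points)
        # the first point uses "M" (move to), all others "L" (line to)
        push!(cmds, string(i == 1 ? "M" : "L", " ", x, " ", y))
    end
    d = join(cmds, " ") * " Z"   # "Z" closes the path
    return "<path d=\"$d\" stroke=\"$stroke\" fill=\"$fill\"/>"
end

svg_path = toy_svg_path([(0, 0), (1, 0), (1, 1)])
```

Everything else is mostly a matter of adding more color and fill attributes.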