[ANN] LightBSON.jl

Announcing LightBSON.jl, born out of a need for high-performance, allocation-free BSON processing.

What It Is

  • Allocation-free API for reading and writing BSON data.
  • Natural mapping of Julia types to corresponding BSON types.
  • Convenience API to read and write Dict{String, Any} or OrderedDict{String, Any} (default, for roundtrip consistency) as BSON.
  • Struct API tunable for tradeoffs between flexibility, performance, and evolution.
  • Configurable validation levels.
  • Lightweight indexing for larger documents.
  • Transducers.jl compatible.
  • Tested for conformance against the BSON corpus.

What It Is Not

  • Generic serialization of all Julia types to BSON. See BSON.jl for that functionality. LightBSON.jl aims for natural representations, suitable for interop with other languages and long-term persistence.
  • Integrated with FileIO.jl. BSON.jl already is, and adding another with different semantics would be confusing.
  • A BSON mutation API. Reading and writing are entirely separate and only complete documents can be written.
  • Conversion to and from Extended JSON. This may be added later.

See README for more examples and further information.

Not being integrated with FileIO is a hit for interplay with DrWatson’s tagsave, produce_or_load and co. :frowning: Do you think it is possible to actually implement this? Had many problems with the existing BSON.jl, unlikely to use it in the future, so your package seems like a nice alternative.

Yay.
I am so glad to have an option for plain BSON that doesn’t try and serialize arbitrary Julia objects.
This maximizes compatibility,
and also means it can be a secure format.
(In contrast, it seems impossible to make anything like pickle or the Julia serializer incapable of running arbitrary code on loading. This includes BSON.jl, JLD2, and JLSO.)

Not being integrated with FileIO is a hit for interplay with DrWatson’s tagsave, produce_or_load and co. :frowning: Do you think it is possible to actually implement this?

It doesn’t look difficult to implement, but I’m concerned that it would confuse users. As I understand FileIO, it tries each package registered for a format in registration order; if a package can’t be loaded (e.g., it’s not in your project), it moves on to the next one. If I add LightBSON to FileIO, it becomes difficult to determine which package will be used to load .bson files outside of the top-level project, which is the only place you control the environment. Given the semantic difference in how BSON.jl and LightBSON.jl treat the data, I’m not sure how users would actually handle that.
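
To make the concern concrete, here is a toy model of that fallback behaviour. It is not FileIO's actual code or API: the `Loader` type and the `pick_loader` and `registered` functions are hypothetical names invented for illustration, and the model assumes the "first loadable package wins" behaviour described above.

```julia
# Toy model only: these names are hypothetical and are not part of FileIO.
struct Loader
    name::String
    available::Bool   # can the package be loaded in the active environment?
end

# Walk the loaders in registration order and return the first available one.
function pick_loader(loaders::Vector{Loader})
    for l in loaders
        l.available && return l.name
    end
    error("no applicable loader found")
end

# Same registration order, parameterised by what the environment provides.
registered(bson, light) = [Loader("BSON", bson), Loader("LightBSON", light)]

pick_loader(registered(true, true))    # "BSON"
pick_loader(registered(false, true))   # "LightBSON"
```

The same `load("file.bson")` call would therefore resolve to a different package, with different semantics, depending only on which packages happen to be installed.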

If there are good solutions to this I’d be happy to add the functionality. You probably have more experience with the ecosystem, so how do you see it best working?

thanks for the awesome information.

Hm, I guess using the extension .lbson for light-bson is not a good idea? It would mean the files couldn’t be loaded by BSON libraries in other programming environments…?

It would work for the FileIO dispatch, but I think it would be a bit strange to change the extension only for that when the file contents are the same format.

I suspect one can hand-code a case for Format{:BSON} into FileIO which does a bit of fiddling to guess whether the file uses BSON.jl's extended features (I think there will be a types top-level key or something?),
and if so uses BSON.jl, and otherwise uses LightBSON.jl.

That’s a good suggestion, I can see that working for load. What would you do on the save path?

I have no super bright ideas, maybe look at the data and see if it only contains things that follow the BSON spec?
It is a recursive search; might want a max limit before guessing it doesn’t.
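
A minimal sketch of such a depth-limited check, under stated assumptions: `looks_plain` and `PLAIN_LEAVES` are hypothetical names, and the set of leaf types is a guess at what maps naturally onto BSON, not LightBSON.jl's actual type mapping. It returns `true` or `false` when it can decide, and `nothing` when the depth limit is reached first.

```julia
using Dates

# Assumed leaf types for illustration; not LightBSON.jl's actual mapping.
const PLAIN_LEAVES = Union{Float64, Int32, Int64, Bool, String, Nothing,
                           DateTime, Vector{UInt8}}

# true    -> only plain values were found
# false   -> something outside the assumed plain set was found
# nothing -> depth limit reached before a decision could be made
function looks_plain(x; maxdepth::Int = 8)
    x isa PLAIN_LEAVES && return true
    children = if x isa AbstractDict
        # Document keys must be strings.
        all(k -> k isa AbstractString, keys(x)) || return false
        values(x)
    elseif x isa AbstractVector
        x
    else
        return false
    end
    maxdepth == 0 && return nothing
    undecided = false
    for child in children
        r = looks_plain(child; maxdepth = maxdepth - 1)
        r === false && return false
        r === nothing && (undecided = true)
    end
    return undecided ? nothing : true
end

looks_plain(Dict("a" => 1, "b" => [1.0, "x", Dict("c" => nothing)]))  # true
looks_plain(Dict("a" => 1 + 2im))                                     # false
```

The three-valued result keeps "gave up" distinct from "definitely not plain", so a caller would still have to choose what to do in the undecided case.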

I’m hesitant to implement heuristics that aren’t robust and clear, and LightBSON.jl also lets you write custom types, so instead I’ve added a keyword argument to select the saver.

With these additions in LightBSON.jl:

function fileio_save(f, x; kwargs...)
    get(kwargs, :plain, false) || error("Pass plain = true to select LightBSON.jl over BSON.jl")
    bson_write(f.filename, x)
    nothing
end

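function fileio_load(f)
    buf = read(f.filename)
    reader = BSONReader(buf)
    # Scan the top-level field names for the "tag"/"type" pair
    # that is taken here as the signature of a BSON.jl document.
    hastag = false
    hastype = false
    foreach(reader) do field
        hastag |= field.first == "tag"
        hastype |= field.first == "type"
        nothing
    end
    hastag && hastype && error("BSON.jl document detected, aborting LightBSON.jl load")
    # Otherwise materialize the whole document as a plain dictionary.
    reader[Any]
end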

and with LightBSON.jl registered at higher priority than BSON.jl in FileIO, you can pass plain = true to save in plain BSON format, whilst the load path detects the format automatically, as you suggested. Would that work, or could it be made to work, with DrWatson?