[ANN] LightBSON.jl

Christian_Rorvik · August 31, 2021, 4:35pm

Born out of a need for high-performance, allocation-free BSON processing, announcing LightBSON.jl.

What It Is

Allocation-free API for reading and writing BSON data.
Natural mapping of Julia types to corresponding BSON types.
Convenience API to read and write Dict{String, Any} or OrderedDict{String, Any} (default, for roundtrip consistency) as BSON.
Struct API tunable for tradeoffs between flexibility, performance, and evolution.
Configurable validation levels.
Lightweight indexing for larger documents.
Transducers.jl compatible.
Tested for conformance against the BSON corpus.

What It Is Not

Generic serialization of all Julia types to BSON. See BSON.jl for that functionality. LightBSON.jl aims for natural representations, suitable for interop with other languages and long term persistence.
Integrated with FileIO.jl. BSON.jl already is, and adding another with different semantics would be confusing.
A BSON mutation API. Reading and writing are entirely separate and only complete documents can be written.
Conversion to and from Extended JSON. This may be added later.

See README for more examples and further information.

Datseris · August 31, 2021, 5:03pm

Not being integrated with FileIO is a hit for interplay with DrWatson’s tagsave, produce_or_load and co. Do you think it is possible to actually implement this? Had many problems with the existing BSON.jl, unlikely to use it in the future, so your package seems like a nice alternative.

oxinabox · August 31, 2021, 9:44pm

Yay.
I am so glad to have an option for plain BSON that doesn’t try and serialize arbitrary Julia objects.
This maximizes compatibility,
and also means it can be a secure format.
(In contrast, it is seemingly impossible to make anything like pickle or the Julia serializer unable to run arbitrary code on loading. This includes BSON.jl, JLD2, and JLSO.)

Christian_Rorvik · September 1, 2021, 9:18am

Datseris:

Not being integrated with FileIO is a hit for interplay with DrWatson’s tagsave, produce_or_load and co. Do you think it is possible to actually implement this?

It doesn’t look difficult to implement, but I’m concerned it would confuse users. As I understand FileIO, it tries each package registered for a format in the order they were registered; if a package can’t be loaded (e.g., it is not in your project), it moves on to the next one. If I add LightBSON to FileIO, it becomes difficult to determine which package will be used to load .bson files outside of the top-level project, where you have control of your environment. Given the semantic difference in how BSON.jl and LightBSON.jl treat the data, I’m not sure how users would actually handle that.

If there are good solutions to this I’d be happy to add the functionality. You probably have more experience with the ecosystem, so how do you see it best working?
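
To make the concern concrete, here is a toy model of the fallback order as described above. It is a sketch in plain Julia, not FileIO's actual code: the `Candidate` struct, `load_with_fallback`, and the two stand-in loaders are hypothetical names invented for illustration.

```julia
# Toy model of "try each registered package in order"; NOT FileIO's real code.
# Each entry records a package name, whether it can be loaded in the current
# environment, and a stand-in loader function.
struct Candidate
    name::String
    available::Bool
    load::Function
end

function load_with_fallback(candidates::Vector{Candidate}, path::AbstractString)
    for c in candidates            # registration order decides priority
        c.available || continue    # package not in this environment: try the next
        return (c.name, c.load(path))
    end
    error("no available package can load $path")
end

# Hypothetical loaders that only label which semantics were applied.
registry = [
    Candidate("BSON", false, p -> "BSON.jl semantics"),
    Candidate("LightBSON", true, p -> "LightBSON.jl semantics"),
]

used, result = load_with_fallback(registry, "data.bson")
```

In this model the same file is read with different semantics depending only on which packages the active environment happens to contain, which is the ambiguity in question.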

jackyjoy123 · September 5, 2021, 11:44am

thanks for the awesome information.

Datseris · September 9, 2021, 2:09pm

Hm, I guess using the extension .lbson for light-bson is not good, because it would mean the files couldn’t be loaded by BSON libraries in other programming environments…?

Christian_Rorvik · September 9, 2021, 6:08pm

It would work for the FileIO dispatch, but I think it would be a bit strange to change the extension only for that when the format in the file is the same.

oxinabox · September 9, 2021, 6:27pm

I suspect one can hand-code a case for Format{:BSON} into FileIO which does a bit of fiddling to guess whether the file is BSON.jl using extended features (I think there will be a types top-level key or something?)
and if so uses BSON.jl, or if not uses LightBSON.jl.

Christian_Rorvik · September 9, 2021, 6:33pm

That’s a good suggestion, I can see that working for load. What would you do on the save path?

oxinabox · September 9, 2021, 6:47pm

I have no super bright ideas, maybe look at the data and see if it only contains things that follow the BSON spec?
It is a recursive search; might want a max limit before guessing it doesn’t.
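
A minimal sketch of such a bounded recursive check, in plain Julia with no package dependencies. The names `looks_plain` and `budget` and the set of accepted leaf types are assumptions for illustration, not part of LightBSON.jl, BSON.jl, or FileIO. It returns `nothing` when the budget runs out, so the caller decides what to guess.

```julia
# Hypothetical helper: does `x` contain only values with an obvious plain-BSON
# representation? Returns true, false, or `nothing` when the node budget is
# exhausted before a decision is reached.
# The leaf set is illustrative; BSON has more types (binary, datetime, ...).
const PlainLeaf = Union{Nothing, Bool, Int32, Int64, Float64, String}

function looks_plain(x; budget::Int = 10_000)
    remaining = Ref(budget)
    return _looks_plain(x, remaining)
end

function _looks_plain(x, remaining::Ref{Int})
    remaining[] -= 1
    remaining[] < 0 && return nothing            # gave up: too many nodes visited
    x isa PlainLeaf && return true
    if x isa AbstractDict
        for (k, v) in x
            k isa AbstractString || return false # document keys must be strings
            r = _looks_plain(v, remaining)
            r === true || return r               # propagate false or nothing
        end
        return true
    elseif x isa AbstractVector
        for v in x
            r = _looks_plain(v, remaining)
            r === true || return r
        end
        return true
    end
    return false                                 # e.g. functions, arbitrary structs
end
```

For example, `looks_plain(Dict("f" => sin))` is `false`, while a large array checked with a small `budget` yields `nothing` rather than a guess.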

Christian_Rorvik · September 10, 2021, 6:28am

I’m hesitant to implement heuristics that aren’t robust and clear, and LightBSON.jl also lets you write custom types, so instead I’ve added a keyword argument to select the saver.

With these additions in LightBSON.jl:

function fileio_save(f, x; kwargs...)
    get(kwargs, :plain, false) || error("Pass plain = true to select LightBSON.jl over BSON.jl")
    bson_write(f.filename, x)
    nothing
end

function fileio_load(f)
    buf = read(f.filename)
    reader = BSONReader(buf)
    hastag = false
    hastype = false
    foreach(reader) do field
        hastag |= field.first == "tag"
        hastype |= field.first == "type"
        nothing
    end
    hastag && hastype && error("BSON.jl document detected, aborting LightBSON.jl load")
    reader[Any]
end

and registering LightBSON.jl at higher priority than BSON.jl in FileIO, you can pass plain = true to save in plain BSON format, whilst the load path detects the format automatically as you suggested. Would that work, or could it be made to work, with DrWatson?
