When working with data at this scale, computations do indeed usually need some form (and often multiple forms) of caching!
However, the most appropriate cache depends on your computation/access pattern, which is orthogonal to (or at least sits on top of) Onda itself as a format. At Beacon, for example, we have some workloads that read the same segments repeatedly, while others sweep through all segments and load each only once. Onda.jl makes it easy to read segments by TimeSpan but (rightfully, IMO) isn’t opinionated w.r.t. caching.
We do have a nice little LRU cache implementation that auto-spills to disk, though - should probably upstream that to LRUCaches.jl or somewhere
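To illustrate the idea (this is not the actual Julia implementation, which isn't shown here), here's a minimal Python sketch of an LRU cache that spills evicted entries to disk and transparently reloads them on access. The class name, in-memory capacity, and pickle-based serialization are all illustrative assumptions:

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class SpillingLRU:
    """Toy LRU cache: keeps up to `mem_items` entries in memory;
    entries evicted from memory are pickled to disk and reloaded
    (and re-promoted to memory) on the next access."""

    def __init__(self, mem_items=2, spill_dir=None):
        self.mem_items = mem_items
        self.mem = OrderedDict()  # insertion/access order = LRU order
        self.spill_dir = spill_dir or tempfile.mkdtemp()

    def _path(self, key):
        # hash() is stable within a single process, which is all
        # this sketch needs for naming spill files
        return os.path.join(self.spill_dir, f"{hash(key)}.pkl")

    def put(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)  # mark as most recently used
        while len(self.mem) > self.mem_items:
            # evict the least recently used entry to disk
            old_key, old_val = self.mem.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_val, f)

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        path = self._path(key)
        if os.path.exists(path):
            # spilled entry: reload from disk and promote back to memory
            with open(path, "rb") as f:
                value = pickle.load(f)
            os.remove(path)
            self.put(key, value)
            return value
        raise KeyError(key)
```

In the repeated-reads workload above, a cache like this keeps hot segments in memory while cold ones fall back to (still faster-than-S3) local disk; for the single-sweep workload, caching buys you nothing and you'd skip it entirely.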
Would love to see more cache utilities on top of - but not within - Onda.jl to facilitate common access patterns (or better, ones that compose with Onda.jl without needing to explicitly depend on it).
For my usage, annotations wouldn’t be that numerous. My feeling is that if you’re talking about numbers like these (>100000), the annotations are either the output of some kind of algorithm or a signal in their own right. From my point of view, neither should be handled as annotations.
Yup, this is how we handle it; for us, at least, the natural thing is to treat “dense annotations” as signals in their own right.
with additional event dictionary
There’s no canonical specification of this in the format, though you could roll your own however you’d like. Defining a canonical spec for this might not be a bad idea, though - this is what I meant by “perhaps it wouldn’t be a bad idea for the format to define a structure for categorical `sample_unit`s…”.
Onda is a layer above already stored files of different formats
Yes, as well as a format for structuring signal metadata + a data model that allows you to treat all those files similarly, as LPCM signals encoded in their own ways.
Do you use some file database with tagging / multiple grouping functionality?
Right now we’re just using S3 for object storage; ingest of new Onda datasets is then just a matter of shoving the metadata into a database (that indexes the S3 objects) and the sample data into S3.
Or is it compatible with HDF5 and can it be saved in it?
An Onda dataset is just a directory with a fairly simple structure, so AFAICT there’s no reason it couldn’t be saved in HDF5. The comparison paragraph linked above is intended to explain why Onda as a format isn’t defined on top of HDF5, but I should update it to make clear that Onda isn’t incompatible with HDF5 (purposefully, anyway). EDIT: see beacon-biosignals/OndaFormat#12 (“clarify Onda’s relationship with HDF5”).