Having a common interface for (de)serializing LPCM sample data from arbitrary domain-specific formats (living in arbitrary storage layers: local filesystems, S3, etc.) to a common matrix representation (that can be backed by any <:AbstractMatrix) is exactly the motivation behind Onda.jl (beacon-biosignals/Onda.jl on GitHub), a Julia package for high-throughput manipulation of structured signal data across arbitrary domain-specific encodings, file formats and storage layers. The package currently implements a TimeSpan type and an Annotation type which wraps it; both can be used to index sample data. Onda's Paths/Serialization API enables users to efficiently request discontiguous data chunks on a per-TimeSpan basis, as long as the underlying storage layer/file format supports it (if not, a slower but still correct fallback path is used). In a similar vein, since Onda.Samples types just wrap an AbstractMatrix, you can also e.g. just use mmap on top of raw LPCM blobs.
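To make the mmap point concrete, here is a minimal sketch that uses only Julia's Mmap standard library, not Onda's own API. The channel count, sample rate, Int16 sample type, and the seconds-to-index conversion are all assumptions picked for illustration; the resulting matrix is the kind of AbstractMatrix that a samples wrapper could sit on top of.

```julia
using Mmap

# Hypothetical recording parameters, chosen only for this illustration.
channel_count = 2
sample_count = 1_000
sample_rate = 100  # Hz

# Write a raw interleaved Int16 LPCM blob to a temporary file.
# Julia arrays are column-major, so a channels × samples matrix is already
# stored in interleaved order (every channel of sample 1, then sample 2, ...).
original = rand(Int16, channel_count, sample_count)
path = tempname()
write(path, original)

# Hypothetical time span [2 s, 3 s), converted to 1-based sample indices.
start_s, stop_s = 2, 3
first_index = start_s * sample_rate + 1
last_index = stop_s * sample_rate

# Memory-map the blob as a matrix and copy out only the requested span.
# Only the pages backing the indexed columns need to be read from disk.
window = open(path, "r") do io
    mapped = Mmap.mmap(io, Matrix{Int16}, (channel_count, sample_count))
    mapped[:, first_index:last_index]
end

println(size(window))
```

The slice is copied out of the mapping before the file is closed, so `window` stays valid even if the blob is later moved or deleted. For blobs with a header, `Mmap.mmap` also accepts a byte offset as a fourth positional argument.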
We’re also highly interested in @shashi’s recent release of FileTrees.jl, which we believe will compose quite well as a compute framework on top of Onda Datasets.