Convention/package for chunking array stream?



Is there a convention or package for reading/writing an array in chunks? I understand I could do it myself piecewise using read, but it seems this could quickly become complicated once multiple file types and element types (e.g., colors) are taken into account. Maybe something like DataStreams, but generalized to multidimensional arrays.
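For reference, the do-it-yourself route with read can start out quite small. The sketch below assumes the stream holds a raw, headerless, column-major array of a plain bits type (the bytes that write(io, A) produces for a Matrix A); read_column_chunk and write_column_chunk are hypothetical helper names, not part of any package. Headers, compression, endianness, and per-format metadata are where this starts to get complicated.

```julia
# Hypothetical helpers (not from any package). Assumes the stream holds a raw,
# headerless, column-major nrows × ncols array of a bits type T, e.g. the bytes
# produced by write(io, A) for a Matrix A.

"Read columns firstcol:firstcol+ncols-1 into a new nrows × ncols Matrix{T}."
function read_column_chunk(io::IO, ::Type{T}, nrows::Integer,
                           firstcol::Integer, ncols::Integer) where {T}
    # In column-major order each column is contiguous, so the chunk starts here:
    seek(io, (firstcol - 1) * nrows * sizeof(T))
    chunk = Matrix{T}(undef, nrows, ncols)
    read!(io, chunk)  # throws EOFError if the stream is too short
    return chunk
end

"Overwrite the stored columns starting at firstcol with the columns of chunk."
function write_column_chunk(io::IO, chunk::AbstractMatrix{T}, firstcol::Integer) where {T}
    nrows = size(chunk, 1)
    seek(io, (firstcol - 1) * nrows * sizeof(T))
    write(io, chunk)
    return nothing
end

# Usage: a 3 × 4 Float64 matrix stored in an in-memory stream.
A = reshape(collect(1.0:12.0), 3, 4)
io = IOBuffer()
write(io, A)
read_column_chunk(io, Float64, 3, 2, 2) == A[:, 2:3]   # true
```

The same two functions work on an IOStream from open(path, "r+"), since they only rely on seek, read! and write.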


I’ve found Blobs.jl useful in the past.


SampledSignals.jl defines semantics for reading/writing multichannel chunks of audio samples. The basic idea is that you have SampleSource and SampleSink abstract types that you read from and write to, respectively. These chunks are represented as SampleBufs (though if you don't need to store the sample rate, a plain Array{T} works just as well). When you open an audio file or device, you get a concrete subtype that knows how to do the encoding/decoding.
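To make the source idea concrete, here is a minimal sketch of the same pattern in plain Julia. It does not use SampledSignals.jl; AbstractChunkSource, Int16PCMSource, readframes and foreachchunk are hypothetical names, and the concrete source assumes interleaved, little-endian 16-bit PCM (all channels of frame 1, then all channels of frame 2, and so on), returned as a frames × channels matrix. Generic code such as foreachchunk only relies on the abstract interface, while the concrete subtype does the decoding.

```julia
# Hypothetical interface, loosely modeled on the source idea (not SampledSignals.jl's API).
abstract type AbstractChunkSource end

# Concrete source: decodes interleaved little-endian Int16 PCM into Float32 in [-1, 1).
struct Int16PCMSource{S<:IO} <: AbstractChunkSource
    io::S
    nchannels::Int
end

"Read up to nframes frames and return them as a frames × channels Matrix{Float32}."
function readframes(src::Int16PCMSource, nframes::Integer)
    nch = src.nchannels
    bytes = read(src.io, nframes * nch * sizeof(Int16))  # may return fewer bytes at the end
    nfull = length(bytes) ÷ (nch * sizeof(Int16))        # drop any trailing partial frame
    samples = ltoh.(reinterpret(Int16, bytes[1:nfull * nch * sizeof(Int16)]))
    interleaved = reshape(samples, nch, nfull)           # one column per frame
    return permutedims(Float32.(interleaved) ./ 32768f0) # one row per frame
end

# Generic code written only against the abstract type: call f on each chunk until exhausted.
function foreachchunk(f, src::AbstractChunkSource, chunkframes::Integer)
    while true
        chunk = readframes(src, chunkframes)
        size(chunk, 1) == 0 && return nothing
        f(chunk)
    end
end

# Usage: 3 frames × 2 channels, interleaved, read 2 frames at a time.
io = IOBuffer()
write(io, htol.(Int16[0, 16384, -32768, 8192, 32767, -16384]))
seekstart(io)
src = Int16PCMSource(io, 2)
foreachchunk(c -> println(size(c)), src, 2)  # prints (2, 2) then (1, 2)
```

A sink would mirror this with a hypothetical writeframes that encodes a frames × channels matrix back into interleaved samples.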

@samoconnor has done some really nice work reviewing the conventions for read/write APIs in Base:

Then there's also the question of whether you really want to expose your streams through a read/write API like these, or whether it makes more sense to iterate over the elements (via iterate), or to use a more FRP-style API built on something like Observables.jl or Signals.jl (there may be others, too).
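For comparison, the iterate option could look like the sketch below: a hypothetical ElementStream wrapper (not from any package) that implements Julia's iteration protocol by reading one value of a primitive type such as Float64 per step, so for loops, collect, sum and friends work on the stream directly. It consumes the underlying IO as it goes, so it can only be traversed once unless you seek back.

```julia
# Hypothetical wrapper: iterate over the values of a primitive type T stored raw in an IO.
struct ElementStream{T,S<:IO}
    io::S
end
ElementStream{T}(io::S) where {T,S<:IO} = ElementStream{T,S}(io)

# Length is not known in advance; elements are produced until the stream ends.
Base.IteratorSize(::Type{<:ElementStream}) = Base.SizeUnknown()
Base.eltype(::Type{<:ElementStream{T}}) where {T} = T

# The IO's own position is the iteration state, so the state argument is unused.
function Base.iterate(s::ElementStream{T}, state = nothing) where {T}
    eof(s.io) && return nothing
    return read(s.io, T), nothing  # throws EOFError if only a partial value remains
end

# Usage
io = IOBuffer()
write(io, [1.0, 2.0, 3.0])
seekstart(io)
sum(ElementStream{Float64}(io))  # 6.0
```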

Historically, audio software has worked in chunks for efficiency, particularly when signal graphs are wired together dynamically with function pointers, because you don't want to pay that indirection overhead on every sample. In Julia it may not be necessary to operate chunk-wise, because the per-sample operations can be coalesced and optimized jointly.
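As a small illustration of that last point, Julia's dot syntax fuses nested per-sample calls into a single loop over the array with no temporary arrays, and because the functions are known at compile time the compiler can inline them into that loop. The gain, clip and process! names below are made up for the example.

```julia
# Hypothetical per-sample operations.
gain(x) = 0.5f0 * x
clip(x) = clamp(x, -1f0, 1f0)

# @. rewrites this as out .= clip.(gain.(x) .+ 0.1f0): one fused loop, no temporaries.
function process!(out::AbstractVector{Float32}, x::AbstractVector{Float32})
    @. out = clip(gain(x) + 0.1f0)
    return out
end

# Usage
x = Float32[-4, 0, 4]
process!(similar(x), x)  # returns Float32[-1.0, 0.1, 1.0]
```

The same process! can be applied to a whole signal or to each chunk from a stream; the chunk size then becomes an I/O decision rather than a requirement of the processing code.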


My knowledge of the breadth of IO streams is pretty limited, so this is all very helpful.

This is really interesting, as I’ve seen several packages trying to do something similar. This seems to be the easiest approach for streaming simple structs I’ve seen so far.

This is really interesting, as I've run into questions about these issues myself. Would it be fair to say that much of the IO behavior in Julia is in flux and expected to change substantially in the near future?

Coming into this, I had envisioned handling the IO stream more like an AbstractArray, with getindex for subsetting. I know the packages mentioned have somewhat similar APIs, but I'd probably need something that preserves the array's dimensionality while chunking a stream.
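The getindex-on-a-stream idea can be sketched as a read-only AbstractArray whose getindex seeks into the stream and reads a single element. This is only an illustration, not the ArrayStream described later in this thread: StreamArray is a hypothetical name, and it assumes raw, headerless, column-major data of a bits type starting at a given byte offset. Because only size, IndexStyle and linear getindex are defined, Base's generic AbstractArray code supplies the rest, so a slice such as S[:, 2:3] comes back as an ordinary Matrix with its dimensions intact.

```julia
# Hypothetical read-only array backed by a seekable IO (not a package API).
# Assumes raw, headerless, column-major data of a bits type T starting at byte `offset`.
struct StreamArray{T,N,S<:IO} <: AbstractArray{T,N}
    io::S
    dims::NTuple{N,Int}
    offset::Int
end

StreamArray{T}(io::S, dims::NTuple{N,Int}; offset::Integer = 0) where {T,N,S<:IO} =
    StreamArray{T,N,S}(io, dims, offset)

Base.size(A::StreamArray) = A.dims
Base.IndexStyle(::Type{<:StreamArray}) = IndexLinear()

# Every element access seeks to the element's byte position and reads it.
function Base.getindex(A::StreamArray{T}, i::Int) where {T}
    @boundscheck checkbounds(A, i)
    seek(A.io, A.offset + (i - 1) * sizeof(T))
    return read(A.io, T)
end

# Usage
A = reshape(collect(1.0:12.0), 3, 4)
io = IOBuffer()
write(io, A)
S = StreamArray{Float64}(io, size(A))
S[2, 3]    # 8.0
S[:, 2:3]  # a 3×2 Matrix{Float64} equal to A[:, 2:3]
```

One seek per element is slow; a more serious version would add methods that read contiguous blocks with read!, in the spirit of a column-chunk reader.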


So, after investigating what's out there, I put something together.

It's not ready for use, as I have no tests, package structure, or documentation yet. The workhorse is ArrayStream, which is a subtype of AbstractArray. This makes some pretty convenient things possible when it is wrapped in an AxisArray or ImageMeta.