Why does SampledSignals.jl store a 2-channel audio as a 2-column Matrix rather than a Vector?

Since each sample from each channel is supposed to be read/played together, I would think that it makes sense to put samples for one channel in the odd indices of a vector, and samples from the other channel in the even indices.

so instead of the existing 2-column matrix:

sample1-channel1 sample1-channel2
sample2-channel1 sample2-channel2

we would get the alternative vector (channel 1 in the odd indices, channel 2 in the even indices):

sample1-channel1
sample1-channel2
sample2-channel1
sample2-channel2
In the existing structure, you’d have to go through all the channel-1 samples to get to the first sample in channel 2. Doesn’t seem ideal if the audio data is long (has many rows).

I think other audio software does it the second way (though I don’t know much about signal processing and don’t have a reference).

Maybe figuring out odd/even indices is too costly?

No you don’t, you can use the dimensions of the array to jump directly to any multidimensional index in constant time.

Not always. Putting audio in multidimensional arrays with channels separated along an axis is a fairly common way to organize a fixed length of uncompressed audio because it lets people access each channel as easily as they could access each sample. The axes are usually chosen so that each channel is contiguous (columns in Julia), which lets SIMD and caching help improve the performance of audio processing.

Storing each sample contiguously, on the other hand, is useful when you don’t have a fixed length of audio and could append more samples; you could hypothetically do this with a 2D array, but usually only 1D arrays are allowed to have unfixed sizes, so the element type matches the smallest appendable datum. Accessing a particular channel becomes relatively more difficult and less efficient, but it’s a worthwhile tradeoff.
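To make the two layouts concrete, here is a minimal sketch (the variable names are illustrative, not from SampledSignals): channels as contiguous columns of a matrix versus one interleaved, growable vector.

```julia
# Layout 1: channels along columns — each channel is contiguous in
# column-major Julia, so a channel slice is cache/SIMD friendly.
nframes, nchannels = 4, 2
planar = rand(Float32, nframes, nchannels)
left = @view planar[:, 1]            # contiguous view of channel 1
@assert left == planar[:, 1]

# Layout 2: interleaved samples in a Vector — easy to grow one frame
# (one sample per channel) at a time with push!.
interleaved = Float32[]
for f in 1:nframes, c in 1:nchannels
    push!(interleaved, planar[f, c])
end
# Channel 1 now lives at the odd indices, channel 2 at the even ones:
@assert interleaved[1:2:end] == planar[:, 1]
@assert interleaved[2:2:end] == planar[:, 2]
```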

Hm. I just don’t understand how arrays work then. (And if this is off-topic or more suitable to post somewhere else, let me know.)

If it takes the same amount of time to read from any index in a multidimensional array, why is it recommended, when you loop through all elements in an array, that you go through all elements in a column first and then move to the next column, as opposed to looping through all elements in each row and then moving to the next row?

That’s not the same as what I said,

and it is related to optimizations for locality of reference.

So an array is a data structure where each element has the same size in some way. That can remain true even if each element represents an object with unfixed size; the array would just hold fixed-size pointers to those objects. When each element has the same size, you don’t need to iterate to reach an index; that was what I meant earlier. Instead, you calculate some variation of offset + index*element_size and jump to that location. With multidimensional indices, you just add more terms: ... + index2*axis2_size + .... Therefore, calculating and jumping to an index has constant-time complexity; in other words, the performance does not depend on the array size or on which index you pick.
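You can see that address arithmetic directly in Julia with LinearIndices, which performs exactly the offset + index*size computation described above (this is just a sketch of the idea, not anything SampledSignals-specific):

```julia
# 4×3 column-major array: element (i, j) lives at flat position
# i + (j - 1) * number_of_rows — pure arithmetic, no iteration.
A = reshape(collect(1:12), 4, 3)
lin = LinearIndices(A)
i, j = 2, 3
@assert lin[i, j] == i + (j - 1) * size(A, 1)   # 2 + 2*4 == 10
@assert A[i, j] == A[lin[i, j]]                 # same element either way
```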

However, it does depend on other factors. An important one is CPU caching. It’s pretty complicated and I don’t fully understand it, but basically CPUs copy a piece of main memory to a smaller (and literally closer) location that can be read faster. If we access memory that hasn’t been copied into the cache, it’s called a cache miss and the CPU has to copy things from main memory again. So, when we iterate an array, we can benefit from caching if we index it contiguously. Thankfully we have eachindex, linear indexing, and broadcasting in Julia to abstract that chore away in most cases, but it’s good to remember for the cases where an algorithm calls for nested loops over the axes.
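For the nested-loop case, the rule of thumb looks like this (a sketch; both functions compute the same sum, they only differ in which index the inner loop varies):

```julia
# Julia is column-major, so the inner loop should run down a column:
# the row index i varies fastest, giving contiguous memory access.
function sum_colmajor(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)   # i varies fastest: contiguous
        s += A[i, j]
    end
    s
end

# Same sum, but the column index j varies fastest: strided access,
# which causes more cache misses on large arrays.
function sum_rowmajor(A)
    s = zero(eltype(A))
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    s
end

A = rand(1000, 1000)
@assert sum_colmajor(A) ≈ sum_rowmajor(A)   # identical result; the
# column-major order is typically faster once A outgrows the caches
```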

When each element has the same size, you don’t need to iterate to reach an index, that was what I meant earlier. Instead, you calculate some variation of offset + index*element_size and jump to that location. With multidimensional indices, you just add more terms … + index2*axis2_size + …

This makes sense.

The locality of reference wikipedia page you shared gives an example of matrix multiplication using loops. It says:

By switching the looping order… the speedup in large matrix multiplications becomes dramatic… In this case, “large” means… enough addressable memory such that the matrices will not fit in L1 and L2 caches.

So then the penalty for not storing elements sequentially kicks in only when the data is too big to fit in the cache.

That seems right. Caching has more layers than I have described, so check out the blogpost What scientists must know about hardware to write fast code, specifically the section Avoid cache misses. It’s still not the whole complicated picture, but it’s worth reading to understand a bit more about why the rule of thumb is contiguous indexing.

If SampledSignals stored channels in rows instead of columns, the data layout in memory would be basically what you proposed in the OP, since Julia is a column-major language, meaning it represents 2D arrays in memory by laying the columns out one after another. This is what Onda.jl does.
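A quick sketch of why that works: with channels along the rows, flattening the matrix in column-major order yields exactly the interleaved vector from the OP (the literal values here are just for illustration).

```julia
# Channels in rows: column j holds frame j (one sample per channel).
frames = [1.0  2.0  3.0;     # channel 1: samples 1..3
          10.0 20.0 30.0]    # channel 2: samples 1..3

# Column-major flattening walks down each column in turn, so the
# channels come out interleaved frame by frame:
@assert vec(frames) == [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]
```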

I think I’ve come across that post, but thanks for pointing it out. I need to revisit things a few times before it starts to sink in.

Yes, saving each sample in a column and a channel in a row would do the same thing.

Didn’t know about Onda.jl. Thanks for sharing.
