Why does SampledSignals.jl store a 2-channel audio as a 2-column Matrix rather than a Vector?

sadish-d · September 26, 2023, 9:21pm

Since each sample from each channel is supposed to be read/played together, I would think that it makes sense to put samples for one channel in the odd indices of a vector, and samples from the other channel in the even indices.

so instead of the existing 2-column matrix:

sample1-channel1 sample1-channel2
sample2-channel1 sample2-channel2

we would get the alternative vector:

sample1-channel1
sample1-channel2
sample2-channel1
sample2-channel2

In the existing structure, you’d have to go through all the channel-1 samples to get to the first sample in channel 2. Doesn’t seem ideal if the audio data is long (has many rows).

I think other audio software does it the second way (though I don’t know much about signal processing and don’t have a reference).

Maybe figuring out odd/even indices is too costly?

Benny · September 27, 2023, 1:35am

No, you don’t have to go through all the channel-1 samples first: you can use the dimensions of the array to jump directly to any multidimensional index in constant time.

Other audio software doesn’t always do it the second way. Putting audio in multidimensional arrays with channels separated along an axis is a fairly common way to organize a fixed length of uncompressed audio, because it lets people access each channel as easily as they could access each sample. The axes are usually chosen so that each channel is contiguous (columns in Julia), so that SIMD and caching may help improve the performance of audio processing.

Storing each sample contiguously (interleaving the channels), on the other hand, is useful when you don’t have a fixed length of audio and could append more samples. You could hypothetically do that with a 2D array, but usually only 1D arrays are allowed to have unfixed sizes, so that the element type matches the smallest appendable datum. Accessing a particular channel would be relatively more difficult and less efficient, but it’s a worthy tradeoff.
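
A minimal sketch of that tradeoff, using plain Julia arrays rather than SampledSignals.jl types (the variable names and sample values are illustrative assumptions, not taken from any library):

```julia
# Interleaved stereo in a growable Vector: L1, R1, L2, R2, ...
interleaved = Float32[]
for (l, r) in [(0.1f0, 0.2f0), (0.3f0, 0.4f0), (0.5f0, 0.6f0)]
    push!(interleaved, l, r)    # append one frame (one sample per channel)
end

# Pulling out one channel needs a strided selection (this makes a copy).
left  = interleaved[1:2:end]
right = interleaved[2:2:end]

# Channel-per-column Matrix: each channel is one contiguous column,
# but a Matrix cannot be grown with push!.
planar    = hcat(left, right)       # 3×2 Matrix{Float32}
left_view = @view planar[:, 1]      # no copy; contiguous in memory
```

Appending is trivial in the interleaved form, while per-channel access is trivial in the matrix form.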

sadish-d · September 27, 2023, 2:37am

Hm. I just don’t understand how arrays work then. (And if this is off-topic or more suitable to post somewhere else, let me know.)

If it takes the same amount of time to read from any index in a multi-dimensional array, why is it recommended, when you loop through all elements in an array, that you go through all elements in a column first and then move to the next column, as opposed to looping through all elements in each row and then moving to the next row?

Benny · September 27, 2023, 3:20am

That’s not the same as what I said, and it is related to optimizations for locality of reference.

So an array is a data structure where each element has the same size in some way. That can remain true if each element represents an object with unfixed size; the array would just hold fixed-size pointers to those objects. When each element has the same size, you don’t need to iterate to reach an index; that was what I meant earlier. Instead, you calculate some variation of offset + index*element_size and jump to that location. With multidimensional indices, you just add more terms ... + index2*axis2_size + .... Therefore, calculating and jumping to an index has constant-time complexity; in other words, the performance does not depend on the array size or on which index you ask for.
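
As a rough illustration of that arithmetic for a column-major Julia `Matrix` (the helper `linear_position` is a hypothetical name invented for this sketch; `LinearIndices` and `strides` are standard Base functions):

```julia
A = reshape(collect(1:12), 3, 4)    # 3 rows, 4 columns, stored column by column

# Hypothetical helper: 1-based position of A[i, j] in the underlying memory.
# Skip (j - 1) whole columns of size(A, 1) elements, then step i down the column.
linear_position(A, i, j) = (j - 1) * size(A, 1) + i

i, j = 2, 3
k = linear_position(A, i, j)        # same value as LinearIndices(A)[i, j]

A[k] == A[i, j]                     # true: one multiply-add, no iteration
strides(A)                          # (1, 3): step between neighbours along each axis
```

The cost of computing `k` is the same for a 3×4 matrix as for one with millions of rows.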

However, it does depend on other factors. An important one is CPU caching. It’s pretty complicated and I don’t fully understand it, but basically CPUs copy a piece of main memory to a smaller (and literally closer) location that can be read faster. If we access memory that hasn’t been copied into the cache, it’s called a cache miss and the CPU has to copy things from main memory again. So, when we iterate an array, we can benefit from caching if we index it contiguously. Thankfully we have eachindex, linear indexing, and broadcasting in Julia to abstract that chore away in most cases, but it’s good to remember for the cases where an algorithm calls for nested loops over the axes.
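
A small sketch of the two nested-loop orders, assuming an ordinary `Matrix` (the function names are made up for illustration). Both compute the same sum; only the order in which memory is touched differs:

```julia
# Inner loop runs down a column: consecutive accesses are adjacent in memory.
function sum_column_first(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)    # i varies fastest
        s += A[i, j]
    end
    return s
end

# Inner loop runs along a row: consecutive accesses are size(A, 1) elements apart.
function sum_row_first(A)
    s = zero(eltype(A))
    for i in axes(A, 1), j in axes(A, 2)    # j varies fastest
        s += A[i, j]
    end
    return s
end

A = reshape(collect(1:12), 3, 4)
sum_column_first(A) == sum_row_first(A)     # same result either way
```

Timing the two on a large matrix (for example with `@time`) is a reasonable way to see whether the access order matters on a given machine.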

sadish-d · September 27, 2023, 5:20am

When each element has the same size, you don’t need to iterate to reach an index, that was what I meant earlier. Instead, you calculate some variation of offset + index*element_size and jump to that location. With multidimensional indices, you just add more terms … + index2*axis2_size + …

This makes sense.

The locality of reference Wikipedia page you shared gives an example of matrix multiplication using loops. It says:

By switching the looping order… the speedup in large matrix multiplications becomes dramatic… In this case, “large” means… enough addressable memory such that the matrices will not fit in L1 and L2 caches.

So then the penalty for not storing elements sequentially kicks in only when the data is too big to fit in the cache.

Benny · September 27, 2023, 7:23am

That seems right. Caching has more layers than I have described, so check out the blogpost What scientists must know about hardware to write fast code, specifically the section Avoid cache misses. It’s still not the whole complicated picture, but it’s worth reading to understand a bit more about why the rule of thumb is contiguous indexing.

ericphanson · September 27, 2023, 9:35am

If it stored channels in rows instead of columns, the data layout would be basically what you proposed in the OP, since Julia is a column-major language, meaning it represents 2d arrays in memory by stacking the columns on top of each other. This is what Onda.jl does.
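
A quick way to see this with plain arrays (a sketch only; it does not use the Onda.jl or SampledSignals.jl APIs, and the sample values are made up):

```julia
left  = [1, 2, 3]        # channel 1 samples
right = [10, 20, 30]     # channel 2 samples

by_column = hcat(left, right)         # 3×2: one channel per column
by_row    = permutedims(by_column)    # 2×3: one channel per row

# vec exposes the order in which the elements sit in memory.
vec(by_column)   # [1, 2, 3, 10, 20, 30]  channel after channel
vec(by_row)      # [1, 10, 2, 20, 3, 30]  interleaved, as proposed in the OP
```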

sadish-d · September 27, 2023, 1:18pm

I think I’ve come across that post, but thanks for pointing it out. I need to revisit things a few times before it starts to sink in.

sadish-d · September 27, 2023, 1:21pm

Yes, saving each sample in a column and a channel in a row would do the same thing.

Didn’t know about Onda.jl. Thanks for sharing.
