 # Deinterleave data

I am de-interleaving data that has been stored as a vector and would like to store it as a matrix. I have written and performance-tested three methods, and am looking for comments on how to improve them.

An example is:

``````
julia> a = collect(1:20.0);

julia> deinterleavef(a, 4, 5)
5×4 Matrix{Float64}:
  1.0   2.0   3.0   4.0
  5.0   6.0   7.0   8.0
  9.0  10.0  11.0  12.0
 13.0  14.0  15.0  16.0
 17.0  18.0  19.0  20.0
``````

I wrote three functions to compare relative performance. They can be classified as reshape/permutedims, slicing, and for loop, and are shown below:

``````julia
"""
    deinterleaver(data, num_channels, num_samples_per_channel)

Convert an interleaved vector to a `Matrix` using `reshape` and `permutedims`
(probably an expensive way to do it).
"""
function deinterleaver(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    if length(data) == num_channels * num_samples_per_channel
        data = reshape(data, num_channels, num_samples_per_channel)
        data = permutedims(data)
    else
        error("Dimension mismatch")
    end
    return data
end

"""
    deinterleaves(data, num_channels, num_samples_per_channel)

Slice the data (not sure of the cost of the slicing).
"""
function deinterleaves(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    num_samples = length(data)
    # @show(num_channels, num_samples_per_channel, num_samples)
    if num_samples == num_channels * num_samples_per_channel
        datamatrix = similar(data, num_samples_per_channel, num_channels)
        for c in 1:num_channels
            datamatrix[:, c] = data[c:num_channels:num_samples]
        end
    else
        error("Dimension mismatch")
    end
    return datamatrix
end

"""
    deinterleavef(data, num_channels, num_samples_per_channel)

Use for loops.
"""
function deinterleavef(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    num_samples = length(data)
    if num_samples == num_channels * num_samples_per_channel
        datamatrix = similar(data, num_samples_per_channel, num_channels)
        for c in 1:num_channels
            @inbounds for d in 1:num_samples_per_channel
                datamatrix[d, c] = data[c + (d - 1) * num_channels]
            end
        end
    else
        error("Dimension mismatch")
    end
    return datamatrix
end
``````

Running this on an 8 GB Raspberry Pi 4, I obtain the following performance metrics:

``````
julia> a = collect(1:2_000_000.0);

julia> @btime deinterleaver(a, 4, Integer(round(length(a)/4)));
  56.920 ms (8 allocations: 15.26 MiB)

julia> @btime deinterleaves(a, 4, Integer(round(length(a)/4)));
  105.602 ms (14 allocations: 30.52 MiB)

julia> @btime deinterleavef(a, 4, Integer(round(length(a)/4)));
  57.043 ms (6 allocations: 15.26 MiB)
``````

and, after waiting a while (probably showing thermal effects on the CPU):

``````
julia> @btime deinterleavef(a, 4, Integer(round(length(a)/4)));
  56.732 ms (6 allocations: 15.26 MiB)
``````

It seems like a toss-up between the reshape/permutedims and for-loop implementations, which are about twice as fast and use half the memory of the slicing implementation. The for loop also has an opportunity for parallelization.

What are the optimizations still possible, or is there another method that I am missing?

Change this to `@views function deinterleaves(...)` to avoid making a copy when slicing `data` in that function.
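A minimal sketch of that change, applied to a renamed copy of the slicing function from above (the name `deinterleaves2` is just for illustration):

``````julia
# @views turns every slice in the function body into a view instead of a copy,
# avoiding the intermediate temporary array on each iteration.
@views function deinterleaves2(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    length(data) == num_channels * num_samples_per_channel || error("Dimension mismatch")
    datamatrix = similar(data, num_samples_per_channel, num_channels)
    for c in 1:num_channels
        # data[c:num_channels:end] is now a view; copying it into the column
        # allocates nothing extra
        datamatrix[:, c] = data[c:num_channels:end]
    end
    return datamatrix
end
``````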

Note that you can just use `length(a) ÷ 4`, which is a synonym for `div(length(a), 4)`, to get an integer result without this rigamarole. (No need for rounding since your code only works if `length(a)` is divisible by 4 anyway. If you needed to round, you could also use `round(Int, length(a)/4)`.)
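For example, with a small vector for illustration:

``````julia
a = collect(1:20.0)

length(a) ÷ 4 == div(length(a), 4) == 5   # integer division, returns an Int

round(Int, length(a) / 4) == 5            # rounding form that also returns an Int directly
``````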

2 Likes

Have you considered

``````deinterleave(a, nc, ns) = transpose(reshape(a, nc, ns))
``````

which just creates a view, and doesn’t make a copy of the data at all?
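One quick way to confirm it is a view: mutating the result writes through to the original vector. A small sketch:

``````julia
deinterleave(a, nc, ns) = transpose(reshape(a, nc, ns))

a = collect(1:20.0)
m = deinterleave(a, 4, 5)    # 5×4 lazy transpose, no data copied

m[1, 1] = 99.0               # writes through to the underlying vector
a[1] == 99.0                 # true: m and a share memory
``````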

3 Likes

I have done some additional trials using these suggestions, again on the Raspberry PI.

The new code is:

``````julia
function deinterleavesj(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    if length(data) == num_channels * num_samples_per_channel
        datamatrix = transpose(reshape(data, num_channels, num_samples_per_channel))
    else
        error("Dimension mismatch")
    end
    return datamatrix
end

function deinterleaverg(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    if length(data) == num_channels * num_samples_per_channel
        datamatrix = transpose(reshape(view(data, :), num_channels, num_samples_per_channel))
    else
        error("Dimension mismatch")
    end
    return datamatrix
end
``````

giving the following performance benchmarks:

``````julia> @btime deinterleavesj(a, 4, length(a) ÷ 4 );
729.622 ns (6 allocations: 96 bytes)

julia> @btime deinterleaverg(a, 4, length(a) ÷ 4 );
1.252 μs (6 allocations: 120 bytes)
``````

This gives close to a 100x speed improvement with minimal use of memory.

Thanks for the pointers!

You don’t need this check, because `reshape` already checks this. That’s why I suggested simply:

``````deinterleave(a, nc, ns) = transpose(reshape(a, nc, ns))
``````

Or, if you want to add some types (which don’t help performance at all! … they are effectively just a form of documentation):

``````deinterleave(a::AbstractVector, nc::Integer, ns::Integer) = transpose(reshape(a, Int(nc), Int(ns)))
``````

(I added the `Int` conversions because `reshape` requires `Int` arguments.) Even simpler, you could do:

``````deinterleave(a::AbstractVector, nc::Integer) = transpose(reshape(a, Int(nc), :))
``````

since `reshape` will calculate the second dimension for you.
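For example, with a small vector for illustration:

``````julia
a = collect(1:20.0)

size(reshape(a, 4, :))   # (4, 5): the Colon dimension is inferred from length(a)

# reshape also does the length check for you, so the explicit
# "Dimension mismatch" branch is redundant:
# reshape(a, 3, :)  # would throw, since 20 is not divisible by 3
``````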

Thanks Stevengj,

I played a bit more with the simplified versions.

I now have your suggested version, much simplified and faster than any previous version I had.

``````julia
deinterleave(data::AbstractVector, nc::Integer) = transpose(reshape(data, nc, :))
``````

I left out the `Int` on `nc` on the rhs because I figured the `Integer` on the lhs takes care of it.

The performance comparison I now get is:

``````julia> @btime b = transpose(reshape(a, 4, length(a) ÷ 4));
542.957 ns (5 allocations: 88 bytes)

julia> @btime b = transpose(reshape(a, 4, :));
341.936 ns (3 allocations: 72 bytes)

julia> @btime b = deinterleave(a, 4);
274.154 ns (3 allocations: 72 bytes)

julia> @btime b = deinterleave(a, 4);
279.889 ns (3 allocations: 72 bytes)
``````

When I explicitly give the second dimension in the `reshape` call, it is significantly slower, which I found curious. And when I put the call inside the `deinterleave` function, it is significantly faster. I assume this is the same reason the documentation says to put code in a function rather than in global scope.

My math was poor in a previous post, this is about 200,000 times faster than the for loop version. Really impressive!

One more test, using permutedims instead of transpose:

``````deinterleavep(data::AbstractVector, nc::Integer) = permutedims(reshape(data, nc, :))
``````

With the following speed

``````julia> @btime b = deinterleavep(a, 4);
56.773 ms (4 allocations: 15.26 MiB)
``````

This puts us back in the same time range as the for loop version.

Nope.

``````julia> reshape([1,2,3], Int8(1), :)
ERROR: MethodError: no method matching reshape(::Vector{Int64}, ::Int8, ::Colon)
Closest candidates are:
...
``````

If you declare `nc::Integer`, it only restricts `nc` to be some subtype of `Integer`, but the `reshape` function specifically requires `Int`.

`permutedims` makes a copy of the data, whereas `transpose` just makes a kind of “view” object.
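A small sketch of the difference:

``````julia
m = reshape(collect(1:6.0), 2, 3)

t = transpose(m)     # lazy wrapper: shares memory with m, O(1) to construct
p = permutedims(m)   # eager copy into a new Matrix, O(n)

m[1, 1] = 42.0
t[1, 1] == 42.0      # true: the transpose sees the change
p[1, 1] == 1.0       # true: the copy made earlier is unaffected
``````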

If you are timing tiny computations like this (even if the matrix `a` is huge, `transpose(reshape(...))` doesn’t actually touch the data), you really need to interpolate global variables like `a`. Don’t benchmark with non-interpolated globals.
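For example, a sketch of the interpolated form (assuming BenchmarkTools is loaded):

``````julia
using BenchmarkTools

a = collect(1:2_000_000.0)
deinterleave(v, nc, ns) = transpose(reshape(v, nc, ns))

# Interpolate the global with $ so @btime measures the call itself,
# not dynamic dispatch on an untyped global binding:
@btime deinterleave($a, 4, length($a) ÷ 4);
``````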

Not impressive at all, once you realize that it is not actually touching the data — `transpose(reshape(...))` is an O(1) construction of a view object, which takes a tiny amount of time independent of the size of the data (it doesn’t access the contents of the array at all).

In contrast, the looping and `permutedims` versions are actually copying the data, which means that they are O(n) in the length `n` of the data.

2 Likes

Steven,

One more question. You obviously have a much deeper knowledge of programming in general and Julia in particular than I.

For someone who does not have that level of knowledge, what steps would you take to poke around in the help text and code to determine that `transpose` is much more effective than `permutedims` for this problem, for instance?

I have seen some clever ways to very quickly navigate to the source code, which I attempted to commit to memory, but it didn’t stick, so some little tricks like that would be useful.

And finally, is there perhaps a way the online help could include an O(1) or O(n) section? If that were there, it would have been obvious to use `transpose` instead of `permutedims`.

Thanks for all of your assistance in this. And Happy Thanksgiving. We had our Thanksgiving celebrations in Canada a month ago.

Edit: I looked at the documentation again; the `transpose` help clearly says it is a lazy transpose, and the `permutedims` help refers to `PermutedDimsArray`, whose help in turn says that `permutedims` makes a copy.

And this led me to try PermutedDimsArray formulation as follows:

``````deinterleavep(data::AbstractVector, nc::Integer) = PermutedDimsArray(reshape(data, Int(nc), :), (2,1))

julia> @btime b = deinterleavep(a, 4);
2.276 μs (6 allocations: 200 bytes)
``````

which is slower by about a factor of 10 compared to using transpose.
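Indeed, like `transpose`, `PermutedDimsArray` is lazy and shares memory with the original vector, so the extra time here is presumably just wrapper-construction and benchmarking overhead. A quick check:

``````julia
a = collect(1:20.0)
p = PermutedDimsArray(reshape(a, 4, 5), (2, 1))   # lazy: no copy of the data

size(p)          # (5, 4)
p[1, 2] == a[2]  # true: p indexes straight into a
a[2] = -1.0
p[1, 2] == -1.0  # true: shared memory, just like transpose
``````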