Deinterleave data

I am de-interleaving data that has been stored as a vector and would like to store it as a matrix. I have written and benchmarked three methods, and am looking for any comments on how to improve them.

An example is:

julia> a = collect(1:20.0);

julia> deinterleavef(a, 4, 5)
5×4 Matrix{Float64}:
  1.0   2.0   3.0   4.0
  5.0   6.0   7.0   8.0
  9.0  10.0  11.0  12.0
 13.0  14.0  15.0  16.0
 17.0  18.0  19.0  20.0

I wrote three functions to compare their relative performance. They can be classified as reshape/permutedims, slicing, and an explicit for loop, and are shown below:

""" 
    function deinterleaver(data, num_channels, num_samples_per_channel)

Convert interleaved vector to Matrix using reshape and permutedims (probably an expensive way to do it)
"""
function deinterleaver(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    
    if length(data) == num_channels * num_samples_per_channel
        data = reshape(data, num_channels, num_samples_per_channel)
        data = permutedims(data)
    else
        error("Dimension mismatch")
    end
 
    return data
end

"""
    function deinterleaves(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat

Slice the data (not sure of the cost of the slicing)
"""
function deinterleaves(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat

    num_samples = length(data)
    # @show(num_channels, num_samples_per_channel, num_samples)
    if num_samples == num_channels * num_samples_per_channel
        datamatrix = similar(data, num_samples_per_channel, num_channels)
        for c in 1:num_channels
            datamatrix[:,c] = data[c:num_channels:num_samples]
        end
    else
        error("Dimension mismatch")
    end
 
    return datamatrix
end


"""
    function deinterleavef(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat

Use for loops
"""
function deinterleavef(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat

    num_samples = length(data)
    if num_samples == num_channels * num_samples_per_channel
        datamatrix = similar(data, num_samples_per_channel, num_channels)
        for c in 1:num_channels
            @inbounds for d in 1:num_samples_per_channel
                datamatrix[d,c] = data[c + (d-1)*num_channels]
            end
        end
    else
        error("Dimension mismatch")
    end
 
    return datamatrix
end

Running this on an 8 GB Raspberry Pi 4, I obtain the following performance metrics:

julia> a = collect(1:2_000_000.0);

julia> @btime deinterleaver(a, 4, Integer(round(length(a)/4)));
  56.920 ms (8 allocations: 15.26 MiB)

julia> @btime deinterleaves(a, 4, Integer(round(length(a)/4)));
  105.602 ms (14 allocations: 30.52 MiB)

julia> @btime deinterleavef(a, 4, Integer(round(length(a)/4)));
  57.043 ms (6 allocations: 15.26 MiB)

and, after waiting a while (probably thermal effects on the CPU):

julia> @btime deinterleavef(a, 4, Integer(round(length(a)/4)));
  56.732 ms (6 allocations: 15.26 MiB)

It seems like a toss-up between the reshape/permutedims and for-loop implementations, which are about twice as fast and use half the memory of the slicing implementation. The for loop also has an opportunity for parallelization; see the sketch below.
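For reference, a rough sketch of a threaded variant of the for-loop version (untested; the name deinterleaveft is made up here, and it assumes Julia was started with multiple threads, e.g. julia -t 4). Each channel writes to its own column, so the outer loop can be split across threads safely:

function deinterleaveft(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    num_samples = length(data)
    if num_samples == num_channels * num_samples_per_channel
        datamatrix = similar(data, num_samples_per_channel, num_channels)
        Threads.@threads for c in 1:num_channels   # columns are independent
            @inbounds for d in 1:num_samples_per_channel
                datamatrix[d, c] = data[c + (d - 1) * num_channels]
            end
        end
    else
        error("Dimension mismatch")
    end
    return datamatrix
end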

What optimizations are still possible, or is there another method that I am missing?

Change deinterleaves to @views function deinterleaves(...) to avoid making a copy when slicing data in that function.
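A minimal sketch of that change (same body as the original deinterleaves; @views turns the strided slice on the right-hand side into a view, so it no longer allocates a temporary vector):

@views function deinterleaves(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    num_samples = length(data)
    if num_samples == num_channels * num_samples_per_channel
        datamatrix = similar(data, num_samples_per_channel, num_channels)
        for c in 1:num_channels
            # with @views, this slice is a view, not a copy
            datamatrix[:, c] = data[c:num_channels:num_samples]
        end
    else
        error("Dimension mismatch")
    end
    return datamatrix
end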

Note that you can just use length(a) ÷ 4, which is a synonym for div(length(a), 4), to get an integer result without this rigamarole. (No need for rounding since your code only works if length(a) is divisible by 4 anyway. If you needed to round, you could also use round(Int, length(a)/4).)
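For example:

julia> 20 ÷ 4             # integer division
5

julia> div(20, 4)         # same thing
5

julia> round(Int, 20 / 4) # rounds and converts in one step
5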


Have you considered

deinterleave(a, nc, ns) = transpose(reshape(a, nc, ns))

which just creates a view, and doesn’t make a copy of the data at all?
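For instance, with the 20-element example from the original post (the exact wrapper type shown may vary between Julia versions):

julia> deinterleave(collect(1:20.0), 4, 5)
5×4 transpose(::Matrix{Float64}) with eltype Float64:
  1.0   2.0   3.0   4.0
  5.0   6.0   7.0   8.0
  9.0  10.0  11.0  12.0
 13.0  14.0  15.0  16.0
 17.0  18.0  19.0  20.0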


I have done some additional trials using these suggestions, again on the Raspberry Pi.

The new code is:

function deinterleavesj(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    if length(data) == num_channels * num_samples_per_channel
        datamatrix = transpose(reshape(data, num_channels, num_samples_per_channel))
    else
        error("Dimension mismatch")
    end
    return datamatrix
end

function deinterleaverg(data::Vector{T}, num_channels::Integer, num_samples_per_channel::Integer) where T <: AbstractFloat
    if length(data) == num_channels * num_samples_per_channel
        datamatrix = transpose(reshape(view(data, :), num_channels, num_samples_per_channel))
    else
        error("Dimension mismatch")
    end
    return datamatrix
end
 

giving the following performance benchmarks:

julia> @btime deinterleavesj(a, 4, length(a) ÷ 4 );
  729.622 ns (6 allocations: 96 bytes)

julia> @btime deinterleaverg(a, 4, length(a) ÷ 4 );
  1.252 μs (6 allocations: 120 bytes)

This gives close to a 100x speed improvement with minimal use of memory.

Thanks for the pointers!

You don’t need this check, because reshape already checks this. That’s why I suggested simply:

deinterleave(a, nc, ns) = transpose(reshape(a, nc, ns))

Or, if you want to add some types (which don’t help performance at all! … they are effectively just a form of documentation):

deinterleave(a::AbstractVector, nc::Integer, ns::Integer) = transpose(reshape(a, Int(nc), Int(ns)))

(I added the Int conversions because reshape requires Int arguments.) Even simpler, you could do:

deinterleave(a::AbstractVector, nc::Integer) = transpose(reshape(a, Int(nc), :))

since reshape will calculate the second dimension for you.
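For example:

julia> size(reshape(collect(1:20.0), 4, :))
(4, 5)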

Thanks Stevengj,

I played a bit more with the simplified versions.

I now have your suggested version, much simplified and faster than any previous version I had.

 deinterleave(data::AbstractVector, nc::Integer) = transpose(reshape(data, nc, :))

I left out the Int on the nc on the rhs because I figure the Integer on the lhs takes care of it.

The performance comparison I now get is:

julia> @btime b = transpose(reshape(a, 4, length(a) ÷ 4));
  542.957 ns (5 allocations: 88 bytes)

julia> @btime b = transpose(reshape(a, 4, :));
  341.936 ns (3 allocations: 72 bytes)

julia> @btime b = deinterleave(a, 4);
  274.154 ns (3 allocations: 72 bytes)

julia> @btime b = deinterleave(a, 4);
  279.889 ns (3 allocations: 72 bytes)

When I explicitly give the second dimension (length(a) ÷ 4) in the reshape command, it is significantly slower, which I found curious. And when I put the command inside the deinterleave function, it is significantly faster. I assume this is for the same reason that the documentation says to put code in a function rather than in global scope.

My math was poor in a previous post; this is about 200,000 times faster than the for loop version. Really impressive!

One more test, using permutedims instead of transpose:

deinterleavep(data::AbstractVector, nc::Integer) = permutedims(reshape(data, nc, :))

With the following speed

julia> @btime b = deinterleavep(a, 4);
  56.773 ms (4 allocations: 15.26 MiB)

This puts us back in the same time range as the for loop version.

Nope.

julia> reshape([1,2,3], Int8(1), :)
ERROR: MethodError: no method matching reshape(::Vector{Int64}, ::Int8, ::Colon)
Closest candidates are:
...

If you declare nc::Integer, it only restricts nc to be some subtype of Integer, but the reshape function specifically requires Int.
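Converting explicitly makes it work:

julia> reshape([1,2,3], Int(Int8(1)), :)
1×3 Matrix{Int64}:
 1  2  3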

permutedims makes a copy of the data, whereas transpose just makes a kind of “view” object.

If you are timing tiny computations like this (even if the matrix a is huge, transpose(reshape(...)) doesn’t actually touch the data), you really need to interpolate global variables like a. Don’t benchmark with non-interpolated globals.
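For example (timing output omitted here, since it is machine-dependent):

julia> @btime deinterleave($a, 4);   # note the $ interpolating the global a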

Not impressive at all, once you realize that it is not actually touching the data — transpose(reshape(...)) is an O(1) construction of a view object, which takes a tiny amount of time independent of the size of the data (it doesn’t access the contents of the array at all).

In contrast, the looping and permutedims versions are actually copying the data, which means that they are O(n) in the length n of the data.
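You can check that the result is a view rather than a copy, because writing through the wrapper mutates the underlying vector:

julia> a = collect(1:20.0);

julia> v = transpose(reshape(a, 4, :));

julia> v[1, 2] = 99;   # element (1,2) of the transpose is a[2]

julia> a[2]
99.0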


Steven,

One more question. You obviously have a much deeper knowledge of programming in general, and Julia in particular, than I do.

For someone who does not have that level of knowledge, what steps would you take to poke around in the help text and code to determine that transpose is much more effective than permutedims for this problem, for instance?

I have seen some clever ways to very quickly navigate to the source code, which I attempted to commit to memory, but it didn’t stick, so some little tricks like that would be useful.

And finally, is there perhaps a way the online help could include an O(1) or O(n) section? If it had been there, it would have been obvious to use transpose instead of permutedims.

Thanks for all of your assistance in this. And Happy Thanksgiving. We had our Thanksgiving celebrations in Canada a month ago.

Edit: I looked at the documentation again; the transpose help clearly says it is a lazy transpose, and the permutedims help refers to PermutedDimsArray, whose help in turn says that permutedims makes a copy.

And this led me to try a PermutedDimsArray formulation as follows:

deinterleavep(data::AbstractVector, nc::Integer) = PermutedDimsArray(reshape(data, Int(nc), :), (2,1))

julia> @btime b = deinterleavep(a, 4);
  2.276 μs (6 allocations: 200 bytes)

which is slower by about a factor of 10 compared to using transpose.