What is the fastest way to turn a 3D array into a vector of 2D arrays?

Hi, I wonder what would be the fastest way to turn a 3D array into a vector of 2D arrays.

Specifically: the input is an array of size size(inp) = (a, b, c) and I want as output a vector (typeof(out) = Array{Array{UInt8,2},1}), of length length(out) = a whose elements are arrays of size (b, c) corresponding to the dimensions of inp along the first axis.

A way to solve this with list comprehension:

using BenchmarkTools

inp = rand(3, 5, 5)
@btime out = [Array{Float64}(inp[i,:,:]) for i in 1:size(inp)[1]]

gives the following output:

1.992 μs (21 allocations: 2.19 KiB)

Is there a way to go faster?

first of all you’re bench marking with global variables, and also you can use a view I guess


julia> @btime out = [Array{Float64}($inp[i,:,:]) for i in 1:size($inp)[1]];
  324.737 ns (7 allocations: 1.80 KiB)

julia> @btime out = [@views($inp[i,:,:]) for i in 1:size($inp)[1]];
  56.988 ns (1 allocation: 224 bytes)
1 Like

In this case, ArraysOfArrays.jl is very useful.

Note that, in the following code, I have moved the first axis of inp to the last for efficiency and in order to use ArraysOfArrays.nestedview.

Code:

using BenchmarkTools, ArraysOfArrays

inp = rand(5, 5, 3)
@show nestedview(inp, 2) == [inp[:, :, k] for k in axes(inp, 3)]
print("Comprehension:          ")
@btime out = [$inp[:, :, k] for k in axes($inp, 3)]
print("Comprehension with @view:")
@btime out = [@view($inp[:, :, k]) for k in axes($inp, 3)]
print("ArraysOfArrays.nestedview:")
@btime out = nestedview($inp, 2);

Output:

nestedview(inp, 2) == [inp[:, :, k] for k = axes(inp, 3)] = true
Comprehension:            177.245 ns (4 allocations: 848 bytes)
Comprehension with @view:  40.202 ns (1 allocation: 208 bytes)
ArraysOfArrays.nestedview:  1.100 ns (0 allocations: 0 bytes)
5 Likes

Thank you both, I really appreciate the speed-up! Indeed my bench marking was not well written, sorry for that.

Assuming your input necessarily has size (3, 5, 5) and one would want to use the ArraysOfArrays.nestedview method, how could we permute the dimensions in a fast way?
I tried with Base.permutedims but the slow-down was so high that the 2nd method, Comprehension with @view, outperforms it.

Not sure if using TensorCast.jl is among the fastest, but in terms of syntax it seems excellent and one can easily permute the indices around:

using TensorCast
@cast out[i][j,k] := inp[i,j,k];   # := returns a view

I think the point is that you should try to structure your data in a way such that it exploits the how data are stored in memory, namely with consecutive elements along the first dimension. Sometimes you have no control over how the data are stored, but if you can provide some background maybe there’s a way to fix it.

Where do you get the data from?

1 Like

Nobody has mentioned eachslice(inp, dims=1) which is Base’s solution.

This is an iterator of views, you can write collect(eachslice(inp, dims=1)) to make an array of views, or map(collect, eachslice(inp, dims=1)) to make an array of copies. These will be slower – about the same as the comprehensions above, making the same things. But, at least for the array of views, this is still likely to still be dwarfed by whatever you do using them, so counting the nanoseconds here is probably not so important.

julia> inp = rand(100,100,100);

julia> @btime out = [($inp[i,:,:]) for i in axes($inp)[1]];
  992.833 μs (201 allocations: 7.63 MiB)  # memory comparable to a copy

julia> @btime collect.(eachslice($inp, dims=1));
  979.208 μs (207 allocations: 7.64 MiB)

julia> @btime copy($inp);
  297.959 μs (2 allocations: 7.63 MiB)

julia> @btime out = [view($inp, i,:,:) for i in axes($inp)[1]];
  429.712 ns (1 allocation: 4.81 KiB)   # very cheap

julia> @btime collect(eachslice($inp, dims=1));
  597.146 ns (4 allocations: 4.88 KiB)
2 Likes

Well, this brings me to explain my whole problem which is broader than the initial topic of this thread:

I have a black-box library that generates vectors of integers. For simplicity, let’s say my input is the following vector inp, where I replaced the actual values of inp by ordered numbers so that we can see where each one goes in the desired output:

julia> w = h = 3
3

julia> inp = collect(1:3*w*h)
27-element Array{Int64,1}:
  1
  2
  3
  ⋮
 25
 26
 27

Basically, this input is a flat version of a 3D array and here I am assuming that its size is (3, w, h). I cannot modify the way my input is provided.

Then, my output should be the following vector of arrays out:

julia> out = f1(inp)
3-element Array{Array{Int64,2},1}:
 [1 10 19; 4 13 22; 7 16 25]
 [2 11 20; 5 14 23; 8 17 26]
 [3 12 21; 6 15 24; 9 18 27]

It has been generated with the function f1 below:

julia> function f1(x::Array{Int64,1})
           w = h = 3
           reshaped = reshape(x, (3,w,h))
           [Array{Int64}(reshaped[i,:,:]) for i in 1:3]
       end
f1 (generic function with 1 method)

Given this, I would like to build a function that does:

  • the same computation as f1,
  • faster,
  • and, whose output is of type Array{Array{Int64,2},1}.

So far, and because of the conversion to type Array{Array{Int64,2},1}, I could only gain some speed-up with the following function f2:

julia> function f2(x::Array{Int64,1})
           w = h = 3
           [convert(Array{UInt8,2}, reshape(@view(x[i:3:length(x)]), (w,h))) for i in 1:3]
       end
f2 (generic function with 1 method)

julia> @btime f1(inp)
  389.115 ns (8 allocations: 1.13 KiB)

julia> @btime f2(inp)
  254.549 ns (4 allocations: 592 bytes)

Using TensorCast:

w = h = 3
inp = collect(1:3*w*h)
using TensorCast
@cast out3[i][j,k] := inp[i⊗j⊗k] (i∈1:3, j∈1:3, k∈1:3)

Result is:

 [1 10 19; 4 13 22; 7 16 25]
 [2 11 20; 5 14 23; 8 17 26]
 [3 12 21; 6 15 24; 9 18 27]

and timing is (@btime syntax thanks to @mcabbott):

@btime @cast out3[i][j,k] := $inp[i⊗j⊗k] (i∈1:3, j∈1:3, k∈1:3)
120 ns (7 allocations: 368 bytes)

OK, the question I have then is: Why must it be ordered like this?

I’m just wondering if that is because it’s how you would do it in Python, where the last dimension is fastest, or whether there is some deeper reason which is intrinsic to the problem itself?

1 Like

Thanks for the idea! However, I should convert my output to type Array{Array{Int64,2},1}. Doing so with Base.convert makes it too slow but there might be another solution.

I would say it is intrinsic to the problem: I am handling RGB images which I wish to represent as vectors of 3 elements (one per channel) that are 2D arrays. The black-box generator provides me with the flat version of the complete RGB image and the output out = f1(inp) as represented earlier corresponds exactly to each RGB channel stacked into a vector.

Have you looked into the Julia Images ecosystem? There is an RGB type which gives an extremely convenient and efficient representation of images, and where the RGB-values are stored next to each other.

I haven’t worked with it much, but when I have tried it, I have found it to be super useful, concise, and fast.

6 Likes

Using |= for assignment, which creates a copy (or use lazy=false), seems to produce the required types? The timing is shown below:

@btime @cast out3[i][j,k] |= $inp[i⊗j⊗k] (i∈1:3, j∈1:3, k∈1:3)

  223.0 ns (10 allocations: 624 bytes)
3-element Vector{Matrix{Int64}}:
 [1 10 19; 4 13 22; 7 16 25]
 [2 11 20; 5 14 23; 8 17 26]
 [3 12 21; 6 15 24; 9 18 27]