I have large three-dimensional arrays A
that have different values, but from which I always wish to select the same set of entries. So A
is a different matrix each time, but it has the same size and I need to collect the same entries: see the example code below. The first dimension never plays a role in which elements are selected. I want this to go fast.
The experiment below compares several methods. On my ageing AMD machine at home, Tullio
wins hands down, whereas on my newer Intel machine at work I get the following times:
base: 1.629 ms (0 allocations: 0 bytes)
folds: 783.790 μs (91 allocations: 8.94 KiB)
mask: 4.942 ms (0 allocations: 0 bytes)
slices: 1.240 ms (0 allocations: 0 bytes)
loopy: 439.534 μs (82 allocations: 8.44 KiB)
loopier: 653.814 μs (3082 allocations: 7.70 MiB)
tullio: 1.062 ms (41 allocations: 2.23 KiB)
Three questions:
- What better alternatives exist?
- Why is
mask
so slow? - What explains the discrepancy between home and office machines? [true for any number of cores]
Thanks!!!
using BenchmarkTools, LoopVectorization, Tullio, Base.Threads, Folds, Referenceables
function f( ::Val{:base}, R, A, b, X )
for i ∈ axes( A, 2 )
for r ∈ axes( A,1 )
R[r,i] = A[r,i,b[i]]
end
end
return nothing
end
function f( ::Val{:loopy}, R, A, b, X )
@threads for i ∈ axes( A, 2 )
for r ∈ axes( A,1 )
R[r,i] = A[r,i,b[i]]
end
end
return nothing
end
function f( ::Val{:loopier}, R, A, b, X )
@threads for i ∈ axes( A, 2 )
R[:,i] .= A[:,i,b[i]]
end
return nothing
end
function f( ::Val{:tullio}, R, A, b, X )
@tullio fastmath=false R[r,i] = A[r,i,b[i]]
return nothing
end
function f( ::Val{:mask}, R, A, b, X )
# size( A[X] ) )
@views R[:] = A[X]
return nothing
end
function f( ::Val{:slices}, R, A, b, X )
i = 0
for (x,y) ∈ zip( eachslice( A; dims=2 ), eachslice( R; dims = 2 ) )
i += 1
bᵢ = b[i]
for r ∈ axes(x,1)
y[r] = x[r,bᵢ]
end
end
return nothing
end
function f( ::Val{:folds}, R, A, b, X )
Folds.foreach( referenceable( vec(R) ), X ) do y, x
y[] = A[x]
end
end
const d₁ = 1_000
const d₂ = 1_000
const d₃ = 100
for meth ∈ [ :base, :folds, :mask, :slices, :loopy, :loopier, :tullio, ]
print( "$(meth): ")
@btime f( Val($meth), R, A, b, X ) setup=( R=zeros( d₁, d₂); A = randn( d₁,d₂,d₃); b = rand(1:d₃,d₂); X = findall([ (b[i]==j) for r ∈ 1:d₁, i ∈ 1:d₂, j∈ 1:d₃]) )
end
''''