Hello all,
I have some code that uses CartesianIndices and slows down heavily when views are involved. The slowdown is dimension dependent: as far as I can tell, in some cases things inline/unroll correctly, and in higher dimensions they don't. Any hints on why this is happening or how to resolve it would be great!
MWE:
using BenchmarkTools
BenchmarkTools.DEFAULT_PARAMETERS.seconds=1.
function foo2(A)
    n1,n2=size(A)
    s=0.
    for i2 in 2:n2-1
        @simd for i1 in 2:n1-1
            @inbounds s+=A[i1+1,i2]+A[i1-1,i2]+A[i1,i2-1]+A[i1,i2+1]
        end
    end
    s
end
function foo3(A)
    n1,n2,n3=size(A)
    s=0.
    for i3 in 2:n3-1
        for i2 in 2:n2-1
            @simd for i1 in 2:n1-1
                @inbounds s+=A[i1+1,i2,i3]+A[i1-1,i2,i3]+A[i1,i2-1,i3]+A[i1,i2+1,i3]+
                             A[i1,i2,i3-1]+A[i1,i2,i3+1]
            end
        end
    end
    s
end
@inline I1dim(dim,ndims)=CartesianIndex(ntuple(i->ifelse(i==dim,1,0) ,Val{ndims}()))
function fooCI(A::AbstractArray{T,nd}) where {T,nd}
    #nd=ndims(A)
    in_ind=ntuple(i->2:size(A,i)-1,Val{nd}())
    Rin=CartesianIndices(in_ind)
    I1s=ntuple(i->I1dim(i,nd),Val{nd}())
    s=zero(T)
    f(A,I,I1s,i)=(@inbounds A[I+I1s[i]]+A[I-I1s[i]])
    @simd for I in Rin
        @inbounds s+=sum(ntuple(i->f(A,I,I1s,i),Val{nd}()))
    end
    s
end
A=rand(100,100)
vA=view(A,:,:)
@btime fooCI($A)
@btime fooCI($vA)
@btime foo2($A)
@btime foo2($vA)
1.974 μs (0 allocations: 0 bytes)
2.120 μs (0 allocations: 0 bytes)
1.991 μs (0 allocations: 0 bytes)
2.185 μs (0 allocations: 0 bytes)
A=rand(10,10,10)
vA=view(A,:,:,:)
@btime fooCI($A)
@btime fooCI($vA)
@btime foo3($A)
@btime foo3($vA)
353.519 ns (0 allocations: 0 bytes)
5.079 μs (0 allocations: 0 bytes)
321.382 ns (0 allocations: 0 bytes)
362.574 ns (0 allocations: 0 bytes)
As you can see, the slowdown appears in 3D but not in 2D. Also, if you force f to inline (@inline f in fooCI), the code starts to allocate and the slowdown gets worse!
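In case it helps anyone reproducing this, here is a minimal sanity check (a sketch, not part of the benchmark) to confirm that the variants compute the same quantity on an array and on its view before comparing timings. The name stencil_sum_ref is a hypothetical helper I made up for illustration; it is a plain, unoptimized loop and says nothing about the performance issue itself.

```julia
# Hypothetical reference helper (not in the MWE above): sums both neighbours
# along every dimension over the interior points, with bounds checks left on.
function stencil_sum_ref(A::AbstractArray{T,N}) where {T,N}
    interior = CartesianIndices(ntuple(d -> 2:size(A, d)-1, N))
    s = zero(T)
    for I in interior, d in 1:N
        # Unit offset along dimension d, e.g. (0, 1, 0) for d == 2 in 3D.
        step = CartesianIndex(ntuple(k -> k == d ? 1 : 0, N))
        s += A[I + step] + A[I - step]
    end
    return s
end

A3 = rand(10, 10, 10)
vA3 = view(A3, :, :, :)

# The view aliases its parent, so both calls must agree.
@assert stencil_sum_ref(A3) ≈ stencil_sum_ref(vA3)

# With foo3 and fooCI from the MWE defined, the same comparison applies
# (≈ rather than ==, since @simd may reorder the floating-point sum):
# @assert foo3(A3) ≈ stencil_sum_ref(A3)
# @assert fooCI(vA3) ≈ stencil_sum_ref(A3)
```

If these checks pass, the timing difference in 3D comes from how the code is compiled for the view rather than from different work being done.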
I appreciate any help,
Cheers!