There is certainly a performance hit using CartesianIndex in 2D. In 3D, I don’t see the same slowdown. Maybe this is because looping through high dimensional arrays has a penalty similar to using CartesianIndex… Maybe because @simd can be used for the single CartesianIndex loop, but only one of the 3D loops?
julia> @btime WaterLily.restrict!(a,b) setup=((a,b)=(zeros(34,34),rand(66,66)))
5.150 μs (0 allocations: 0 bytes)
julia> @btime res2D!(a,b) setup=((a,b)=(zeros(34,34),rand(66,66)))
698.604 ns (0 allocations: 0 bytes)
julia> @btime res3D!(a,b) setup=((a,b)=(zeros(34,34,1),rand(66,66,1)))
8.567 μs (0 allocations: 0 bytes)
julia> @btime WaterLily.restrict!(a,b) setup=((a,b)=(zeros(34,34,34),rand(66,66,66)))
290.300 μs (0 allocations: 0 bytes)
julia> @btime res3D!(a,b) setup=((a,b)=(zeros(34,34,34),rand(66,66,66)))
339.300 μs (0 allocations: 0 bytes)
res2D!() is your code, and res3D!() is below. It’s requires the arrays be 3D, but if the size of the last dimension is 1, it does the 2D restriction. However, it’s slow in both 2D and 3D!
Summary
inside2(n) = min(2,n):max(1,n-1)
near2(i) = 2i-2:2i-1
near2(i,n) = ifelse(n==1,1:1,near2(i))
@fastmath function res3D!(a,b)
(ni,nj,nk) = size(a)
@inbounds for i ∈ inside2(ni), j ∈ inside2(nj), k ∈ inside2(nk)
σ = 0.
@inbounds for I ∈ near2(i), J ∈ near2(j), K ∈ near2(k,nk)
σ+= b[I,J,K]
end
a[i,j,k] = 0.
end
end