Weird allocation from SharedArrays/StaticArrays

Hi, I’m getting an allocation from the following code, which results in a roughly 20x slowdown in this MWE:



using SharedArrays,BenchmarkTools,StaticArrays

U = SharedArray{Float64}(zeros(100,100,100))
function mytestfun(U)
    function getIndices(i,j,k)
        #return U[i,j,k]
        return U[mod(i,3)+1,j,k]
    end
    uvals = @SArray [getIndices(i,j,k) for i in 1:4, j in 1:4, k in 1:4]
    return uvals[1]
end

@btime mytestfun($U)

Compare the code above to the same code with the commented-out line (return U[i,j,k]) used in place of the mod version; that variant has zero allocations. Does anyone know what is going on?

I don’t have an explanation, but I have a solution:

julia> function mytestfun_faster(U)
           uvals = @SArray [U[i, j, k] for i in (mod.(1:4, 3) .+ 1), j in 1:4, k in 1:4]
           return uvals[1]
       end

julia> @btime mytestfun_faster($U)
  17.230 ns (0 allocations: 0 bytes)

I don’t know enough to speculate as to why this is the case, but maybe it can help fill in the puzzle.
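For anyone checking that the rewrite reads the same elements: a minimal sketch, assuming a plain Array stands in for the SharedArray and ordinary comprehensions stand in for @SArray (the names A, a, and b are only for illustration). It compares the index sequence from mod(i,3)+1 with the broadcast range used in mytestfun_faster. It says nothing about why the original allocates.

```julia
# Index sequences: per-element mod versus the precomputed broadcast range.
idx_closure   = [mod(i, 3) + 1 for i in 1:4]
idx_broadcast = mod.(1:4, 3) .+ 1
@assert idx_closure == idx_broadcast   # both are [2, 3, 1, 2]

# Hypothetical small stand-in array; the first dimension only needs length 3.
A = reshape(collect(1.0:48.0), 3, 4, 4)

# Original formulation: remap the first index inside the body.
a = [A[mod(i, 3) + 1, j, k] for i in 1:4, j in 1:4, k in 1:4]

# Rewritten formulation: iterate directly over the remapped indices.
b = [A[i, j, k] for i in (mod.(1:4, 3) .+ 1), j in 1:4, k in 1:4]

@assert a == b
```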

Weird. What also seems to work is doing U[(i-1) + (j-1)*nx + (k-1)*nx*ny + 1], i.e. converting i,j,k to the corresponding column-major linear index, where nx and ny are the first two dimensions of U.
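For reference, a small sketch of the linear-index conversion, assuming Julia's default column-major layout. In that layout the stride of j is nx and the stride of k is nx*ny. The helper name linear_index and the test array A are made up for illustration; Base's LinearIndices is used only as a cross-check.

```julia
# Column-major linear index (1-based) of element (i, j, k) in an array
# whose first two dimensions are nx and ny.
linear_index(i, j, k, nx, ny) = (i - 1) + (j - 1) * nx + (k - 1) * nx * ny + 1

# Hypothetical small array for checking the formula.
A = reshape(collect(1.0:24.0), 2, 3, 4)
nx, ny, nz = size(A)

# The manual formula addresses the same element as Cartesian indexing ...
@assert all(A[linear_index(i, j, k, nx, ny)] == A[i, j, k]
            for i in 1:nx, j in 1:ny, k in 1:nz)

# ... and agrees with Base's own conversion.
@assert linear_index(2, 3, 4, nx, ny) == LinearIndices(A)[2, 3, 4]
```

In the MWE the first index would still need the mod(i,3)+1 remapping before being passed to such a helper.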