An MWE, run in the REPL with Julia 1.9.4:
julia> using BenchmarkTools
julia> function myset!(y, x, n, ind)
           @views for i = 1 : n
               x[ind[i], :, :] .= y[:, :, i]
           end
       end
myset! (generic function with 1 method)
julia> cachex = zeros(ComplexF64, 96, 48, 120); cachey = zeros(ComplexF64, 48, 120, 49);
julia> cachex2 = zeros(ComplexF64, 96, 48, 15); cachey2 = zeros(ComplexF64, 48, 15, 49);
julia> ind = [i+20 for i = 1 : 49];
julia> @btime myset!($cachey, $cachex, 49, $ind)
1.162 ms (0 allocations: 0 bytes)
julia> @btime myset!($cachey2, $cachex2, 49, $ind)
74.148 μs (0 allocations: 0 bytes)
The array size increased by a factor of 8, while the run time increased by a factor of 15.7. What's the reason?
I believe this has a similar cause to my previous problem, in that an increase in computational cost results in a more-than-linear increase in computation time. The current problem, however, is much simpler and has nothing to do with BLAS threads, function inlining, and so on. It also happens even when CPU usage on the node is very low.
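A quick sanity check that might help narrow this down: the sketch below recomputes the two ratios quoted above and prints the memory footprint of each array, plus the strides of one destination slice. The shapes and timings are copied from the MWE. The idea that cache size or the strided writes matter here is only an assumption, not something measured in this post, and `shapes`, `footprint_mib`, and `dest` are names made up for illustration.

```julia
# Element type and array shapes copied from the MWE above.
T = ComplexF64
shapes = (
    cachex  = (96, 48, 120),
    cachey  = (48, 120, 49),
    cachex2 = (96, 48, 15),
    cachey2 = (48, 15, 49),
)

# Memory footprint of each array in MiB (element count times bytes per element).
footprint_mib = map(s -> prod(s) * sizeof(T) / 2^20, shapes)
for (name, mib) in pairs(footprint_mib)
    println(rpad(string(name), 8), round(mib; digits = 2), " MiB")
end

# Ratios quoted in the post: array size and measured run time.
size_ratio = prod(shapes.cachex) / prod(shapes.cachex2)
time_ratio = 1.162e-3 / 74.148e-6   # 1.162 ms vs 74.148 μs
println("size ratio = ", size_ratio, ", time ratio = ", round(time_ratio; digits = 1))

# Strides (in elements) of one destination slice x[ind[i], :, :].
# Julia arrays are column-major, so fixing the first index gives a
# non-contiguous view: consecutive elements are 96 elements apart.
x = zeros(T, 96, 48, 120)
dest = view(x, 21, :, :)
println("strides(dest) = ", strides(dest))
```

With these shapes, `cachex` occupies about 8.4 MiB against about 1.05 MiB for `cachex2`, and every write into the destination slice jumps 96 elements (1536 bytes). If the smaller case fits in a cache level that the larger one does not, that could account for part of the extra factor, but this would need to be confirmed on the actual machine.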