I met a weird problem in the following trivial code:
using BenchmarkTools
const p=150
dd = randn(p,p)
function d2(ψ::Matrix{Float64})::Matrix{Float64}
ψr = similar(ψ)
ddψ = dd*ψ
@time @inbounds for i in eachindex(ddψ,ψr)
ψr[i]=ddψ[i]
end
return ψr
end
ψ=randn(p,p*p)
@btime d2(ψ);
The result is:
0.394772 seconds (13.50 M allocations: 257.476 MiB, 4.49% gc time)
0.396507 seconds (13.50 M allocations: 257.476 MiB, 5.61% gc time)
0.397738 seconds (13.50 M allocations: 257.476 MiB, 3.80% gc time)
0.396423 seconds (13.50 M allocations: 257.476 MiB, 4.15% gc time)
0.430600 seconds (13.50 M allocations: 257.476 MiB, 7.25% gc time)
0.410795 seconds (13.50 M allocations: 257.476 MiB, 6.39% gc time)
0.586821 seconds (13.50 M allocations: 257.476 MiB, 26.65% gc time)
0.486760 seconds (13.50 M allocations: 257.476 MiB, 20.09% gc time)
0.380750 seconds (13.50 M allocations: 257.476 MiB, 3.97% gc time)
0.404442 seconds (13.50 M allocations: 257.476 MiB, 5.99% gc time)
0.402110 seconds (13.50 M allocations: 257.476 MiB, 5.73% gc time)
0.411570 seconds (13.50 M allocations: 257.476 MiB, 3.88% gc time)
0.406931 seconds (13.50 M allocations: 257.476 MiB, 6.19% gc time)
0.404919 seconds (13.50 M allocations: 257.476 MiB, 6.29% gc time)
0.419291 seconds (13.50 M allocations: 257.476 MiB, 7.73% gc time)
0.442829 seconds (13.50 M allocations: 257.476 MiB, 9.08% gc time)
0.456120 seconds (13.50 M allocations: 257.476 MiB, 7.63% gc time)
0.395785 seconds (13.50 M allocations: 257.476 MiB, 4.61% gc time)
0.419890 seconds (13.50 M allocations: 257.476 MiB, 3.97% gc time)
0.405614 seconds (13.50 M allocations: 257.476 MiB, 6.46% gc time)
0.383479 seconds (13.50 M allocations: 257.476 MiB, 5.23% gc time)
0.412925 seconds (13.50 M allocations: 257.476 MiB, 6.41% gc time)
0.410666 seconds (13.50 M allocations: 257.476 MiB, 6.52% gc time)
0.404238 seconds (13.50 M allocations: 257.476 MiB, 7.23% gc time)
0.420667 seconds (13.50 M allocations: 257.476 MiB, 6.23% gc time)
0.420667 seconds (13.50 M allocations: 257.476 MiB, 6.23% gc time)
0.445725 seconds (13.50 M allocations: 257.476 MiB, 3.58% gc time)
0.405995 seconds (13.50 M allocations: 257.476 MiB, 6.12% gc time)
0.413399 seconds (13.50 M allocations: 257.476 MiB, 7.18% gc time)
415.830 ms (13499133 allocations: 308.99 MiB)
So, it’s incredibly slow and the memory allocation is unreasonably large. Why is that? The for loop in the function is expected to have no memory allocation at all.