Why such large memory allocations?

I ran into a weird problem with the following trivial code:

using BenchmarkTools

const p=150
dd = randn(p,p)

function d2(ψ::Matrix{Float64})::Matrix{Float64}
    ψr = similar(ψ)
    ddψ = dd*ψ
    @time @inbounds for i in eachindex(ddψ,ψr)
        ψr[i]=ddψ[i]
    end
    return ψr
end

ψ=randn(p,p*p)
@btime d2(ψ);

The result is:

  0.394772 seconds (13.50 M allocations: 257.476 MiB, 4.49% gc time)
  0.396507 seconds (13.50 M allocations: 257.476 MiB, 5.61% gc time)
  0.397738 seconds (13.50 M allocations: 257.476 MiB, 3.80% gc time)
  0.396423 seconds (13.50 M allocations: 257.476 MiB, 4.15% gc time)
  0.430600 seconds (13.50 M allocations: 257.476 MiB, 7.25% gc time)
  0.410795 seconds (13.50 M allocations: 257.476 MiB, 6.39% gc time)
  0.586821 seconds (13.50 M allocations: 257.476 MiB, 26.65% gc time)
  0.486760 seconds (13.50 M allocations: 257.476 MiB, 20.09% gc time)
  0.380750 seconds (13.50 M allocations: 257.476 MiB, 3.97% gc time)
  0.404442 seconds (13.50 M allocations: 257.476 MiB, 5.99% gc time)
  0.402110 seconds (13.50 M allocations: 257.476 MiB, 5.73% gc time)
  0.411570 seconds (13.50 M allocations: 257.476 MiB, 3.88% gc time)
  0.406931 seconds (13.50 M allocations: 257.476 MiB, 6.19% gc time)
  0.404919 seconds (13.50 M allocations: 257.476 MiB, 6.29% gc time)
  0.419291 seconds (13.50 M allocations: 257.476 MiB, 7.73% gc time)
  0.442829 seconds (13.50 M allocations: 257.476 MiB, 9.08% gc time)
  0.456120 seconds (13.50 M allocations: 257.476 MiB, 7.63% gc time)
  0.395785 seconds (13.50 M allocations: 257.476 MiB, 4.61% gc time)
  0.419890 seconds (13.50 M allocations: 257.476 MiB, 3.97% gc time)
  0.405614 seconds (13.50 M allocations: 257.476 MiB, 6.46% gc time)
  0.383479 seconds (13.50 M allocations: 257.476 MiB, 5.23% gc time)
  0.412925 seconds (13.50 M allocations: 257.476 MiB, 6.41% gc time)
  0.410666 seconds (13.50 M allocations: 257.476 MiB, 6.52% gc time)
  0.404238 seconds (13.50 M allocations: 257.476 MiB, 7.23% gc time)
  0.420667 seconds (13.50 M allocations: 257.476 MiB, 6.23% gc time)
  0.445725 seconds (13.50 M allocations: 257.476 MiB, 3.58% gc time)
  0.405995 seconds (13.50 M allocations: 257.476 MiB, 6.12% gc time)
  0.413399 seconds (13.50 M allocations: 257.476 MiB, 7.18% gc time)
  415.830 ms (13499133 allocations: 308.99 MiB)

So it’s incredibly slow, and the memory allocation is unreasonably large. Why is that? I expected the for loop in the function to have no memory allocation at all.

I see a 30x gain by just making dd a const.
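
For reference, the only change needed is in the declaration of dd; a minimal sketch, keeping the rest of the original script exactly as posted:

const p = 150
const dd = randn(p, p)   # const lets the compiler infer the type of dd*ψ inside d2

With dd const, the loop inside d2 compiles down to a plain copy and the per-element allocations disappear.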


I was going to post something very similar - an alternative to const is to pass dd as a function argument (it’s generally not a good idea for functions to depend on global variables):

julia> function d2(ψ, dd)
           ψr = similar(ψ)
           ddψ = dd*ψ
           @inbounds for i in eachindex(ddψ,ψr)
               ψr[i] = ddψ[i]
           end
           ψr
       end
d2 (generic function with 3 methods)

julia> @btime d2($ψ, $dd);
  29.738 ms (4 allocations: 51.50 MiB)

See: Performance Tips · The Julia Language


Yes, you are right: a 30x gain and no memory allocation just by making dd a const. Just curious: why does the heavy consumption occur in the for loop rather than in the step “ddψ = dd*ψ”? The for loop seems to have no direct relation to dd.

In Julia, type specialization (the secret sauce that makes it faster than Python while still dynamic) happens at function calls. Nothing about the type of ddψ can be inferred at compile time, since it depends on a non-constant global. The for loop can’t regain knowledge of the type, because it lives in the same function as the point where disaster struck.
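
One way to check this yourself (a suggested diagnostic, not from the original post) is @code_warntype, which prints the types the compiler inferred for each variable in d2:

using InteractiveUtils   # only needed in a script; the REPL loads it by default
@code_warntype d2(ψ)     # with dd a non-const global, ddψ is inferred as Any

Because ddψ is Any, every ddψ[i] access inside the loop is dispatched dynamically and returns a boxed value, which is what shows up as those millions of small allocations.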

Things would be different if the for loop were in its own function (see this part of the performance tips).
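
A minimal sketch of that function-barrier version, for illustration only (the helper name copy_into! is made up here, and dd is still the global from the first post):

function copy_into!(dest, src)
    @inbounds for i in eachindex(dest, src)
        dest[i] = src[i]
    end
    return dest
end

function d2_barrier(ψ::Matrix{Float64})
    ψr = similar(ψ)
    ddψ = dd*ψ            # still type-unstable while dd is a non-const global...
    copy_into!(ψr, ddψ)   # ...but this call re-specializes on the concrete types
    return ψr
end

The dynamic dispatch at the copy_into! call is paid once per call to d2_barrier, instead of once per element inside the loop.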


Thanks!