In the following code snippet, I use a symmetric matrix to compute another matrix. If I wrap the matrix in Symmetric, the multi-threaded loop allocates excessively:
using BenchmarkTools
using LinearAlgebra  # provides Symmetric

function foo()
    ns = 1500
    ne = 200
    rbar = rand(Float64, (ne,ne)) * 30
    #rbar = Symmetric(rbar, :L) # uncomment this line to see the difference
    potzl = rand(ComplexF64, (ne,ne))
    # one scratch matrix per thread
    zls = [Array{ComplexF64}(undef, (ne,ne)) for t = 1:Threads.nthreads()]
    Threads.@threads for f = 1:ns
        t = Threads.threadid()
        zl = zls[t]
        g = rand(ComplexF64)
        # fill the lower triangle only
        for k = 1:ne
            for i = k:ne
                exp_gr = exp(-g * rbar[i,k]) # FIXME exp very slow
                zl[i,k] = exp_gr * potzl[i,k]
            end
        end
    end
end

precompile(foo, ())
@btime foo()
With the line rbar = Symmetric(rbar, :L) active, @btime reports 3.168 s (180900039 allocations: 4.95 GiB). With that line commented out, it reports 287.963 ms (36 allocations: 3.67 MiB). Why does that happen?
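One thing that may be worth testing, as an assumption rather than a confirmed diagnosis: Threads.@threads wraps the loop body in a closure, and a variable that is captured by a closure and assigned more than once (as rbar is when the Symmetric line is active) can end up boxed and type-unstable. The sketch below compares a rebound variable against a single binding under a fresh name. The functions rebound and single_binding, the name rsym, and the reduced kernel (a sum per iteration instead of the zl buffers) are all hypothetical and only serve to isolate the effect.

```julia
using LinearAlgebra

# Variant A (hypothetical): rbar is assigned twice, then used inside @threads.
function rebound(ns, ne)
    rbar = rand(Float64, (ne, ne)) * 30
    rbar = Symmetric(rbar, :L)          # second assignment to a captured variable
    potzl = rand(ComplexF64, (ne, ne))
    out = zeros(ComplexF64, ns)
    Threads.@threads for f = 1:ns
        g = rand(ComplexF64)
        s = zero(ComplexF64)
        for k = 1:ne, i = k:ne          # lower triangle, as in foo()
            s += exp(-g * rbar[i, k]) * potzl[i, k]
        end
        out[f] = s                      # each iteration writes its own slot
    end
    return out
end

# Variant B (hypothetical): the wrapper gets its own name and is bound once.
function single_binding(ns, ne)
    rsym = Symmetric(rand(Float64, (ne, ne)) * 30, :L)
    potzl = rand(ComplexF64, (ne, ne))
    out = zeros(ComplexF64, ns)
    Threads.@threads for f = 1:ns
        g = rand(ComplexF64)
        s = zero(ComplexF64)
        for k = 1:ne, i = k:ne
            s += exp(-g * rsym[i, k]) * potzl[i, k]
        end
        out[f] = s
    end
    return out
end

# Warm up first so compilation is not counted, then compare bytes allocated.
rebound(10, 20)
single_binding(10, 20)
bytes_rebound = @allocated rebound(200, 100)
bytes_single = @allocated single_binding(200, 100)
println("rebound:        ", bytes_rebound, " bytes")
println("single binding: ", bytes_single, " bytes")
```

If the rebound variant allocates far more than the single-binding one, giving the Symmetric wrapper its own name in foo() would be the first fix to try. Running @code_warntype on the rebound variant should then also show rbar as a Core.Box.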