I’m trying to understand memory usage and efficient coding of functions in Julia. Based on:
using LinearAlgebra
using BenchmarkTools
consider the following three illustrative functions [they only illustrate the ideas; they are not the ones I will actually use]:
Function 1:
function func1()
    a = rand(100,1000)
    return norm(a,2)
end
;
@benchmark begin
    for i in 1:10000
        func1()
    end
end
#
BenchmarkTools.Trial:
memory estimate: 7.45 GiB
allocs estimate: 20000
--------------
minimum time: 4.654 s (15.18% GC)
median time: 4.662 s (15.23% GC)
mean time: 4.662 s (15.23% GC)
maximum time: 4.669 s (15.29% GC)
--------------
samples: 2
evals/sample: 1
Comment 1: I assume that func1() allocates a new 100×1000 matrix in heap memory each of the 10^4 times the function is called. With 8 bytes per element, this adds up to 7.45 GiB.
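Just to check my own arithmetic on that estimate (a back-of-the-envelope calculation, not part of the benchmark output above):

100 * 1000 * sizeof(Float64)                  # 800_000 bytes allocated per call
100 * 1000 * sizeof(Float64) * 10_000 / 2^30  # ≈ 7.45 GiB over 10^4 calls, matching the memory estimate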
Function 2:
function func2(a)
    a .= rand(100,1000)
    return norm(a,2)
end
;
a = rand(100,1000)
@benchmark begin
    for i in 1:10000
        func2(a)
    end
end
#
BenchmarkTools.Trial:
memory estimate: 7.45 GiB
allocs estimate: 30000
--------------
minimum time: 5.944 s (11.68% GC)
median time: 5.944 s (11.68% GC)
mean time: 5.944 s (11.68% GC)
maximum time: 5.944 s (11.68% GC)
--------------
samples: 1
evals/sample: 1
Comment 2: I’m a little surprised here… The input argument a for func2() is put in stack memory and is thus efficiently discarded every time the function exits, so I thought the memory allocation would be small. However, I guess the reason is that memory is still temporarily allocated on the heap when calling rand(100,1000), so there is no gain. In fact, it is worse.
It also turns out that if I replace a .= rand(100,1000) with a = rand(100,1000), func2() behaves just like func1(). It is not quite clear to me why a .= ... is worse than a = ... here.
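To illustrate what I mean about the temporary (this is just my mental model, and I may well be wrong): I picture rand(100,1000) materialising a fresh matrix on the heap before .= copies its elements into a, whereas something like rand! from the Random standard library would fill a in place. Roughly what I intend to test:

using Random

a = rand(100, 1000)
@btime rand(100, 1000);        # the fresh matrix I suspect is built on the right-hand side
@btime $a .= rand(100, 1000);  # my guess: the same temporary is built, then copied into a
@btime rand!($a);              # in-place fill of a, which I hope avoids the new matrix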
Function 3:
function func3(a,m,n)
    for i in 1:n
        for j in 1:m
            a[j,i] = rand()
        end
    end
    return norm(a,2)
end
;
a = rand(100,1000)
m,n = size(a)
@benchmark begin
    for i in 1:10000
        func3(a,m,n)
    end
end
#
BenchmarkTools.Trial:
memory estimate: 156.25 KiB
allocs estimate: 10000
--------------
minimum time: 4.737 s (0.00% GC)
median time: 4.816 s (0.00% GC)
mean time: 4.816 s (0.00% GC)
maximum time: 4.896 s (0.00% GC)
--------------
samples: 2
evals/sample: 1
Comment 3: This is, by far, the most efficient code when it comes to memory usage. The input arguments a, m, n are placed in stack memory and are efficiently discarded every time the function exits. The loops ensure that only scalars are involved, and scalars are also put in stack memory and cheaply discarded, I assume.
QUESTIONS to ye Julia gurus:
- Why is a .= ... worse than a = ... in this particular case (i.e. func2())?
- Why is even 156.25 KiB allocated for func3(...)? Is memory allocated for calling norm(a,2)??
- Is there an even more efficient way to do this?
- Since I modify the input argument of func2() and func3(), should I really have named them func2!() and func3!()?
- The reason I look into this is that I have coded a loss() function for doing parameter estimation of an ODE model. On my workstation with 32 GB RAM, the estimation allocates some 74 GiB of memory, and it takes close to 8 minutes. I see lots of potential for reducing the memory allocation. Still… with @benchmark estimating 74 GiB on a computer with 32 GB RAM, does that imply:
  a. the computer starts to swap memory to the SSD, which really slows things down [this would be horrible on a hard disk], or
  b. the 74 GiB is the cumulative allocation, and the garbage collector frees memory along the way so that the code still runs in RAM? (See the snippet right after this list for what I have been looking at to try to tell these two apart.)
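In case it is relevant for (a) vs (b): as far as I understand, the memory estimate from @benchmark (and @time) is a running total of everything that was allocated, whereas Sys.maxrss() reports the peak resident memory of the Julia process. Here loss and p are of course just placeholders for my own loss function and parameter vector, which are not shown in this post:

@time loss(p)              # placeholder call; reports cumulative allocation, like @benchmark does
Sys.maxrss() / 2^30        # peak resident memory of this Julia session so far, in GiB
Sys.total_memory() / 2^30  # physical RAM in GiB (32 on my workstation)

If the maxrss figure stays well below the physical RAM, I suppose that points to interpretation (b)?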
Good advice and knowledgeable answers are highly appreciated!