I have a project where Julia's memory usage is currently too high for us to use it. I have attempted to replicate the behaviour in a minimal working example. The example is oversimplified, in that it could be rewritten to work entirely in place; that is not possible for the actual code.
```julia
using Statistics, Base.Threads

ms = [randn(ComplexF64, 100 * 2, 100 * 2) for i in 1:20000]
println("this should only take on the order of $((Base.summarysize(ms[1]) * 3 * nthreads() + Base.summarysize(ms)) / (1024^3)) GB")

function _reduce(blocks, basesize)
    if length(blocks) < basesize
        toret = zero(blocks[1])
        for a in blocks
            # deliberately allocate a chain of short-lived temporaries
            t1 = randn() * a
            t2 = randn() * t1
            t3 = randn() * t2
            t4 = randn() * t3
            t5 = randn() * t4
            t6 = randn() * t5
            t7 = randn() * t6
            t8 = randn() * t7
            t9 = randn() * t8
            t9 .-= mean(t9)
            toret += t9
        end
        return toret
    else
        # split the work in two and recurse, spawning one half on another task
        split = cld(length(blocks), 2)
        t = Threads.@spawn _reduce(blocks[1:split], basesize)
        toret = _reduce(blocks[split+1:end], basesize)
        toret += fetch(t)
        return toret
    end
end

@show mean(_reduce(ms, cld(length(ms), nthreads())))
println("done")
readline()
```
Notice how I allocate a bunch of temporaries, but they can be freed again almost immediately. I start by printing the amount of memory that I expect to need (about 13 GB), yet when I run this with 12 threads on a laptop with 32 GB of RAM, the process gets OOM-killed.
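To check whether the growth comes from the collector running too lazily rather than from a genuine leak, I can force full collections inside a simplified version of the inner loop. This is a diagnostic sketch, not part of the real code; the `gc_every` parameter and the single-temporary loop body are my own simplifications:

```julia
using Statistics, Base.Threads

# Hypothetical diagnostic variant of the inner loop: force a full
# collection every `gc_every` iterations. If the resident set then stays
# near the expected figure, the temporaries were collectable all along
# and the problem is GC heuristics, not a leak.
function _reduce_forced_gc(blocks, gc_every::Int = 100)
    toret = zero(blocks[1])
    for (i, a) in enumerate(blocks)
        t = randn() * a            # same kind of short-lived temporary
        t .-= mean(t)
        toret += t
        i % gc_every == 0 && GC.gc()  # full collection; expensive, diagnostic only
    end
    return toret
end

ms_small = [randn(ComplexF64, 200, 200) for _ in 1:50]
r = _reduce_forced_gc(ms_small, 10)
```

Calling `GC.gc()` this often is far too slow for production, but it separates "the GC cannot free this" from "the GC chooses not to free this yet".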
Even worse, the `--heap-size-hint` flag gets completely ignored, and when I investigate heap snapshots from my actual code (so not the minimal example), they only describe a heap of 1.5 GB, despite coming from a process taking up 20 times as much.
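For completeness, this is roughly how I combine the hint with snapshotting (the `8G` value and file name are placeholders, and `Profile.take_heap_snapshot` requires Julia 1.9 or newer):

```julia
# Launched as:  julia --threads=12 --heap-size-hint=8G script.jl
# (the hint is documented as a soft target for the collector, not a hard cap)
using Profile

# Writes a Chrome-DevTools-compatible snapshot of the objects the GC
# knows about; memory held outside the managed heap will not appear here,
# which may explain the 1.5 GB snapshot vs. the much larger process size.
path = Profile.take_heap_snapshot("snapshot.heapsnapshot")
```

The snapshot can then be opened in the Memory tab of Chrome's developer tools.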