Hi guys,
I use Julia for computation-intensive tasks, and of course the best way to reduce my run time is to use multithreading and parallelize all my work.
Sometimes my work requires more memory per thread, so I wrote a utility that lets me control the number of threads in use dynamically. It looks like this. (I am aware that v1.5 added new ways of launching tasks on multiple threads without oversubscribing, but I still like controlling the number of threads dynamically so I don't over-book memory.)
using Base.Threads

function my_map(
        func, to_process, params...;
        num_thread::Int = Threads.nthreads() - 1,
        outType::Type = Any,
        hasRet::Bool = true
    )
    # shared counter: each task atomically grabs the next index to work on
    pos = Threads.Atomic{Int64}(1)
    # prepare the output array
    out_array = Vector{outType}(undef, length(to_process))
    @threads for i = 1:num_thread
        while true
            this_thread_pos = atomic_add!(pos, 1)
            if this_thread_pos <= length(to_process)
                if hasRet
                    out_array[this_thread_pos] =
                        func(to_process[this_thread_pos], params...)
                else
                    func(to_process[this_thread_pos], params...)
                end
            else
                break
            end
            # give the GC a chance to run between work items
            GC.safepoint()
        end # while true loop
    end
    # return result
    if hasRet
        return out_array
    else
        return
    end
end
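For context, I call it roughly like this (process_one below is just a hypothetical stand-in for my real computation function, and the sizes are placeholders):

# hypothetical stand-in for the real computation; like my actual workload,
# it allocates and frees a lot of temporary memory on every call
process_one(x, scale) = sum(rand(100_000) .* x .* scale)

inputs = collect(1:1_000)
# run on 4 threads instead of all available ones, to cap per-thread memory use
results = my_map(process_one, inputs, 2.0; num_thread = 4, outType = Float64)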
From here, you can see that each thread is reused multiple times, and this is where the problem emerges.
The issue is that there appears to be a memory fragmentation problem, even though I added GC.safepoint() in my utility function as well as in my actual computation functions. Memory usage grows over time and eventually leads to an out-of-memory error. I am using CentOS 7.6 with default kernel settings. It looks to me like the growing memory usage comes from fragmentation caused by the many allocations and deallocations inside my computation function (I also use some third-party libraries, so this is unavoidable).
I get this kind of error in the kernel log:
[2892174.756628] Out of memory: Kill process 77480 (julia) score 600 or sacrifice child
[2892174.756631] Killed process 77480 (julia) total-vm:149233800kB, anon-rss:78731484kB, file-rss:4kB, shmem-rss:1756kB
[2892174.851900] julia: page allocation failure: order:0, mode:0x280da
[2892174.851905] CPU: 5 PID: 77480 Comm: julia Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.21.3.el7.x86_64 #1
[2892174.851907] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS MASTER/X299 AORUS MASTER, BIOS F2 11/05/2018
[2892174.851909] Call Trace:
[2892174.851915] [<ffffffff88f63107>] dump_stack+0x19/0x1b
I have even tried changing the mmap thresholds, but that doesn't help.
I have tried doing similar work with Distributed, where every time I use addprocs() to add a process and rmprocs() to remove it once the task is finished. In that case, since the work runs in a separate process whose memory is freed completely when it exits, I don't get the memory error. That proves my machine has enough memory for all this work; it is just that the garbage collector does not give all the memory back over time.
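For reference, here is a minimal sketch of the Distributed pattern I mean; the worker count and the heavy_task function are placeholders, not my real workload:

using Distributed

inputs = 1:100

# launch fresh workers for this batch of work
workers_added = addprocs(4)

# hypothetical workload; the real computation calls third-party libraries
@everywhere heavy_task(x) = sum(rand(10_000) .* x)

# distribute the work across the workers
results = pmap(heavy_task, inputs)

# tear the workers down; the OS reclaims all their memory when they exit
rmprocs(workers_added)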
I am not an expert on garbage collectors, but I think the Julia GC does not copy or move objects; it just marks objects that are no longer in use and frees them. If the memory was allocated through mmap, I think it will always be released back to the system. If it was not allocated through mmap, then our only hope is that it sits at the top of the heap so it can be trimmed (not sure if my understanding is right).
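One experiment I might try along those lines (this is an assumption on my part, not something I have verified helps): on glibc-based Linux one can ask the allocator to hand freed heap pages back to the OS with malloc_trim after a full collection, e.g.:

# run a full GC, then ask glibc to return free heap pages to the OS
# (Linux/glibc only; malloc_trim is a libc function, not a Julia API)
function force_release()
    GC.gc()
    ccall(:malloc_trim, Cint, (Csize_t,), 0)
    return nothing
end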
I guess I can try to make an MRE to illustrate the problem with random Vector allocations and deallocations of different sizes.
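Something along these lines is what I have in mind for the MRE (the sizes and repetition counts are arbitrary placeholders, and it assumes Julia was started with more than one thread):

# many short-lived allocations of varying size on every thread,
# then watch resident memory grow across repetitions
function churn(i)                      # i is just an index and is unused
    v = rand(rand(1_000:1_000_000))    # vector of random length
    return sum(v)                      # v becomes garbage immediately
end

for rep in 1:100
    my_map(churn, 1:10_000; outType = Float64)
    GC.gc()
    @info "finished repetition" rep maxrss = Sys.maxrss()
end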
I think Julia is a wonderful language, but if we want it to make a significant contribution to the scientific world, this kind of problem has to be tackled, and I would like to contribute to solving it.
Any thoughts, suggestions?