I’m new to Julia. I was experimenting with simple parallel code using threads and processes. Why is
a = SharedArray{Int64,1}(4)
@time @sync @parallel for i = 1:60000000
a[myid() - 1] = i
end
# 1.663024 seconds (70.65 k allocations: 3.814 MiB)
so much faster and more efficient than
a = SharedArray{Int64,1}(4) # also using a SharedArray for fairness
@time Threads.@threads for i = 1:60000000
a[Threads.threadid()] = i
end
# 7.977596 seconds (142.59 M allocations: 2.483 GiB, 2.39% gc time)
? The latter seems to allocate a lot. Where do those allocations come from?
I’ve started Julia with env JULIA_NUM_THREADS=4 julia -p 4, i.e. 4 threads and 4 worker processes.
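For context, this is how I checked the thread and worker counts after starting that way; a minimal sketch (on 0.6 the Distributed functions live in Base, on current Julia they are in the Distributed stdlib):

```julia
using Distributed  # stdlib on current Julia; these names are in Base on 0.6

println("threads: ", Threads.nthreads())  # 4 when started with JULIA_NUM_THREADS=4
println("workers: ", nworkers())          # 4 when started with -p 4
println("worker ids: ", workers())        # with -p 4 these are 2, 3, 4, 5
```

That last line is also why a[myid() - 1] indexes 1:4 in the @parallel loop: the master process is id 1, so the workers running the loop body have ids 2 through 5.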
I think the @time macro may be interfering with @threads in some way. I initially ran the following, copied from your code:
function test2()
a = SharedArray{Int64,1}(4) # also using a SharedArray for fairness
@time Threads.@threads for i = 1:60000000
a[Threads.threadid()] = i
end
end
This is really slow, and @code_warntype complains about a Core.Box variable: Threads.@threads turns the loop body into a closure, and with @time wrapped around it the captured a apparently ends up boxed, so every access inside the loop allocates. Taking the @time out of the function fixes this. The following works:
function test1()
a = SharedArray{Int64,1}(4)
@sync @parallel for i = 1:60000000
a[myid() - 1] = i
end
end
function test2()
a = SharedArray{Int64,1}(4) # also using a SharedArray for fairness
Threads.@threads for i = 1:60000000
a[Threads.threadid()] = i
end
end
test1() # Warmup
test2()
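With the functions defined like that, the timing can happen outside them on warmed-up calls; a minimal sketch of how I'd measure test2 (I size the array by Threads.nthreads() instead of a hard-coded 4, which is my own tweak so the write can't go out of bounds with other thread counts):

```julia
using SharedArrays  # stdlib on current Julia; SharedArray lives in Base on 0.6

function test2b()
    # sized by the actual thread count rather than a hard-coded 4
    a = SharedArray{Int64,1}(Threads.nthreads())
    Threads.@threads for i = 1:60_000_000
        a[Threads.threadid()] = i
    end
    return a
end

test2b()        # warmup call: pays the compilation cost once
@time test2b()  # measures only the run, without compilation or boxing overhead
```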
Thank you! That might be a lead on where the allocations come from.
As is often the case with such things, I realized my mistake of not wrapping it in a function right after posting. But the post was pending moderation, so I couldn’t change it.
But what I wonder is: is there a way to precompile the function without running it? Do I need to wrap it in a module and add __precompile__() at the top?
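In case it helps anyone later: Base has a precompile(f, argtypes) function that compiles a specific method signature without calling it. A minimal sketch with a hypothetical function test3 (I'm not sure it covers everything a module-level __precompile__() does):

```julia
# hypothetical example function, not from the posts above
function test3(n)
    s = 0
    for i = 1:n
        s += i
    end
    return s
end

precompile(test3, (Int,))  # compiles test3(::Int) without executing it
@time test3(100_000_000)   # first real call should pay little or no compile time
```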