@threads vs @parallel, a simple fail case for @threads


#1

I’m new to Julia. I was experimenting with simple parallel code using threads and processes. Why is

a = SharedArray{Int64,1}(4)
@time @sync @parallel for i = 1:60000000
    a[myid() - 1] = i
end

#   1.663024 seconds (70.65 k allocations: 3.814 MiB)

so much faster and more efficient than

a = SharedArray{Int64,1}(4) # also using a SharedArray for fairness
@time Threads.@threads for i = 1:60000000
    a[Threads.threadid()] = i
end

#   7.977596 seconds (142.59 M allocations: 2.483 GiB, 2.39% gc time)

? It seems the latter allocates a lot. Where do those allocations come from?

I’ve started the REPL with env JULIA_NUM_THREADS=4 julia -p 4.
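For reference, a quick sanity check that both parallel backends are actually active with that invocation:

```julia
# With `env JULIA_NUM_THREADS=4 julia -p 4`, both of these should report 4:
Threads.nthreads()  # number of threads available to @threads
nworkers()          # number of worker processes available to @parallel
```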


#2

First, you need to wrap your code in a function before benchmarking is meaningful. See https://docs.julialang.org/en/stable/manual/performance-tips.html.

I think the @time macro may be interfering with @threads in some way. I initially ran the following, copied from your code:

function test2()
    a = SharedArray{Int64,1}(4) # also using a SharedArray for fairness
    @time Threads.@threads for i = 1:60000000
        a[Threads.threadid()] = i
    end
end

This is really slow, and @code_warntype complains about a Core.Box variable. However, taking the @time out of the function fixes this. The following works:

function test1()
    a = SharedArray{Int64,1}(4)
    @sync @parallel for i = 1:60000000
        a[myid() - 1] = i
    end
end

function test2()
    a = SharedArray{Int64,1}(4) # also using a SharedArray for fairness
    Threads.@threads for i = 1:60000000
        a[Threads.threadid()] = i
    end
end

test1() # Warmup 
test2()

The output is

julia> @time test1()
  0.018457 seconds (1.25 k allocations: 48.188 KiB)
4-element Array{Future,1}:
 Future(2, 1, 111, #NULL)
 Future(3, 1, 112, #NULL)
 Future(4, 1, 113, #NULL)
 Future(5, 1, 114, #NULL)

julia> @time test2()
  0.011668 seconds (501 allocations: 18.484 KiB)
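The Core.Box complaint refers to Julia's boxing of captured variables that the compiler cannot prove are never reassigned. A minimal, self-contained illustration of the same symptom (the function name here is just for demonstration, unrelated to the code above):

```julia
function make_counter()
    n = 0
    inc() = (n += 1)  # `n` is reassigned inside the closure, so it gets boxed
    inc()
    inc()
    return n
end

make_counter()  # returns 2
```

Running `@code_warntype make_counter()` shows `n` typed as `Core.Box`, and every access to it allocates; that is the same kind of problem @code_warntype reports for the @time-inside-the-function version.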

#3

Thank you! That might give a lead on where the allocations come from.

As is often the case with such things, I realized my mistake of not wrapping the code in a function right after posting. But the post was pending moderation, so I couldn’t change it. :man_shrugging:

But I wonder: is there a way to precompile the function without running it? Do I need to wrap it in a module and add __precompile__() at the top?


#4

Rather than trying to precompile by hand, you can just use https://github.com/JuliaCI/BenchmarkTools.jl, which runs your function repeatedly and provides a robust estimate of its actual runtime, excluding compilation.
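A minimal sketch of that workflow (the function here is a stand-in, just to keep the example self-contained; BenchmarkTools must be installed):

```julia
using BenchmarkTools

# Any function you want to time; @btime runs it many times and
# reports the minimum time, so compilation overhead is excluded.
f(n) = sum(i for i in 1:n)

@btime f(1_000_000)
```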