Hi!
I have a few questions on parallel programming and multi-threading. If I understand correctly, the Threads.@threads feature is modeled after OpenMP's and Cilk's multi-threading schemes, where exactly one thread runs on each logical processor, the heap memory is shared between all processors/processes/threads, and the stack is split between the threads. This seems verifiable from the following code, which allocates a constant amount of memory regardless of the input vector size, even though the input vector is a normal array, not a SharedArray:
using BenchmarkTools
function g(a, n)
    @inbounds Threads.@threads for i in 1:n
        a[i] = i*rand() + rand()^2 * (i - 1)^2 - 2;
    end
    return
end
n = 100000; a = zeros(Float64, n); @btime g(a, n)
# 1.573 ms (1 allocation: 32 bytes)
The above code was run with Threads.nthreads() equal to 8.
- Can someone please verify the above understanding? If it is correct, then every stack-allocated variable read inside a threaded block will be copied over to each thread's stack when the threads are created, which is useful to keep in mind (see the sketch below).
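For example, the following sketch of mine seems consistent with that model: the heap-allocated acc is visible to all threads, while the loop-local x lives on each thread's own stack:

function per_thread_sums(n)
    acc = zeros(Float64, Threads.nthreads())  # heap-allocated, shared by all threads
    Threads.@threads for i in 1:n
        x = i * rand()                # loop-local, private to the executing thread
        acc[Threads.threadid()] += x  # each thread writes only its own slot, no race
    end
    return sum(acc)
end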
In contrast, @parallel requires an array to be a SharedArray to be visible to all logical processors/processes/threads, or else the data needs to be sent over, so neither heap nor stack memory is shared by default. The following compares running times and allocations for Threads.@threads and @parallel, and shows a clear advantage for the @parallel construct even though @parallel uses only 7 workers out of 8 processes while Threads.@threads uses all 8 threads, AFAICT (correct me if I am wrong; the worker setup I am assuming is sketched before the code below). On the other hand, @parallel has a larger overhead in both time and allocations, which reverses the ranking when testing with simpler functions.
- So can someone explain the difference below, say when it would be favorable to use Threads.@threads over @parallel, and whether the following benchmarks are expected to change in v1.0? I currently don't have access to nightly or Linux, so this is on v0.6.1 on a Windows machine.
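For completeness, this is the worker setup I am assuming for the @parallel benchmark below; the master process only coordinates, which would explain 7 workers out of 8 processes:

addprocs(7)                       # 1 master + 7 workers = 8 processes
a = SharedArray{Float64}(100000)  # backed by shared memory, visible to all workers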
function f(a, n)
    @inbounds @parallel for i in 1:n
        a[i] = i*rand() + rand()^2 * (i - 1)^2 - 2;
    end
    return
end
function g(a, n)
    @inbounds Threads.@threads for i in 1:n
        a[i] = i*rand() + rand()^2 * (i - 1)^2 - 2;
    end
    return
end
n = 100000; a = SharedArray{Float64}(n); @btime f(a, n)
# 399.680 μs (942 allocations: 38.25 KiB)
n = 100000; a = zeros(Float64, n); @btime g(a, n)
# 1.661 ms (1 allocation: 32 bytes)
- Does the Julia backend of GPUArrays.jl use the Threads.@threads or the @parallel construct?
- Is it possible to use @simd together with multi-threading? It gives an error wherever I try it (the nesting I have in mind is sketched after this list).
- Don't you think it would be nice to have an in-place pmap!, like map!? (A rough sketch of what I mean is also below.)
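For concreteness on the @simd question, this is the kind of nesting I have in mind, as a sketch only: the Threads.@threads outer loop splits the range into one contiguous block per thread, and @simd annotates the serial inner loop (rand() may well prevent actual vectorization here; it just mirrors the loop body above):

function h(a, n)
    nb = Threads.nthreads()
    Threads.@threads for b in 1:nb
        # each thread handles one contiguous block of 1:n
        lo = div((b - 1) * n, nb) + 1
        hi = div(b * n, nb)
        @inbounds @simd for i in lo:hi  # serial inner loop
            a[i] = i*rand() + rand()^2 * (i - 1)^2 - 2;
        end
    end
    return
end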
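And a rough sketch of the in-place pmap! I have in mind; pmap_inplace! is a made-up name, not an existing function:

function pmap_inplace!(f, out::SharedArray, xs)
    # like pmap, but writes the results into a preallocated shared output
    @sync @parallel for i in eachindex(xs)
        out[i] = f(xs[i])
    end
    return out
end
# usage: pmap_inplace!(x -> x^2, SharedArray{Float64}(10), 1:10)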
Thanks a lot in advance!