Two or more @threads for blocks in the same function


#1

First off, I am very impressed with multi-threading so far, although I am mainly treating @threads for as a basic OpenMP parallel region. My question is about an allocation issue that appears when two multi-threaded regions are in the same function: to avoid a substantial amount of additional allocation, I have to move each additional region into its own small function.

Is whatever is causing this likely to remain in the future? The current disadvantage is that I end up creating functions that make little sense on their own: in my case they rely on various intermediate quantities having been computed and stored by the previous function, which makes the code harder to understand.

It’s not obvious in the example below, but the reason for needing multiple blocks in general is that I need to do some non-parallel computations in between the parallel regions. The purpose of the first function, where there is only one loop, is only for comparison.

const array = Vector{Int64}(1:1024)

function oneThreadLoop()
  Threads.@threads for i = 1:length(array)
    array[i] += 1
    array[i] -= 1
  end
end

function twoThreadLoops()
  Threads.@threads for i = 1:length(array)
    array[i] += 1
  end
  Threads.@threads for i = 1:length(array)
    array[i] -= 1
  end
end

@inline function loop1()
  Threads.@threads for i = 1:length(array)
    array[i] += 1
  end
end

@inline function loop2()
  Threads.@threads for i = 1:length(array)
    array[i] -= 1
  end
end

function twoSeparateThreadLoops()
  loop1()
  loop2()
end

Benchmarking gives:

@btime oneThreadLoop()
  981.600 ns (1 allocation: 32 bytes)
@btime twoThreadLoops()
  143.812 μs (2372 allocations: 12.67 KiB)
@btime twoSeparateThreadLoops()
  2.261 μs (2 allocations: 64 bytes)
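
For anyone who wants to check the allocation behaviour without BenchmarkTools, here is a minimal sketch using Base's `@allocated` macro. The names `data` and `two_loops` are hypothetical stand-ins for `array` and `twoThreadLoops` above, and the number printed will depend on the Julia version and thread count, so no expected value is shown.

```julia
using Base.Threads

# Hypothetical stand-in for the `array` constant in the post.
const data = Vector{Int64}(1:1024)

# Two threaded regions in one function, as in twoThreadLoops.
function two_loops()
    @threads for i = 1:length(data)
        data[i] += 1
    end
    @threads for i = 1:length(data)
        data[i] -= 1
    end
end

two_loops()                     # warm-up call so compilation is not measured
bytes = @allocated two_loops()  # bytes allocated by a single call
println("two_loops allocated ", bytes, " bytes")
```

Running the same measurement on the split version gives a quick way to see whether the difference still exists on a given Julia release.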

#2

You are hitting a type instability in twoThreadLoops because of this issue. Until it is fixed, the best thing to do is to keep each threaded loop in a separate function, as you are doing in twoSeparateThreadLoops.
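
One way to keep the split functions readable is to pass the data in as an argument instead of relying on state left behind by the previous function. The following is only a sketch under that assumption. The helper names `increment!`, `decrement!` and `two_regions!` are made up for illustration, and the `sum` call stands in for whatever serial work sits between the parallel regions.

```julia
using Base.Threads

# Hypothetical helper: first parallel region, in its own function.
function increment!(a::Vector{Int})
    @threads for i in eachindex(a)
        a[i] += 1
    end
    return a
end

# Hypothetical helper: second parallel region, in its own function.
function decrement!(a::Vector{Int})
    @threads for i in eachindex(a)
        a[i] -= 1
    end
    return a
end

# The outer function contains no @threads block itself; it only
# sequences the two regions around the serial step.
function two_regions!(a::Vector{Int})
    increment!(a)
    total = sum(a)   # placeholder for the non-parallel computation
    decrement!(a)
    return total
end

a = collect(1:1024)
total = two_regions!(a)
```

Because each helper receives everything it needs through its arguments, the intermediate quantities are visible at the call site rather than hidden in shared storage.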