Are certain operations just not good candidates for threading?

colintbowers · January 22, 2019, 3:46am

Hi all,

I was experimenting with using threading to speed up a loop over vcat that swaps the first and second half of a sequence of vectors. I fire up Julia and verify:

julia> Threads.nthreads()
4

I wrote the following simple test routine which can be cut-and-paste to the REPL:

function f1(N::Int, J::Int)
    x = [ collect(1:N) for j = 1:J ]
    nhalf = floor(Int, N/2)
    for j = 1:J
        x[j][:] = vcat(x[j][nhalf+1:end], x[j][1:nhalf])
    end
    return x
end
function f1_thread(N::Int, J::Int)
    x = [ collect(1:N) for j = 1:J ]
    nhalf = floor(Int, N/2)
    Threads.@threads for j = 1:J
        x[j][:] = vcat(x[j][nhalf+1:end], x[j][1:nhalf])
    end
    return x
end

x1 = f1(10, 100);
x2 = f1_thread(10, 100);

using BenchmarkTools
@btime f1(10000, 10000);
@btime f1_thread(10000, 10000);

The threaded version is about 10% faster.

Is the conclusion here that this just simply isn’t a good operation for threading? Is the problem that the overhead in setting up the threaded loop is greater than the savings made by threading? If so, are there any other options for parallelizing loops like this?

I also tried using @distributed, but the cost of applying @everywhere to x massively outweighted any gains.

I also tried SharedArray, and had a similar issue with the setup costs, and also kept getting errors when I tried larger examples of the form:

ERROR: On worker 3:
SystemError: shm_open() failed for /jl011485a0lWXYifpyIZJdgO1sIM: Too many open files

So, is the conclusion here that this just isn’t an operation that can be sped up by using multiple cores?

Cheers and thanks for all responders.

Colin

Ralph_Smith · January 22, 2019, 4:37am

Yes and no. (No, as the overhead is not troublesome unless your arrays are small.) Operations like vcat that allocate heap objects (such as ordinary arrays) must interact with the GC, which is blocking (although thread-safe), so they should be avoided in threaded loops. Your problem is a good candidate for multithreading; if you write out the operation as an explicit loop using immutable temporaries you should see dramatic improvement.

colintbowers · January 22, 2019, 6:05am

Thanks for responding. I think I understand what you mean about heap objects. My new implementation of the two test routines is:

function f1(N::Int, J::Int)
    x = [ collect(1:N) for j = 1:J ]
    nhalf = floor(Int, N/2)
    for j = 1:J
        for n = 1:nhalf
            (a, b) = (x[j][n], x[j][nhalf+n])
            x[j][n] = b
            x[j][nhalf+n] = a
        end
    end
    return x
end
function f1_thread(N::Int, J::Int)
    x = [ collect(1:N) for j = 1:J ]
    nhalf = floor(Int, N/2)
    Threads.@threads for j = 1:J
        for n = 1:nhalf
            (a, b) = (x[j][n], x[j][nhalf+n])
            x[j][n] = b
            x[j][nhalf+n] = a
        end
    end
    return x
end

But I’m still only seeing a small improvement:

julia> @btime f1(1000, 1000);
  1.432 ms (1001 allocations: 7.76 MiB)

julia> @btime f1_thread(1000, 1000);
  1.250 ms (1002 allocations: 7.76 MiB)

Have I misunderstood what you’re suggesting?

Thanks again, and apologies if I’m slow responding as I’m about to head home for the day.

Cheers,

Colin

Ralph_Smith · January 22, 2019, 2:24pm

Your functions spend most of their time building the arrays. For the loop itself, I see acceleration of about 2x with 3 threads.

I shouldn’t have said this was good for threading; with so little computation, memory bandwidth and/or cache competition mean that more threads don’t do any better than 2x (at least for me).

colintbowers · January 22, 2019, 11:51pm

Gah! Rookie mistake.

Yep, I’m getting the same result as you now.

Thanks for walking me through it.

Cheers,

Colin

Topic		Replies	Views
Slower with threads Performance question	26	1166	August 6, 2022
Looking for advice on threading General Usage	6	1622	January 15, 2020
Same multi-threaded code, scaling observed only on some machines Performance	2	72	August 14, 2024
Overhead of `Threads.@threads` Performance question , multithreading	30	5340	March 13, 2021
Multi-threading on a 2 CPU system New to Julia multithreading	15	1060	February 2, 2023

Are certain operations just not good candidates for threading?

Related topics