Atomic much slower than @spawn-ed tasks

The Julia manual offers two solutions to avoid a race condition when using @threads, namely @spawn-ed tasks and atomic operations. The manual states that the atomic approach “may be more performant depending on the characteristics of the operations”. However, for this example, the atomic approach is significantly slower. What is the reason for this huge discrepancy? When should the atomic approach be used? The results below are based on using 4 threads:

function sum_single(a)
    s = 0
    for i in a
        s += i
    end
    s
end
function sum_multi_good(a)
    chunks = Iterators.partition(a, length(a) ÷ Threads.nthreads())
    tasks = map(chunks) do chunk
        Threads.@spawn sum_single(chunk)
    end
    chunk_sums = fetch.(tasks)
    return sum_single(chunk_sums)
end
@btime sum_multi_good(1:100_000_000)
2.217 μs (35 allocations: 2.56 KiB)
5000000050000000

function sum_atomic!(a)
    atomic = Threads.Atomic{Int}(0)
    Threads.@threads for id in a
        Threads.atomic_add!(atomic, id)
    end
    return atomic[]
end

@btime sum_atomic!(1:100_000_000)
760.684 ms (26 allocations: 2.33 KiB)
5000000050000000

There isn’t much more to it than:
Atomic operations are slow.
(just like locks are slow; indeed, a lot of locks are built on atomics).
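
For comparison, here is a minimal sketch (my own, not from the thread; the name sum_lock! is made up) of the same reduction guarded by a ReentrantLock. Since the lock is acquired once per element, it pays the same kind of per-element synchronization cost as the atomic version:

function sum_lock!(a)
    s = 0
    lk = ReentrantLock()
    Threads.@threads for i in a
        # every iteration pays to acquire and release the lock
        lock(lk) do
            s += i
        end
    end
    return s
end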

So if you can write your solution in a way that minimizes the need for them, you are almost always better off doing so.

But sometimes you can’t write your problem in a way that avoids them.

You could imagine a variant of your code which uses atomic_add! only to combine the per-chunk sums.
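
A minimal sketch of that variant (the name sum_chunked_atomic is mine, not from the thread): each task reduces its chunk into a local variable and touches the shared atomic only once, so the synchronization cost becomes negligible:

function sum_chunked_atomic(a)
    atomic = Threads.Atomic{Int}(0)
    chunks = Iterators.partition(a, length(a) ÷ Threads.nthreads())
    tasks = map(chunks) do chunk
        Threads.@spawn begin
            # plain, fast local accumulation within the chunk
            s = 0
            for i in chunk
                s += i
            end
            # a single atomic operation per chunk instead of one per element
            Threads.atomic_add!(atomic, s)
        end
    end
    foreach(wait, tasks)
    return atomic[]
end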

Your scenario here (the one from the manual) is in the category where not multithreading at all is better anyway. Because there is no real computation taking place, you end up paying the overhead without any real multithreading benefit.

There are scenarios where no flavor of multithreading is going to produce a benefit (or where multithreading is doing more harm than good).

When adding some real work, we can see the two approaches become indistinguishable (at least in this scenario):

using BenchmarkTools

function costly(x)
    # Burn some CPU time to simulate real per-element work; returns x unchanged.
    z = 0.0
    for _ in 1:2_000_000
        z += rand()
    end
    x
end

function sum_single(a)
    s = 0
    for i in a
        s += costly(i)
    end
    s
end

function sum_multi_good(a)
    chunks = Iterators.partition(a, length(a) ÷ Threads.nthreads())
    tasks = map(chunks) do chunk
        Threads.@spawn sum_single(chunk)
    end
    chunk_sums = fetch.(tasks)
    return sum_single(chunk_sums)
end
@info "sum_multi_good: "
@btime sum_multi_good(1:10000)

function sum_atomic!(a)
    atomic = Threads.Atomic{Int}(0)
    Threads.@threads for id in a
        Threads.atomic_add!(atomic, costly(id))
    end
    return atomic[]
end

@info "sum_atomic!: "
@btime sum_atomic!(1:10000)

@info "sum_single: "
@time "regular: " sum_single(1:10000)

Running with julia -t 4 atomicplay.jl will produce (on my machine):

[ Info: sum_multi_good: 
  4.382 s (37 allocations: 2.61 KiB)
[ Info: sum_atomic!: 
  4.370 s (27 allocations: 2.36 KiB)
[ Info: sum_single: 
regular: 16.596185 seconds

Note that sleeping is a bad example because sleeping doesn’t take compute; when your program sleeps, the scheduler will immediately run another task on that core.
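
A quick way to see this (my own illustration, assuming a session started with julia -t 1): spawning four sleeping tasks takes about one second in total, not four, because a sleeping task yields its core back to the scheduler.

@time begin
    tasks = [Threads.@spawn sleep(1) for _ in 1:4]
    foreach(wait, tasks)
end   # ≈ 1 second, not ≈ 4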


I don’t think there is a real difference in this scenario, but I changed the example to avoid sleep altogether.
