@threads vs @spawn

Hi, I’d appreciate some feedback about this issue. I’m trying to understand the difference between @threads and @spawn.

I think the former is easy to understand. I have an experiment that shuffles cards many, many times to see if a royal flush appears. I have supplied --threads=8 on the command line, so the following loop processes 8 batches in parallel at any one time and finishes only when all 20 batches complete.

    batches = 20
    @threads for i in 1:batches
        batch!(...)
    end

Indeed, that’s what it does.

But I had read about @spawn before I understood the simpler syntax of @threads, so I had originally written the loop this way:

    @sync for i in 1:batches
        Threads.@spawn batch!(...)
    end

I thought it would do the same thing as above… Are they equivalent? This spawns 20 tasks at once, and I guess 8 of them will run at any one time; @sync should wait until they have all completed.

Apparently, performance-wise they are not the same; the second one doesn’t behave like the first. In “top” I can see the julia process start at 800% CPU (correct: 8 threads). But after two batches completed, CPU dropped to 400%… then, after a much longer time, a couple more batches completed and CPU dropped to 200%. Eventually CPU dropped to 100% without any more batches completing. That’s obviously not what I intended, and the experiment ran so long I had to Ctrl-C it.

How come?

Thanks

Happy to supply source code (about 100 lines). Let me know if anyone wants to see it.


In order to see what happens, I suggest two little test functions returning the thread load:

    using .Threads

    function threaded(batches)
        ret = zeros(Int, nthreads())
        @threads for i in 1:batches
            ret[threadid()] += 1
        end
        return ret
    end

    function spawned(batches)
        ret = zeros(Int, nthreads())
        @sync for i in 1:batches
            Threads.@spawn ret[threadid()] += 1
        end
        return ret
    end

then:

    julia> threaded(20)
    8-element Array{Int64,1}:
     3
     3
     3
     3
     2
     2
     2
     2

    julia> spawned(20)
    8-element Array{Int64,1}:
     0
     8
     9
     1
     2
     0
     0
     0

If you want uniformly loaded threads for parallel computation, use @threads for ....

@spawn is more for dynamically spawning concurrent tasks. The manual says of it:

Create and run a Task on any available thread.

There is no guarantee of balanced load. In a bad case you can even get:

    julia> spawned(20)
    8-element Array{Int64,1}:
      1
     13
      1
      1
      1
      1
      1
      1

Thanks a lot. I knew someone would know right away. This is an awesome language. So fast, so easy to write threaded code, so many experts willing to help.

Thanks again.


Sorry for the bump, but from this (super-enlightening) example, it sounds like @threads is just like @spawn, but more constrained. By that logic, it would seem that @spawn is generally preferable, but I’m pretty sure that’s wrong, right?

I agree that @spawn (or Task) is indeed more flexible than @threads and provides more fine-grained control. But if the situation allows it, you can just stick with @threads.
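For example, @spawn makes it natural to launch unevenly sized tasks and collect their results with fetch, which doesn’t fit the uniform-iteration model of @threads. A minimal sketch of my own (uneven_sum is just an illustrative name, not from this thread):

    using Base.Threads: @spawn

    function uneven_sum(ns)
        # one task per piece of work; the workloads differ wildly in size
        tasks = [@spawn sum(sin, 1:n) for n in ns]
        return sum(fetch, tasks)   # fetch blocks until each task has finished
    end

    uneven_sum([10^3, 10^7, 10^5])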

Note that the performance difference does not seem to exist anymore. Sure, @pbayer’s example still shows less balanced distribution across threads, but it is also not performing any calculations, so that does not matter. For a computationally more intensive example, consider

    using .Threads
    using BenchmarkTools

    function threaded(batches, A)
        ret = zeros(Int, nthreads())
        s = 0  # Do something with our calculations so they don't get compiled away
        @threads for i in 1:batches
            M = view(A, :, :, i)
            ret[threadid()] += 1
            s += sum(M * M')  # Note: race condition, so incorrect final s, but not important
        end
        return ret, s
    end

    function spawned(batches, A)
        ret = zeros(Int, nthreads())
        s = 0
        @sync for i in 1:batches
            @spawn begin
                M = view(A, :, :, i)
                ret[threadid()] += 1
                s += sum(M * M')
            end
        end
        return ret, s
    end

    batches = 20
    A = rand(1000, 1000, batches)

    # For the timings below nthreads() == 8, and BLAS.get_num_threads() is at its default of 4.
    # (Using BLAS.set_num_threads(1) approximately halves the execution time, both for threaded and spawned.)
    @btime threaded($batches, $A)
    #    222.530 ms (124 allocations: 152.59 MiB)
    #  ([2, 3, 3, 3, 2, 3, 2, 2], 7.508622562098708e8)

    @btime spawned($batches, $A)
    #    216.786 ms (193 allocations: 152.60 MiB)
    #  ([2, 3, 3, 2, 3, 2, 3, 2], 7.505509242284892e8)

I think this is a bad example: each task takes so little time that it typically finishes before the next one is even spawned, so they might as well run serially. To better model a real computation, you should make each task hold on to its thread for a while, e.g. with Libc.systemsleep:

    function spawned(batches)
        ret = zeros(Int, nthreads())
        @sync for i in 1:batches
            Threads.@spawn begin
                ret[threadid()] += 1
                Libc.systemsleep(0.001)
            end
        end
        return ret
    end

Please don’t use threadid: PSA: Thread-local state is no longer recommended

While good advice in general, how else would you obtain the (initial) distribution of tasks across threads?

For this particular case (measuring the distribution of tasks across threads), that advice arguably does not apply. But I agree it’s a bad idea in general, since tasks may migrate between threads. They don’t in these particular model runs, but they might in the real shuffling problem.
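If the goal is only to observe where tasks run, one alternative that avoids mutating shared state from inside the tasks is to have each task return the threadid() it sees and tally the results afterwards. A sketch of my own variant (spawn_distribution is just an illustrative name):

    using Base.Threads: @spawn, nthreads, threadid

    function spawn_distribution(batches)
        tasks = [@spawn threadid() for _ in 1:batches]   # each task reports the thread it runs on
        ret = zeros(Int, nthreads())
        for id in fetch.(tasks)
            ret[id] += 1
        end
        return ret
    end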

No, that’s not wrong. Generally speaking, @spawn is preferable. However, it doesn’t have a static option. That is, if you want a static task-thread assignment (i.e. no task migration), the only way to get it (without using packages or making ccalls) is through @threads :static.
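For reference, the static schedule is just an option on the macro; a minimal sketch (do_work is a placeholder of mine for the loop body):

    using Base.Threads

    @threads :static for i in 1:20
        # iterations are split into fixed chunks, one per thread, with no migration
        do_work(i)   # `do_work` is a placeholder for the actual computation
    end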

Packages like OhMyThreads.jl try to provide a more consistent and configurable API. For example, @threads becomes @tasks (which is configurable), and there is also @spawnat.
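Roughly along these lines; this is a hedged sketch from memory, so please check the OhMyThreads.jl documentation for the exact option names (do_work is again a placeholder):

    using OhMyThreads: @tasks, @set

    @tasks for i in 1:20
        @set scheduler = :static   # option name from memory; see the package docs
        do_work(i)                 # `do_work` is a placeholder
    end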
