Examples of when to use @async over @spawn?

I’m trying to figure out if there’s ever a good reason to use @async over Threads.@spawn anymore.

It seems like you want Threads.@spawn the vast majority of the time – it’s more composable, gives more freedom to the scheduler, and is usually higher performance.

In some languages, the recommendation is to use the equivalent of @async most of the time to avoid parallelization overhead. But the overhead of Threads.@spawn is quite small, to the point where you’ll generally see performance gains unless your tasks are <1ms.

I suppose if you had a case with really small tasks that needed to be concurrent, @async might be more efficient, but I’m struggling to come up with an example.


@async and @spawn have the same cost. The difference is in the semantics, and for new code there is no reason to use @async. We needed @spawn to avoid breaking existing code that depended on the precise semantics of @async.
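A quick way to see the semantic difference (a minimal sketch; `Task`'s `sticky` field is an internal detail, so don't rely on it in real code): `@async` pins its task to the spawning thread, while `Threads.@spawn` lets the scheduler run it anywhere.

```julia
# @async creates a "sticky" task pinned to the current thread;
# Threads.@spawn creates a non-sticky task the scheduler may run
# on any thread in the :default pool.
t_async = @async Threads.threadid()
t_spawn = Threads.@spawn Threads.threadid()
wait(t_async); wait(t_spawn)
t_async.sticky  # true: pinned, so it also pins the enclosing task
t_spawn.sticky  # false: free to run on any thread
```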


Perhaps this helps?


Thanks! Would you also say Threads.@threads is obsolete and we should use Threads.@spawn for new code?

@threads applies to for loops. @spawn applies to arbitrary code.
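For example (a toy computation, just to show the two shapes):

```julia
using Base.Threads

# @threads parallelizes the iterations of a for loop in place:
squares = zeros(Int, 8)
@threads for i in 1:8
    squares[i] = i^2
end

# @spawn wraps any expression in a Task; fetch waits and returns its value:
t = @spawn sum(1:100)
total = fetch(t)
```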


@threads seems to have pretty different performance characteristics though:

n = Threads.nthreads()

function hash_lots(x)
    for i in 1:10_000_000
        x = hash(x)
    end
    sleep(.075)
    return x
end

@btime hash_lots(5)
>  151.253 ms

@btime Threads.@threads for i in 1:(4*n)
    hash_lots(i)
end
>   608.548 ms (606 allocations: 53.75 KiB)

@btime begin 
    @sync for i in 1:(4*n)
        Threads.@spawn hash_lots(i)
    end
end
>  381.235 ms (974 allocations: 85.27 KiB)

It looks like @threads doesn’t allow task switching, while @spawn does. So I’m wondering if it ever makes sense to use @threads over @spawn even when you have a for loop.

In the @spawn case you create 4*n tasks, whereas in the @threads case we internally create threadpoolsize() tasks: @threads processes the work in batches.

So, thanks to the sleep in hash_lots, the many @spawn tasks can overlap their execution, whereas @threads can't.

julia> @btime Threads.@threads for i in 1:(4*n)
             sleep(.075)
       end
  305.822 ms (342 allocations: 16.12 KiB)

julia> @btime begin
           @sync for i in 1:(4*n)
               Threads.@spawn sleep(.075)
           end
       end
  76.265 ms (588 allocations: 41.23 KiB)

julia> @btime begin
           @sync for i in 1:n
               Threads.@spawn for _ in 1:4; sleep(.075); end
           end
       end
  306.316 ms (349 allocations: 16.64 KiB)
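The batching can be sketched by hand (a rough approximation of what @threads does, not the real implementation):

```julia
# Split the range into at most nthreads() contiguous chunks and spawn
# one task per chunk. Iterations inside a chunk run serially, which is
# why the sleeps above can't all overlap.
function batched_foreach(f, r)
    chunklen = cld(length(r), Threads.nthreads())
    @sync for chunk in Iterators.partition(r, chunklen)
        Threads.@spawn foreach(f, chunk)
    end
end
```

With one chunk per thread, each task runs its four sleeps back to back, matching the ~300 ms timings above.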

Note that this is only the case with the default scheduler. In 1.11, there will be the :greedy scheduler, which eagerly consumes pieces of work and doesn’t batch. It spawns up to threadpoolsize() tasks that iterate independently. The intended use is for non-uniform workloads.
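Roughly, the greedy strategy behaves like this hand-rolled version (a sketch under my own assumptions, not the actual 1.11 implementation): a fixed set of worker tasks pulling iterations from a shared Channel.

```julia
# Sketch of a greedy scheduler: nthreads() (≈ threadpoolsize()) workers
# each pull the next iteration as soon as they finish the previous one,
# so a few slow iterations don't hold up a whole pre-assigned batch.
function greedy_foreach(f, itr)
    ch = Channel{eltype(itr)}(Inf)
    foreach(x -> put!(ch, x), itr)
    close(ch)
    @sync for _ in 1:Threads.nthreads()
        Threads.@spawn for x in ch
            f(x)
        end
    end
end
```

In 1.11 the real thing is spelled Threads.@threads :greedy for ... end.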


Even with the :greedy scheduler you would see the same “issue” in this code as with batching.

The available concurrency is threadpoolsize(), not n, so work that could be interleaved isn't necessarily interleaved.
