I’m trying to figure out if there’s ever a good reason to use @async over Threads.@spawn anymore. It seems like you want Threads.@spawn the vast majority of the time – it’s more composable, gives more freedom to the scheduler, and is usually higher performance.
In some languages, the recommendation is to use the equivalent of @async most of the time to avoid parallelization overhead. But the overhead of Threads.@spawn is quite small, to the point where you’ll generally see performance gains unless your tasks take less than about 1 ms. I suppose if you had a case with really small tasks that needed to be concurrent, @async might be more efficient, but I’m struggling to come up with an example.
@async and @spawn have the same cost. The difference is in the semantics, and for new code there is no reason to use @async. We needed @spawn so as not to break existing code that depended on the precise semantics of @async.
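The semantic difference shows up on the Task object itself. A minimal sketch (the sticky field is an implementation detail of Task, so treat this as illustrative, not API):

```julia
using Base.Threads

# @async creates a *sticky* task, pinned to the thread that spawned it;
# Threads.@spawn creates a non-sticky task the scheduler may run on any thread.
t_async = @async threadid()
t_spawn = Threads.@spawn threadid()

println(t_async.sticky)   # true:  stays on the spawning thread
println(t_spawn.sticky)   # false: free to migrate between threads
println(fetch(t_async))   # id of the thread that spawned it
println(fetch(t_spawn))   # whichever thread the scheduler picked
```

This stickiness is why @spawn composes better: the scheduler is free to balance non-sticky tasks across the thread pool.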
Thanks! Would you also say Threads.@threads is obsolete and we should use Threads.@spawn for new code?
@threads applies to for loops. @spawn applies to arbitrary code.
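For a loop body, the two can express the same computation. A small sketch (the squares_* function names are made up for illustration):

```julia
using Base.Threads

# @threads only wraps a for loop, partitioning iterations across threads.
function squares_threads(xs)
    out = zeros(Int, length(xs))
    @threads for i in eachindex(xs)
        out[i] = xs[i]^2
    end
    return out
end

# @spawn wraps arbitrary code; here, one task per element, joined by @sync.
function squares_spawn(xs)
    out = zeros(Int, length(xs))
    @sync for i in eachindex(xs)
        @spawn out[i] = xs[i]^2
    end
    return out
end
```

Both return the same result; the difference is in how the work is split into tasks, which the benchmarks below make visible.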
@threads seems to have pretty different performance characteristics though:
using BenchmarkTools

n = Threads.nthreads()

function hash_lots(x)
    for i in 1:10_000_000
        x = hash(x)
    end
    sleep(0.075)
    return x
end

@btime hash_lots(5)
> 151.253 ms

@btime Threads.@threads for i in 1:(4*n)
    hash_lots(i)
end
> 608.548 ms (606 allocations: 53.75 KiB)

@btime begin
    @sync for i in 1:(4*n)
        Threads.@spawn hash_lots(i)
    end
end
> 381.235 ms (974 allocations: 85.27 KiB)
It looks like @threads doesn’t allow task switching, while @spawn does. So I’m wondering if it ever makes sense to use @threads over @spawn even when you have a for loop.
In the @spawn case you create 4*n tasks, whereas in the @threads case we internally create threadpoolsize() tasks; @threads processes the work in batches.
So, due to the sleep in hash_lots, with many tasks we can overlap the execution, whereas @threads can’t.
julia> @btime Threads.@threads for i in 1:(4*n)
           sleep(0.075)
       end
305.822 ms (342 allocations: 16.12 KiB)

julia> @btime begin
           @sync for i in 1:(4*n)
               Threads.@spawn sleep(0.075)
           end
       end
76.265 ms (588 allocations: 41.23 KiB)

julia> @btime begin
           @sync for i in 1:n
               Threads.@spawn for _ in 1:4; sleep(0.075); end
           end
       end
306.316 ms (349 allocations: 16.64 KiB)
Note that this is only the case with the default scheduler. In 1.11, there will be the :greedy scheduler, which eagerly consumes pieces of work and doesn’t batch. It spawns up to threadpoolsize() tasks that iterate independently. The intended use is for non-uniform workloads.
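A sketch of what that looks like, assuming the 1.11 scheduler-argument syntax for Threads.@threads (this will not run on older Julia versions):

```julia
using Base.Threads

# :greedy (Julia 1.11+): up to threadpoolsize() worker tasks each pull
# the next iteration as they finish, instead of pre-assigned batches.
results = zeros(Int, 8)
@threads :greedy for i in 1:8
    sleep(0.005 * i)      # deliberately non-uniform work per iteration
    results[i] = i^2
end
```

With uneven iteration times like this, greedy pulling keeps all workers busy, whereas equal batches can leave some threads idle while the unlucky one finishes.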
Even with the :greedy scheduler you would see the same “issue” in this code as with batching. The available concurrency is threadpoolsize(), not n, so work that could be interleaved isn’t necessarily interleaved.
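To see concurrency limited only by the number of tasks rather than threads: with one @spawn per item, sleep yields control, so all the sleeps overlap even on a single thread (a sketch; the timing threshold is a loose assumption):

```julia
using Base.Threads

n = nthreads()

# One task per item: each sleep yields to the scheduler, so all 4n sleeps
# run concurrently and the loop takes roughly 0.05 s regardless of n.
t = @elapsed @sync for i in 1:(4*n)
    @spawn sleep(0.05)
end
println(t)
```

This is the key difference from @threads / :greedy here: those cap the number of live tasks at threadpoolsize(), so only that many sleeps can overlap at once.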