@spawn map(x -> f(x), collection) only returns Task (Done)

JohnS · October 31, 2022, 5:42pm

Hello!

I want to multiply a series of matrices by a vector, and it looks like @spawn is the fastest multithreading approach. Unfortunately, the MWE below only returns Task (done) instead of the result.

using Base.Threads: @spawn

x = collect(1:1:10)
y = x .+ x

vecmat = [[x+y for x in x, y in y] for i=1:length(x)]

newvecs = @spawn map(m -> m * x, vecmat)

The below works, but seems non-Julian.

x = collect(1:1:10)
y = x .+ x

vecmat = [[x+y for x in x, y in y] for i=1:length(x)]

newvecs = []

@spawn push!(newvecs, map(m -> m * x, vecmat))

Any thoughts on making @spawn and map return the value instead of Task Done in a Julian way?
Or an alternative solution that’s just as fast?

Thanks for your help, keen for the education!

bertschi · October 31, 2022, 6:02pm

@spawn is asynchronous, so it cannot immediately return the result. Instead, it starts a new task that does the work and returns a handle to that task, which keeps running in the background.
To retrieve the result, use fetch, which blocks until the task is done and then returns its value, i.e.,

task = @spawn map(m -> m * x, vecmat)
newvecs = fetch(task)

Alternatively, you could have your tasks write their results to a channel and take them from there; see the manual section for more details.
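A minimal sketch of the channel approach, assuming one task per matrix and a buffered channel large enough to hold every result. The names results, tasks, a and b are illustrative and not from the original post; each task tags its product with an index, because results arrive in completion order rather than input order.

```julia
using Base.Threads: @spawn

# Inputs mirroring the MWE (a and b avoid shadowing x and y)
x = collect(1:10)
y = x .+ x
vecmat = [[a + b for a in x, b in y] for _ in 1:length(x)]

# Buffered channel sized to hold all results, so put! never blocks
results = Channel{Tuple{Int,Vector{Int}}}(length(vecmat))

# One task per matrix; each writes (index, product) to the channel
tasks = [@spawn put!(results, (i, vecmat[i] * x)) for i in eachindex(vecmat)]

foreach(wait, tasks)   # wait until every producer has finished
close(results)         # a closed channel lets collect terminate

# Sort by index to restore input order, then drop the index
newvecs = last.(sort!(collect(results); by = first))
```

For a simple map like this, fetch is shorter; a channel tends to pay off when a consumer should start processing results while other tasks are still running.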

JohnS · October 31, 2022, 6:21pm

Thanks for the explanation @bertschi, got it!

mikmoore · October 31, 2022, 9:20pm

I’ll often write a parallel map as something like

results = map(x -> @spawn foo(x), inputs) .|> fetch

So the map returns a collection of Tasks, which is then piped elementwise (.|>) into fetch to wait for and retrieve each result into a new collection.
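Applied to the MWE from the question, the pattern might look like the sketch below. It assumes Julia was started with more than one thread (for example julia -t auto); with a single thread it still runs, just without parallelism. The names tasks, a and b are illustrative.

```julia
using Base.Threads: @spawn

x = collect(1:10)
y = x .+ x
vecmat = [[a + b for a in x, b in y] for _ in 1:length(x)]

# Spawn one task per matrix; map returns a Vector of Tasks right away
tasks = map(m -> @spawn(m * x), vecmat)

# fetch. waits for each task in turn and collects results in input order
newvecs = fetch.(tasks)
```

Writing fetch.(tasks) is equivalent to tasks .|> fetch; which one reads better is a matter of taste.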

JohnS · November 1, 2022, 12:49pm

Very elegant, cheers @mikmoore!

dave.f.kleinschmidt · November 1, 2022, 2:46pm

If you want a parallel (multiprocess) map, check out pmap (discussed in this section of the manual). It provides a nice wrapper around this pattern that gives you some control over scheduling, batching, etc.

There’s also asyncmap, which spawns local tasks (green threads) on the calling process. It is good for things like IO/network access, where you make many requests that have “downtime” between initiation and completion during which Julia itself isn’t doing anything (e.g., loading a bunch of small objects from S3).
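A small sketch of both functions. slow_io is a hypothetical stand-in for a slow request, not a real API; pmap here runs on the current process because no workers have been added with addprocs.

```julia
using Distributed   # standard library; provides pmap

# Hypothetical stand-in for an I/O-bound call such as a network request
slow_io(i) = (sleep(0.1); i^2)

# asyncmap: tasks on the calling process, so the sleeps overlap
squares = asyncmap(slow_io, 1:10)

# pmap: spreads calls over worker processes when any exist;
# with no workers it falls back to the current process
cubes = pmap(i -> i^3, 1:10)
```

With real workers, any named function passed to pmap has to be defined on every process, typically via @everywhere.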

And finally, there is a whole family of packages built on Transducers.jl that provide abstractions for computations like this which are “execution engine agnostic” (they can run locally in one process, with multithreading, or with multiprocessing).
