Threads.@spawn weird output message

Hi everyone,

I am testing two parallel versions of my loop, one with @threads and one with @spawn. Weirdly, @spawn finishes in a fraction of the time @threads needs, makes far fewer allocations, and prints a message that I cannot find an explanation for anywhere: Task (runnable) @0x00007efb776e3c70
The outputs of both options match (and also match the non-parallelized one). Can anyone explain what is happening?

Two more side questions:

  1. Does anyone know how to get the functions to make fewer allocations? Everything that normally works (using .= instead of =, or using @view when accessing to_inter) returns errors when combined with interpolate.
  2. How can I set up cache_m in a less weird way? Trying to set it up as a matrix of the output type of interpolate somehow does not work.

Here is a MWE:

# Setup
using Interpolations, BenchmarkTools

ID = rand(1,3000)
to_inter = rand(100,100,3000)

# Setting up output matrix as a vector of interpolation elements (please excuse the weird way of setting it up)
cache_m = []
for i in eachindex(ID)
        if i == 1
                cache_m = [interpolate(to_inter[:,:,1], BSpline(Cubic(Line(OnGrid()))))]
        else
                cache_m = vcat(cache_m, [cache_m[1]])                
        end
end

cache_mt = cache_m
cache_ms = cache_m

# The actual functions. They differ only in their parallelization type. Each computes an interpolation for every ID.
function test_np!(to_inter, cache_m)
        for i = eachindex(ID)
                cache_m[i] = interpolate( to_inter[:,:,i], BSpline(Cubic(Line(OnGrid()))))
        end
end
function test_pt!(to_inter, cache_mt)
        Threads.@threads for i = eachindex(ID)
                cache_mt[i] = interpolate( to_inter[:,:,i], BSpline(Cubic(Line(OnGrid()))))
        end
end
function test_ps!(to_inter, cache_ms)
        Threads.@spawn for i = eachindex(ID)
                cache_ms[i] = interpolate( to_inter[:,:,i], BSpline(Cubic(Line(OnGrid()))))
        end
end

# Run first time to trigger compilation
test_np!(to_inter, cache_m)
test_pt!(to_inter, cache_mt)
test_ps!(to_inter, cache_ms)

# Time functions
@time test_np!(to_inter, cache_m) # 1.763476 seconds (592.91 k allocations: 1.035 GiB, 4.53% gc time)
@time test_pt!(to_inter, cache_mt) # 0.128616 seconds (498.39 k allocations: 1.033 GiB)
@time test_ps!(to_inter, cache_ms) # 0.000577 seconds (192 allocations: 360.953 KiB) Task (runnable) @0x00007f8fb697c2f0

# Check if output is the same
cache_m == cache_mt # true
cache_m == cache_ms # true

Thanks a lot!

That is just how a Task object is displayed in the REPL:

julia> t = Task(() -> println("hello"));

julia> t
Task (runnable) @0x00007fa23c081460

julia> schedule(t);
hello

julia> t
Task (done) @0x00007fa23c081460

Threads.@spawn creates and schedules a Task, but does not block until it is finished. Your function test_ps! thus returns the task it creates, and the timing is fast because it returns before the work is done. The fact that you get the same result is probably just because the task has finished by the time you compare. To wait for the work to finish you would write test_ps! as

function test_ps!(to_inter, cache_ms)
    t = Threads.@spawn for i = eachindex(ID)
        cache_ms[i] = ...
    end
    wait(t)
end

but that is a bit silly since it is just a complicated way of writing test_np!: wait(Threads.@spawn expr) is more or less equivalent to expr except it might run on another thread.
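
For instance, fetch both waits for the task and returns its result, which makes the equivalence easy to see:

julia> t = Threads.@spawn 1 + 1;   # returns immediately with a Task

julia> wait(t)                     # blocks until the task has finished

julia> fetch(t)                    # waits, then returns the task’s result
2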


That makes sense. I forgot the @sync block.
Results look much more normal when changing the @spawn function to:

function test_ps!(to_inter, cache_ms)
        @sync begin
                Threads.@spawn for i = eachindex(ID)
                        cache_ms[i] = interpolate( to_inter[:,:,i], BSpline(Cubic(Line(OnGrid()))))
                end
        end
end

This gives

No parallelization: 1.206858 seconds (592.91 k allocations: 1.035 GiB, 4.98% gc time)
@threads:           0.126990 seconds (498.38 k allocations: 1.033 GiB)
@spawn:             1.395076 seconds (592.93 k allocations: 1.035 GiB, 11.19% gc time)

It’s still a bit odd that the task gets displayed in the REPL like that. Any ideas about the side questions? :)

Thanks a lot!

By the way, I think you are not making the comparison you intend to make. @spawn simply runs the expression that follows it as a single task on some thread, so in your example one thread runs the entire loop over i.
To actually use multithreading with @spawn here, you should move it inside the loop so that it spawns a task for every iteration, as in the sketch below.
In general, @spawn and @threads should have somewhat similar performance, although if you can write the loop with @threads, that is usually a bit more lightweight and makes fewer allocations.
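
A per-iteration version could look like this (a sketch; the name test_ps2! is just for illustration, and spawning one task per slice has scheduling overhead of its own, so chunking the range may pay off for many small iterations):

function test_ps2!(to_inter, cache_ms)
    @sync for i = eachindex(ID)
        Threads.@spawn cache_ms[i] = interpolate(to_inter[:,:,i], BSpline(Cubic(Line(OnGrid()))))
    end
end

Here the @sync block waits until every task spawned inside it has finished.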


Oh, you have a point. Thanks for pointing that out! That also explains the output message.


The lines

cache_m == cache_mt # true
cache_m == cache_ms # true

don’t really test anything. They will always be true, because the three variables cache_m, cache_mt and cache_ms refer to the same single array, which is of course equal to itself.

The lines

cache_mt = cache_m
cache_ms = cache_m

don’t make a copy of the array that cache_m refers to; they simply make cache_mt and cache_ms refer to that same array.
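
A small illustration of the difference:

a = [1, 2]
b = a          # no copy: b and a are the same array
b[1] = 99      # mutating through b...
a == b         # ...still compares the array with itself: true, trivially

To compare genuinely independent caches you would need e.g. cache_mt = copy(cache_m).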


Thanks for the answer. I noticed that afterwards as well.
