Interpolation into Threads.@spawn macro fails for certain kinds of Structs

According to the docs for Threads.@spawn, you can interpolate an argument via $ and the task will be spawned with a copy of the variable. I have found that when I use a custom struct that holds a sparse array (from SparseArrays), this behavior does not work as expected:

using SparseArrays, Base.Threads
struct C
    sa::SparseMatrixCSC{Float64,Int64}
end

c = C(zeros(Float64, (10,10)))
c.sa[1,1] = 5.0

function func(f::C)
    f.sa[1,1] = 80.0
    f.sa[2,1] = 10.0
    f
end


c_ = Threads.@spawn begin
    sleep(5)
    func($c)
end

c
# C(
#   [1, 1]  =  5.0)

fetch(c_)
# C(
#   [1, 1]  =  80.0
#   [2, 1]  =  10.0)

c
# C(
#   [1, 1]  =  80.0
#   [2, 1]  =  10.0)

In spite of using $ to interpolate, the spawned task ends up mutating the original instance of the struct. This does not seem to be specific to SparseArrays; it also happens with a struct of ordinary arrays:

struct CS
    f1::Array{Int64,1}
    f2::Array{Float64, 1}
    f3::String
end

x = CS(zeros(Int64,5),zeros(Float64, 5),"yes")

function func!(x::CS)
    x.f1[1] = 80
    x.f2[1] = 55.0
    x
end

f_ = Threads.@spawn begin
    sleep(5)
    func!($x)
end

fetch(f_)
# CS([80, 0, 0, 0, 0], [55.0, 0.0, 0.0, 0.0, 0.0], "yes")

x
# CS([80, 0, 0, 0, 0], [55.0, 0.0, 0.0, 0.0, 0.0], "yes")

# It happens even without structs:

x = [1,2,3]

function func!(x::Array{Int64,1})
    x[1] = 10
    x
end

f_ = Threads.@spawn begin
    sleep(5)
    func!($x)
end

fetch(f_)
# 3-element Array{Int64,1}:
#  10
#   2
#   3
x
# 3-element Array{Int64,1}:
#  10
#   2
#   3

My questions: 1) is this a bug (and should I open an issue on GitHub), and 2) is there a workaround, or should I consider not using sparse arrays?

I could manually copy the sparse array and pass that as the arg:

struct C
    sa::SparseMatrixCSC{Float64,Int64}
end

c = C(zeros(Float64, (10,10)))
c.sa[1,1] = 5.0

function func(f::C)
    f.sa[1,1] = 80.0
    f.sa[2,1] = 10.0
    f
end


cc = C(copy(c.sa))   # copy made in the calling task, before spawning
c_ = Threads.@spawn begin
    sleep(5)
    func($cc)
end

c
# C(
#   [1, 1]  =  5.0)

fetch(c_)
# C(
#   [1, 1]  =  80.0
#   [2, 1]  =  10.0)

c
# C(
#   [1, 1]  =  5.0)

It’s unclear to me how this copy approach differs from what the interpolation was supposed to accomplish, and whether it incurs more overhead. Are there good ways to measure this?
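One way to get rough numbers with nothing but Base and the SparseArrays standard library is to wrap the copy in a function, call it once to warm up (so compilation isn't counted), and then use `@allocated` and `@elapsed`. This is a minimal sketch; the helper name `copy_cost` and the matrix size and density are arbitrary choices for illustration. For more careful measurements, BenchmarkTools.jl (`@btime`, `@benchmark`) repeats the call many times and reports statistics.

```julia
using SparseArrays

# Hypothetical helper: time and memory for one `copy(A)`.
# The first call is a warm-up so that JIT compilation isn't measured.
function copy_cost(A)
    copy(A)                      # warm-up: compile `copy` for this type
    bytes = @allocated copy(A)   # bytes allocated by a single copy
    secs = @elapsed copy(A)      # wall-clock seconds for a single copy
    return bytes, secs
end

# A 1000x1000 sparse matrix with roughly 1% stored entries (random pattern).
sa = sprand(Float64, 1_000, 1_000, 0.01)
bytes, secs = copy_cost(sa)
println("copying the sparse matrix: $bytes bytes, $secs seconds")
```

For a SparseMatrixCSC, `copy` duplicates the stored values, their row indices and the column-pointer array, so the cost grows with the number of stored entries rather than with the full m×n size.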

Custom types are passed by reference, so if you don’t want the values to be changed during a function call, pass a copy to the function, or copy the argument inside the function and mutate that copy.
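A minimal sketch of the "pass a copy" advice applied to the struct from the original question: `deepcopy` duplicates the struct together with the sparse matrix it holds, so mutating the copy leaves the original untouched. (`C(copy(c.sa))`, as in the earlier workaround, works too; `deepcopy` just saves writing out each field.)

```julia
using SparseArrays

struct C
    sa::SparseMatrixCSC{Float64,Int64}
end

function func(f::C)
    f.sa[1,1] = 80.0   # mutates the matrix that `f` refers to
    return f
end

c = C(spzeros(Float64, 10, 10))
c.sa[1,1] = 5.0

c2 = deepcopy(c)   # new C holding a new, independent sparse matrix
func(c2)

c.sa[1,1]    # still 5.0: the original was not touched
c2.sa[1,1]   # 80.0
```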


That explanation makes sense, though it was not at all clear to me from the documented behavior (which led me to think that a copy was being made for me). But what about this case, where the same behavior occurs with a plain array? Is the array passed by reference too?

import Base.Threads.@spawn
x = [1,2,3]

function func(x::Array{Int64,1})
    x[1] = 90
    x
end

x_ = @spawn func($x)

fetch(x_)
# 3-element Array{Int64,1}:
#  90
#   2
#   3

x
# 3-element Array{Int64,1}:
#  90
#   2
#   3

My understanding is that without interpolation, there is a race condition whereby the object that x refers to might or might not change before the function runs. The interpolation “captures” the binding, guaranteeing that x refers to the object it was bound to at the time @spawn was called.
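As a rough sketch of that interpretation: from my reading of the macro, `@spawn func($x)` evaluates `x` when the task is created and binds the result to a fresh local variable in a `let` block, so the hand-written version below should behave the same way (the exact expansion may differ between Julia versions; `original` and `result` are just illustrative names). Rebinding `x` afterwards doesn't affect the task, but the task still holds the very same array, so it can mutate it.

```julia
using Base.Threads: @spawn

func(A) = (A[1] = 90; A)   # mutates its argument in place

x = [1, 2, 3]
original = x

# Roughly what `@spawn func($x)` does: bind the current object to a new local.
t = let y = x
    @spawn func(y)
end

x = [4, 5, 6]          # rebinding x does not change what the task sees
result = fetch(t)

result === original     # true: the task used the object bound at spawn time
original[1]             # 90: but that object was mutated, not copied
```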

The code below (swap between func(x) and func($x)) demonstrates the difference.

import Base.Threads.@spawn

function func(A)
    A[1] = 90
    return A
end

const N = 100
is_same_object = Vector{Bool}(undef, N);

for i = 1:N
    x = [1,2,3]
    fx = @spawn func(x)
    #fx = @spawn func($x)
    x = [4,5,6]
    x2 = fetch(fx)
    is_same_object[i] = x2 === x
end

count(is_same_object)

If this interpretation is correct, then I think the documentation is misleading:

Values can be interpolated into @spawn via $, which copies the value directly into the constructed underlying closure. This allows you to insert the value of a variable, isolating the asynchronous code from changes to the variable’s value in the current task.

An array is also a mutable object, so it is passed by reference as well.

Your example makes sense, and does what I would have expected. A slight tweak to your example demonstrates what has surprised me:

import Base.Threads.@spawn

function func(A)
    A[1] = 90
    return A
end

const N = 100
is_same_object = Vector{Bool}(undef, N);

for i = 1:N
    x = [1,2,3]
    #fx = @spawn func(x)
    fx = @spawn func($x)
    x[2] = 80
    x2 = fetch(fx)
    is_same_object[i] = x2 === x
end

count(is_same_object)

Because x is not reassigned, just mutated, we seem to have re-introduced a race condition:

import Base.Threads.@spawn

function func(A)
    A[1] = 90
    return A
end

const N = 100
is_same_object = Vector{Bool}(undef, N);
is_x1_90 = Vector{Bool}(undef, N);
is_x2_90 = Vector{Bool}(undef, N);
for i = 1:N
    x = [1,2,3]
    #fx = @spawn func(x)
    fx = @spawn func($x)
    x[1] = 80
    x2 = fetch(fx)
    is_same_object[i] = x2 === x
    is_x1_90[i] = x[1] == 90
    is_x2_90[i] = x2[1] == 90
end

count(is_same_object) # 100
count(is_x1_90) # 83 for me; race condition
count(is_x2_90) # 83 for me; race condition

As suggested by @ppalmes, this seems to be because structs are passed by reference; when I use copy instead of $, I get the behavior I expected originally:

import Base.Threads.@spawn

function func(A)
    A[1] = 90
    return A
end

const N = 100
is_same_object = Vector{Bool}(undef, N);
is_x1_90 = Vector{Bool}(undef, N);
is_x2_90 = Vector{Bool}(undef, N);
for i = 1:N
    x = [1,2,3]
    #fx = @spawn func(x)
    fx = @spawn func(copy(x))
    x[1] = 80
    x2 = fetch(fx)
    is_same_object[i] = x2 === x
    is_x1_90[i] = x[1] == 90
    is_x2_90[i] = x2[1] == 90
end

count(is_same_object) # 0
count(is_x1_90) # 0 
count(is_x2_90) # 100

I may suggest a clarification to the documentation on GitHub; as written, it suggests (to me, at least) that the interpolated value is actually copied rather than passed by reference.

This seems obvious now that you explain it but I very much appreciate your help.

My next question: I have an array (or other struct) in the main thread that is constantly being updated, and I want to @spawn a task that only reads the array state and does some calculations on it. But I want it to read the full state as it was at the time the task was spawned. Does $ do this for me? It appears not:


import Base.Threads.@spawn

function func(A)
    return copy(A)
end

const N = 100
is_same_object = Vector{Bool}(undef, N);
is_x1_90 = Vector{Bool}(undef, N);
is_x2_90 = Vector{Bool}(undef, N);
for i = 1:N
    x = [1,2,3]
    #fx = @spawn func(x)
    fx = @spawn func($x)
    x[1] = 90
    x2 = fetch(fx)
    is_same_object[i] = x2 === x
    is_x1_90[i] = x[1] == 90
    is_x2_90[i] = x2[1] == 90
end

count(is_same_object) # 0
count(is_x1_90) # 100
count(is_x2_90) # 60 :/

Given that all I want to do is read, calculate, and discard any copy, a full copy seems to add unnecessary overhead, and yet I can’t find any way around it. I’m not really sure how to measure the performance hit of a copy operation (obviously it depends on the size of the arrays). I have just been using the @time macro for REPL debugging, but I’ve seen some posts here mentioning BenchmarkTools.jl. Is that the recommended toolkit for measuring performance?

You can try using @view on the array to pass just the part of it that needs to be mutated.

No. As discussed above, the documentation seems misleading: interpolation with $ copies only the binding (the reference to the object), not the object itself.

If you want a snapshot at the time of spawning, you’ll have to copy explicitly; interpolation doesn’t do it for you.
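If a snapshot is what you need, one way to make sure the copy happens at spawn time (rather than whenever the task happens to start running) is to interpolate the copy itself: `$(...)` can wrap an expression, which is evaluated in the calling task when `@spawn` runs. A minimal sketch, with `total` as a hypothetical stand-in for the real read-only calculation:

```julia
using Base.Threads: @spawn

# Hypothetical read-only calculation that starts a little later.
function total(A)
    sleep(0.1)
    return sum(A)
end

x = [1, 2, 3]
t = @spawn total($(copy(x)))   # copy(x) runs now, before the task starts
x[1] = 100                     # later mutation doesn't reach the snapshot
fetch(t)                       # 6, computed from [1, 2, 3]
```

This still pays for a full copy; it only controls when the copy is taken.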

Perhaps you can pass just the portion you need.
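If only part of the array is needed, indexing with a range copies just that part, while `@view` avoids the copy but shares memory with the original (so it is not a snapshot). A small sketch of the difference:

```julia
x = collect(1:10)

part = x[1:3]        # slicing copies only these three elements
v = @view x[1:3]     # a view: no copy, shares memory with x

x[1] = 100           # mutate the original

part[1]              # 1: the copied slice kept the old value
v[1]                 # 100: the view sees the mutation
```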

But if you need a copy, then you’ll have to copy. There’s no getting around this: even if $ worked the way you originally understood it, you would still pay the cost of copying ($ is just syntax, after all). There’s no free copy, if that’s what you’re after.


Documentation issue raised: https://github.com/JuliaLang/julia/issues/36647
