Spawning threads boxes variable - spooky action at a distance?

Hi,

While multi-threading some code, I came across an odd case where spawning tasks leads to Julia boxing a variable in a different code section (leading to type instabilities, Any, bad performance, the whole shebang).

I managed to reduce it to this contrived MWE:

function kernel!(clusters, points, irange)
    # Pseudo-cluster assignment
    for i in irange[1]:irange[2]
        clusters[i] = rand(1:length(points))
    end
end

function mtbox(points, num_tasks)

    num_points = size(points, 2)
    clusters = similar(points, Int64, num_points)
    prev_clusters = similar(points, Int64, num_points)

    # Keep track of tasks spawned
    tasks = Vector{Task}(undef, num_tasks)

    for it in 1:50

        # Swap current and previous iteration's cluster assignments
        clusters, prev_clusters = prev_clusters, clusters

        for itask in 1:num_tasks
            # Compute element indices handled by this task
            per_task = (num_points + num_tasks - 1) ÷ num_tasks
            task_istart = (itask - 1) * per_task + 1
            task_istop = min(itask * per_task, num_points)

            # Launch task over computed index range
            tasks[itask] = Threads.@spawn kernel!(
                clusters,
                points,
                (task_istart, task_istop),
            )
        end

        for task in tasks
            wait(task)
        end
    end

    clusters
end


# Example usage
mtbox(rand(3, 10), 4)

Using Cthulhu.@descend mtbox(rand(3, 10), 4), we immediately see the problem (attached as a screenshot to highlight colours):

We get a clusters::Core.Box which leads to clusters, prev_clusters::Any = prev_clusters, clusters::Any.

However, if we remove the clusters, prev_clusters = prev_clusters, clusters line, everything becomes type-stable again.

In isolation, neither of these leads to type instabilities:

  • Swapping variable names with clusters, prev_clusters = prev_clusters, clusters.
  • Launching tasks using clusters as a function argument.

But together, we get this sort of spooky action at a distance where the types cannot be inferred in one part of the code due to another.

I do not have much experience with type inference - would someone know why this happens, and perhaps if there are any solutions to it?

Thank you,
Leonard

Here is my interpretation: Threads.@spawn transforms the code block you give it into an anonymous function, which is then scheduled for execution in a new Task. A side effect is that the variable clusters is captured by the resulting closure. Because clusters is also reassigned elsewhere in the function (by the swap), the compiler boxes it, which is the known performance gotcha described here:

https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-captured
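To see the boxing without any threads involved, here is a minimal sketch. The two helper names are made up for illustration, and it relies on closures being lowered to structs whose fields hold the captured variables, so the field types may look different on other Julia versions:

```julia
# Hypothetical helper: x is assigned twice and then captured by a closure
function capture_reassigned()
    x = 1
    x = 2
    return () -> x
end

# Hypothetical helper: same assignments, but the closure captures a
# fresh let-bound x that is never reassigned
function capture_with_let()
    x = 1
    x = 2
    return let x = x
        () -> x
    end
end

# Inspect the field types of the two closure objects
println(fieldtypes(typeof(capture_reassigned())))
println(fieldtypes(typeof(capture_with_let())))
```

On current Julia releases I would expect the first line to report a Core.Box field and the second a concretely typed one, which mirrors what Cthulhu shows for clusters in your MWE.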

In the case of your MWE, clusters is never reassigned while a task is running: kernel! only mutates the contents of the array, not the binding, and the swap reassigns the binding in a purely sequential part of the code, when no parallel task is running. So if I’m not mistaken, it should be correct to use the let-block trick described in the Performance Tips above, which would fix inference.
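A minimal sketch of the let-block applied to your MWE, assuming the only change needed is to rebind clusters right at the spawn site (the name mtbox_let is mine, purely for illustration):

```julia
function kernel!(clusters, points, irange)
    # Pseudo-cluster assignment
    for i in irange[1]:irange[2]
        clusters[i] = rand(1:length(points))
    end
end

# Hypothetical variant of mtbox; only the @spawn call site differs
function mtbox_let(points, num_tasks)
    num_points = size(points, 2)
    clusters = similar(points, Int64, num_points)
    prev_clusters = similar(points, Int64, num_points)

    # Keep track of tasks spawned
    tasks = Vector{Task}(undef, num_tasks)

    for it in 1:50
        # Swap current and previous iteration's cluster assignments
        clusters, prev_clusters = prev_clusters, clusters

        for itask in 1:num_tasks
            # Compute element indices handled by this task
            per_task = (num_points + num_tasks - 1) ÷ num_tasks
            task_istart = (itask - 1) * per_task + 1
            task_istop = min(itask * per_task, num_points)

            # Rebind clusters to a new local that is never reassigned, so
            # the closure built by @spawn captures that binding instead of
            # the outer one that the swap keeps reassigning
            tasks[itask] = let clusters = clusters
                Threads.@spawn kernel!(clusters, points, (task_istart, task_istop))
            end
        end

        for task in tasks
            wait(task)
        end
    end

    clusters
end

# Example usage
result = mtbox_let(rand(3, 10), 4)
```

As far as I know, Threads.@spawn also accepts $-interpolation, which its docstring describes as copying the value into the closure, so writing kernel!($clusters, ...) may be a shorter way to get the same effect. I would re-check with Cthulhu either way.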

However, you’ll have to check whether the same conditions hold in your real code; otherwise you might end up with hard-to-debug issues.


That solves the problem, thank you! And thanks for the link; it makes sense that the variable is conservatively boxed in case it is reassigned outside the running task. In my case it is not, so the let-block works perfectly.