Will `@info` corrupt `Threads.@threads for`?

My configuration is

julia> Threads.nthreads()
4

julia> Threads.nthreads(:default)
4

I have a function

function parallel_CG!(B, θ, β, μ, ν)
    for ite = ...
        ...
        for ba = ...
            sub_j_vec = ...
            println("before entering @threads, sub_j_vec = $sub_j_vec")
            Threads.@threads for i = eachindex(sub_j_vec)
                j = sub_j_vec[i]
                # @info "inside @threads" i j
                println("inside @threads, i = $i, j = $j")
            end
            error("No.1 can you see this error??")
        end
    end
end

When I call it, I get the expected result, as follows:

julia> parallel_CG!(B, θ, β, μ, ν)
before entering @threads, sub_j_vec = [1, 4, 2, 3]
inside @threads, i = 1, j = 1
inside @threads, i = 3, j = 2
inside @threads, i = 2, j = 4
inside @threads, i = 4, j = 3
ERROR: No.1 can you see this error??

However, if I delete the # symbol in the function definition (uncommenting the @info line) and re-run the function, I get this corrupted result:

julia> parallel_CG!(B, θ, β, μ, ν)
before entering @threads, sub_j_vec = [1, 4, 2, 3]
┌ Info: inside @threads
│   i = 1
└   j = 2
┌ Info: inside @threads
│   i = 4
└   j = 2
┌ Info: inside @threads
│   i = 2
└   j = 4
inside @threads, i = 1, j = 4
inside @threads, i = 4, j = 4
inside @threads, i = 2, j = 4
┌ Info: inside @threads
│   i = 3
└   j = 4
inside @threads, i = 3, j = 4
ERROR: No.1 can you see this error??

My question is—why?

For details, see Julia crashes without reporting anything when I optimize a vector of models in parallel - #5 by WalterMadelim.

Looks like there’s a variable named j in some surrounding scope. As a result, the binding j is captured and shared among all tasks. Renaming the inner j to something obviously unique, like _j_, should fix the problem (you’ll probably want to find a better name, this is just for demonstration purposes). Alternatively, adding local j in the inner scope should also work.

In other words, Julia’s default scoping rules correspond to using the nonlocal keyword in Python.
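
For illustration, a minimal sketch of the local-j fix applied to the loop above (keeping the original name):

Threads.@threads for i = eachindex(sub_j_vec)
    local j = sub_j_vec[i]  # `local` declares j in the loop body's own scope,
                            # so each task gets its own j instead of the shared outer binding
    println("inside @threads, i = $i, j = $j")
end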

One way to avoid this pitfall is to factor out nested local scopes into separate functions. It could look something like this:

function inner_loop_body(sub_j_vec, i)
    j = sub_j_vec[i]
    # @info "inside @threads" i j
    println("inside @threads, i = $i, j = $j")
end

function parallel_CG!(B, θ, β, μ, ν)
    for ite = ...
        ...
        for ba = ...
            sub_j_vec = ...
            println("before entering @threads, sub_j_vec = $sub_j_vec")
            Threads.@threads for i = eachindex(sub_j_vec)
                inner_loop_body(sub_j_vec, i)
            end
            error("No.1 can you see this error??")
        end
    end
end

This way, it’s impossible for the inner j to shadow an outer one.

Looking at the full code from the other thread, the shadowed j is here:

            Threads.@threads for i = eachindex(sub_j_vec)
                j = sub_j_vec[i]
                ...
            end
            ...
            if Δ > COT
                j = sub_j_vec[ii]
                ...

It may look surprising that the binding of j would be shared between these two locations, since they’re in different blocks at the same level of nesting, and there’s no reference to j in any shared parent block. However, an if block does not introduce a new scope, so by assigning to j inside the if, you’re introducing a binding j that belongs to the surrounding scope, which is also the parent scope of the @threads loop. Hence, every time the tasks spawned by @threads assign to j, they’re all assigning to that shared binding inherited from the parent scope, rather than to their own local js.
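
Here is a standalone illustration of this scoping rule (f and x are just demonstration names):

julia> function f()
           if true
               x = 1  # assigns to x in the enclosing function scope
           end
           return x   # visible here, because the if block has no scope of its own
       end
f (generic function with 1 method)

julia> f()
1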

Yes, I can reproduce it:

julia> v = [2, 3, 4, 1];

julia> for _ = 1:1
           Threads.@threads for i = eachindex(v)
               j = v[i]
               @info "see" i j
           end
       end
┌ Info: see
│   i = 3
└   j = 4
┌ Info: see
│   i = 4
└   j = 1
┌ Info: see
│   i = 2
└   j = 3
┌ Info: see
│   i = 1
└   j = 2

julia> for _ = 1:1
           Threads.@threads for i = eachindex(v)
               j = v[i]
               @info "see" i j
           end
           j = 9
       end
┌ Info: see
│   i = 4
└   j = 1
┌ Info: see
│   i = 1
└   j = 1
┌ Info: see
│   i = 2
└   j = 1
┌ Info: see
│   i = 3
└   j = 1

So the scoping rule of Threads.@threads for is the same as that of an ordinary for?
(The former is a bit subtle because, strictly speaking, it is not a loop, e.g. it doesn’t allow a break.)

I took a closer look with the aid of @macroexpand. It seems that Threads.@threads splits the total work i = eachindex(v) into smaller blocks, according to how many threads are available. Therefore, indeed,

  • there is an ordinary for loop upon expansion, so the scoping rule is indeed similar.
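
For reference, the expansion can be inspected like this (the output is long, but the per-task for loop is visible inside the generated function):

julia> @macroexpand Threads.@threads for i = eachindex(v)
           j = v[i]
       end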

It is indeed a loop, and it does allow a break, but the break will halt the loop only in the task which executes it. If you need all tasks to stop, you must arrange that yourself, e.g. with a check:

stopit = Threads.Atomic{Bool}(false)
Threads.@threads for i in 1:N
    stopit[] && break
    local status = work(i)
    status == 0 && (stopit[] = true)
end

One difference is that these for loops are inside the closures created by the tasks, so there’s an extra layer of capturing bindings. However, closure capture by design works out the same as any other local scope (much to the dismay of many who’ve seen their performance tanked by the infamous issue #15276).

It seems that the iterations indexed by 1:N are not delegated linearly from small to large i. This is actually desirable behavior, since it obviates the need to shuffle manually. For example

julia> const stopit = Threads.Atomic{Bool}(false);

julia> Threads.@threads for i in 1:97
           stopit[] && break
           local status = 2i
           @info "i = $i, status = $status"
           status > 60 && Threads.atomic_or!(stopit, true)
       end
[ Info: i = 26, status = 52
[ Info: i = 1, status = 2
[ Info: i = 50, status = 100
[ Info: i = 74, status = 148

This is almost exactly what an early break needs: break as early as possible.

The iteration range is split evenly between tasks. In your case, task 1 gets 1:25, task 2 26:49, task 3 50:73, and task 4 74:97. Thus, the first four iterations that are executed are 1, 26, 50, 74.
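
If you want to compute the split yourself, here is a minimal sketch that reproduces the partitioning described above (it mirrors the observed behavior, not the actual Base code):

n, nt = 97, 4         # iterations and threads
q, r = divrem(n, nt)  # 24 iterations per task, 1 left over
chunks = [(1 + (t - 1)*q + min(t - 1, r)):(t*q + min(t, r)) for t in 1:nt]
# yields [1:25, 26:49, 50:73, 74:97]; the first r chunks get one extra iteration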

Note that Threads.@threads :greedy would start at 1, 2, 3, 4, as it delegates work dynamically, with each task starting the next available iteration every time it finishes the current one. It has more overhead, but is better suited when iterations have large and highly non-uniform workloads.
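
For reference, the scheduler is passed as a symbol before the for; a minimal sketch (do_work is a hypothetical placeholder):

Threads.@threads :greedy for i in 1:97
    # with :greedy, each task pulls the next available iteration from a shared
    # source, so the first iterations started are 1, 2, 3, 4 (one per task)
    do_work(i)
end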

Thanks for pointing that out. So we currently have three options:

  • :static (discouraged)
  • :dynamic (the default since Julia 1.8)
  • :greedy (from Julia 1.11, therefore more advanced?)

I wonder if it would also be worthwhile to have an additional :randomgreedy option such that

Threads.@threads :randomgreedy for i = 1:97

expands to

Threads.@threads :greedy for i = Random.shuffle(1:97)

Would this be useful in practice? In my case, each i is a model which needs to be trained.

It’s just a different scheduler that was contributed at a later point. Better in some cases and worse in others. No reason to read more into it than that.

As a side note, the general recommendation for serious multithreaded work these days is to use the package OhMyThreads.jl and skip @threads and @spawn entirely. The API of OhMyThreads.jl has fewer pitfalls and is designed to nudge you towards better design patterns.

Almost certainly no—a proliferation of similar-but-not-identical options in any interface is just a recipe for confusion among users, while increasing the amount of code to test and maintain. Just use Random.shuffle yourself like you showed. (Caveat that I don’t make any decisions, this is just my own reaction.)

OhMyThreads.jl will even protect you from the accidental race condition you stumbled into here, by detecting the boxed closure and erroring out. See Boxed Variables · OhMyThreads.jl.
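
A sketch of the pattern this check is meant to catch (racy is a hypothetical name; per the Boxed Variables docs, a sufficiently recent OhMyThreads should throw here instead of silently racing):

using OhMyThreads: @tasks

function racy(v)
    @tasks for i in eachindex(v)
        j = v[i]  # j is captured and boxed because of the assignment below
    end
    j = 0  # this later assignment makes j a shared local, as in the original bug
end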

Well, if it is advisable, why not officially recommend/refer to it in Julia’s docs?

I agree, and so do many others! It’s a fairly young package that’s become a widespread recommendation only recently (first commit January 2024; erroring on this race condition since March 2025), so it simply hasn’t happened yet. Julia 1.11 was too early, but there’s still time to get it done before 1.12 if someone makes a PR. See Move 1.12 threading news to threading section. Add reminder about threadid by IanButterworth · Pull Request #59251 · JuliaLang/julia · GitHub

I agree that OhMyThreads.jl is a good package which avoids many common pitfalls. Indeed, it should be in the standard library, or in a list of recommended packages which receives at least some attention from the core team. (Yes, I know … lack of people).

Likewise, some lower-level synchronization primitives are missing from Threads, notably barriers and read/update locking (i.e., many-readers/one-updater locking). These are available in packages, but are somewhat difficult to get right.

Packages are fine, but depending on arbitrary packages is a pain. Suddenly they are left without a maintainer (I have abandoned a few R packages this way myself, simply because I changed jobs; one of them was very heavily used) and are left to rot. And one day they won’t work.

I’m not sure whether I can write asynchronous programs with OhMyThreads.jl.

(At present, I only have a nascent idea of what I’m going to implement—I don’t have any prior knowledge or experience. But I’m sure there is an asynchronous programming style awaiting me, beyond the usage of @threads for.)

I tend to agree with you, @WalterMadelim. While I can appreciate the automagical ways of OhMyThreads.jl, I prefer to control the concurrency myself. If a plain @threads for isn’t sufficient, I typically use locks and the tools in Threads: @spawn, @sync, Atomic, and the atomic_...() functions. So either I do things like:

using Base.Threads: @spawn  # @sync is exported from Base itself

t1 = @spawn work1(...)
t2 = @spawn work2(...)
otherwork(...)
combine(fetch(t1), fetch(t2))

or, in the case of loops something like:

using Base.Threads: @spawn, Atomic, atomic_add!  # ReentrantLock and @lock are in Base

workstep = Atomic{Int}(1)
lk = ReentrantLock()
results = fill(0, problemsize)
@sync for i in 1:numthreads
    @spawn begin
        # ... allocate local storage ...
        while (j = atomic_add!(workstep, 10)) <= maxwork
            # ... do work on j:min(j+9, maxwork) with local storage ...
        end
        @lock lk begin 
            results .+= localresults
            # or other way to save the local results.
        end
    end
end

The main thing to keep in mind is that reading from or writing to separate Array elements is thread safe, but modifying the same element from different tasks (a[i] += 1) is not. Neither is updating a Dict or Set, nor append!, push! and similar. These must be protected by a lock.
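
For example, a minimal sketch of guarding push! with a lock (the names are illustrative):

using Base.Threads

results = Int[]
lk = ReentrantLock()
@threads for i in 1:100
    val = i^2  # safe: purely task-local computation
    @lock lk push!(results, val)  # push! to a shared Vector must be serialized
end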

Thanks for sharing your knowledge.

I formulated my aforementioned thoughts here: How to write a master-subproblem iterative algorithm asynchronously?. I’m looking for an event-triggered style; I’m not sure I’ve expressed it well.

I don’t think OhMyThreads.jl limits you in any way, but it may be more or less helpful depending on your usecase.

  • OhMyThreads.@tasks can be used as a drop-in replacement for Threads.@threads, but it includes the mentioned sanity check that would have caught your bug (it can also do a lot more when combined with other macros from OhMyThreads.jl)
  • OhMyThreads.@spawn is a drop-in replacement for Threads.@spawn, but it’s type stable; that is, the return type of fetch(OhMyThreads.@spawn expr) can be inferred by the compiler, unlike fetch(Threads.@spawn expr) (see the sketch after this list)
  • OhMyThreads.@one_by_one helps express typical locking patterns more succinctly. For example, @sgaure’s example above can be rewritten as
    using OhMyThreads: @tasks, @one_by_one
    using Base.Threads: Atomic, atomic_add!

    workstep = Atomic{Int}(1)
    results = fill(0, problemsize)
    @tasks for i in 1:numthreads
        # ... allocate local storage ...
        while (j = atomic_add!(workstep, 10)) <= maxwork
            # ... do work on j:min(j+9, maxwork) with local storage ...
        end
        @one_by_one begin 
            results .+= localresults
            # or other way to save the local results.
        end
    end
    
    Admittedly not a huge simplification; the point is just that you can express this just fine using only the OhMyThreads.jl API, and thus get the benefit of the extra sanity checking/bug prevention from @tasks without losing any flexibility.
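
To illustrate the type-stability point from the second bullet above, a minimal sketch:

using OhMyThreads

t = OhMyThreads.@spawn sum(rand(100))
x = fetch(t)  # inferred as Float64; fetch(Threads.@spawn sum(rand(100))) infers as Any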

But perhaps the most significant benefit of OhMyThreads.jl for more advanced/custom asynchronous patterns is that, with the help of TaskLocalValues.jl under the hood, it provides more convenient abstractions over task_local_storage through the @local macro and WithTaskLocals type, improving the ergonomics of an arguably underused feature.
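
For instance, a minimal sketch of @local for per-task scratch space (the buffer and sizes are arbitrary):

using OhMyThreads: @tasks, @local

results = zeros(100)
@tasks for i in 1:100
    @local buf::Vector{Float64} = zeros(1000)  # allocated once per task, reused across its iterations
    buf .= i                                   # task-local scratch work
    results[i] = sum(buf)                      # writing to distinct elements is thread safe
end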
