I did a test on concurrency using @spawn and found what the title suggests. I don’t know where the twist is. Here is my code:
import Base.Threads.@spawn
Threads.nthreads() # use only one thread in this test
s(x) = sin(π * x/2);
c(x) = cos(π * x/2);
function cpu_intensive_work(N)
    a = 0.5
    for i = 1:Int(N)
        a = a < 0.5 ? s(a) : c(a)
    end
    a
end
function cpu_eased_work()
    sleep(5)
end
function test()
    wtask = @spawn cpu_eased_work()
    t0 = time()
    ret = cpu_intensive_work(3e8) # about 4.5 seconds
    Δt1 = time() - t0
    wait(wtask)
    Δt2 = time() - t0 - Δt1
    Δt1, Δt2, ret
end
# First, we run this directly in the REPL; the desired concurrency is achieved as expected
wtask = @spawn cpu_eased_work()
t0 = time();
ret = cpu_intensive_work(3e8);
Δt1 = time() - t0 # 4.722
wait(wtask)
Δt2 = time() - t0 - Δt1 # 0.258
# [comment] We can infer that concurrency happened, since only about 5 seconds were consumed in total
# Second, we run this inside a function; the concurrency fails (why?)
test() # Notice the returned values: Δt1 = 4.436000108718872, Δt2 = 5.007999897003174
# [comment] Δt2 alone takes 5 seconds!
How can I rewrite my test() function so that concurrency happens, just as in the 6-line REPL experiment above?
The typical approach is to @spawn all the tasks, not leave work in the originating function call. To avoid cheating, I wait/fetch the tasks to completion before leaving the originating function call. The order in which the tasks are waited/fetched matters for some reason, and @sync is how to get around that.
julia> @time test() # reproduces example
9.782188 seconds (857 allocations: 44.219 KiB, 0.05% compilation time)
(4.766000032424927, 5.019000053405762, 0.5316954226664747)
julia> function test2()
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           wait(wtask)
           fetch(wtask2)
       end
test2 (generic function with 1 method)
julia> @time test2()
9.754124 seconds (5.46 k allocations: 275.234 KiB, 0.11% compilation time)
0.5316954226664747
julia> function test3()
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           ret = fetch(wtask2)
           wait(wtask)
           ret
       end
test3 (generic function with 1 method)
julia> @time test3()
5.014386 seconds (5.46 k allocations: 275.047 KiB, 0.21% compilation time)
0.5316954226664747
julia> function test4()
           @sync begin # introduces a let block!!
               wtask = @spawn cpu_eased_work() # sleep(5)
               wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
               fetch(wtask2)
           end
       end
test4 (generic function with 1 method)
julia> @time test4()
5.011570 seconds (5.47 k allocations: 275.469 KiB, 0.21% compilation time)
0.5316954226664747
Why? Is this stipulated somewhere? I don’t think this is a reasonable requirement (the coding style in my test() makes sense for being concise), and it can be realized in the REPL, thereby raising the question of inconsistency, as the title suggests.
I see, as shown in your test2() vs test3(). But I don’t think this is reasonable behavior (is it?).
Let me restate this point to make it easier to notice:
function test5(wait_cpu_task_first)
    sleep_task = @spawn cpu_eased_work() # sleep(5)
    cpu_task = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
    if wait_cpu_task_first
        wait(cpu_task)
        wait(sleep_task)
    else
        wait(sleep_task)
        wait(cpu_task)
    end
end
julia> @time test5(true)
5.015835 seconds (5.45 k allocations: 274.453 KiB, 0.23% compilation time)
julia> @time test5(false)
9.436463 seconds (13 allocations: 1.250 KiB)
The thing you’re missing, and which I still don’t really understand because I don’t do concurrency all that much, is that tasks don’t automatically begin execution at @spawn. They’re scheduled by the task manager and are thus considered to be running concurrently, but they only really begin once they reach a thread. So the 2 possible orders of the 2 tasks are:
a.
The scheduler puts cpu_eased_work() on the thread. sleep(5) sets the start time and yields to the scheduler (many of Julia’s I/O methods are designed to do this).
Now the scheduler puts cpu_intensive_work(3e8) on the thread, and it runs to completion for ~4.5 seconds.
The scheduler puts the preexisting cpu_eased_work() back on the thread, and it only lasts for ~0.5 seconds, a total of ~5 seconds since the start time. The total time of the 2 tasks is thus ~5 seconds.
b.
The scheduler puts cpu_intensive_work(3e8) on the thread, and it runs to completion for ~4.5 seconds.
The scheduler puts cpu_eased_work() on the thread. sleep(5) sets the start time and yields to the scheduler. Whatever the scheduler does next, this task ends 5 or more seconds later. The total time of the 2 tasks is thus 9.5 or more seconds.
In the original test with only one wtask, it scheduled the task then continued onward to run cpu_intensive_work(3e8), only yielding to the task at wait(wtask) for effectively the same timeline as (b). The same thing would occur in a let or begin block; on the other hand, evaluating lines as separate expressions in the global scope did actually yield at @spawn.
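You can see this directly with istaskstarted. A minimal sketch, assuming a single thread as in the original test (spawn_probe is a made-up name; with a free thread available the task could start right away):
julia> function spawn_probe()
           t = @spawn cpu_eased_work() # scheduled, but not yet running
           before = istaskstarted(t)
           yield()                     # give the scheduler a turn with the thread
           after = istaskstarted(t)
           before, after
       end;

julia> spawn_probe()  # the task only begins its sleep once we yield
(false, true)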
I’ll amend the previous comment too, but I was wrong about @sync handling it for us. We get the slower scenario (b) if the @spawn statements are swapped:
julia> function test4ii()
           @sync begin
               wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
               wtask = @spawn cpu_eased_work() # sleep(5)
               fetch(wtask2)
           end
       end
test4ii (generic function with 1 method)
julia> @time test4ii()
9.789642 seconds (5.47 k allocations: 275.734 KiB, 0.12% compilation time)
0.5316954226664747
Switching the @spawn lines doesn’t change test2 but it does slow down test3. As you demonstrate, switching which task to wait/fetch first also matters, which I inadvertently do in test2 vs test3. I don’t really know what the scheduler is doing, but maybe there is some principle.
The easiest adjustment is to make every scheduled task yield immediately after starting any timers so they all get a quick turn with the thread right after the originating function waits for them. We could change cpu_intensive_work, but @spawn and other task-making macros already make zero-argument closures, so:
julia> function test4iii()
           @sync begin
               wtask2 = @spawn begin yield(); cpu_intensive_work(3e8) end # about 4.5 seconds
               wtask = @spawn cpu_eased_work() # sleep(5)
               fetch(wtask2)
           end
       end
test4iii (generic function with 1 method)
julia> @time test4iii()
5.018634 seconds (5.52 k allocations: 277.922 KiB, 0.23% compilation time)
0.5316954226664747
The change doesn’t help test5’s separate wait calls though.
What I guess happens here is that the wtask in test() is scheduled on the main thread, which immediately becomes busy with cpu_intensive_work. The wtask won’t even begin its sleep before the cpu_intensive_work starts. The wtask won’t be rescheduled to another thread because it doesn’t even reach a point where it can be rescheduled.
This particular pattern, with some CPU-bound tasks and some waiting tasks, is problematic. It’s mentioned in the manual (Multi-Threading · The Julia Language), in the context of garbage collection:
Compute-bound, non-memory-allocating tasks can prevent garbage collection from running in other threads that are allocating memory. In these cases it may be necessary to insert a manual call to GC.safepoint() to allow GC to run. This limitation will be removed in the future.
To alleviate this, you should @spawn all tasks, and wait for them (or do a yield() now and then in the cpu-bound task).
In the REPL case, the wtask will manage to go to sleep before cpu_intensive_work starts, and the wtask will be scheduled on a different thread when woken up. In general, a task can only migrate to another thread when it interacts with the julia runtime (io, sleep, lock, allocation, etc.), because tasks use cooperative scheduling, i.e. the julia runtime can’t pause a cpu-bound task and move it to another thread. For example, with an occasional yield() inserted into the CPU-bound loop, the original test() works:
function cpu_intensive_work(N)
    a = 0.5
    for i = 1:Int(N)
        iszero(i & 0xff) && yield()
        a = a < 0.5 ? s(a) : c(a)
    end
    a
end
julia> test()
(5.12465500831604, 2.1457672119140625e-6, 0.5316954226664747)
On a more general level, when running concurrent Julia code, you should never write code that runs for a significant amount of time and doesn’t yield. The problem in this thread is just one of the many bad things that can follow from not yielding.
The reason is that Julia uses cooperative scheduling. Each thread yields itself, and is not preempted by the scheduler.
I recommend not trying to game the scheduler and e.g. try to make sure that one task is run before an other. Instead, always, always make your tasks yield regularly to allow the scheduler to do its job.
The 5 seconds actually suggests the sleep-task was started first. I still don’t know how @spawn/wait/fetch affect when scheduled tasks get onto threads, and I don’t even know if that’s even supposed to be consistent. As I mentioned, my lack of need for and inexperience with concurrency explains a lot of it, but some other things I don’t really know:
does @spawn/@async put tasks into the queue in an order, and does that order also determine which gets dibs when the scheduler looks for a task to run on a free thread?
when I wait/fetch a task, does it yield to the whole task queue or run/restart the input task first?
what do yielding and blocking really do? “Tell the scheduler to run another task” and “stop this task until a condition is met” seem clear in isolation, but they don’t say much about how the scheduler handles this task among the other queued tasks or what’s supposed to happen when you do either from outside of a task (e.g. sleep obviously delays more from a non-task than a task).
Which methods yield and which don’t? yield, sleep vs Libc.systemsleep, and print/println are well-known, but there’s usually not much indication. A print that doesn’t yield would actually be very helpful for visualizing the progress of concurrent tasks in these sorts of educational experiments (see the sketch after this list).
Some “settings” functions like g/set_zero_subnormals can be documented to “affect the current thread”. Does this mean they’re unsafe for task migration like threadid, and how would we accomplish the same effect in a migrate-able task?
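On the non-yielding print, one possible option, with the caveat that this is my assumption rather than documented behavior: Core.println writes directly to the stream without going through the usual task-switching IO machinery, so it should not yield. A sketch (cpu_intensive_work_verbose is a made-up name):
function cpu_intensive_work_verbose(N)
    a = 0.5
    for i = 1:Int(N)
        # Core.println is assumed not to yield, unlike Base.println;
        # verify on your version before relying on it
        iszero(i & 0x00ffffff) && Core.println("progress: i = ", i) # every ~16.8M iterations
        a = a < 0.5 ? s(a) : c(a)
    end
    a
end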
I’m sure I could eventually figure this out if I put my mind to it, but the manual isn’t the best 101 source because it throws out a lot of generic terms without explaining them, and other sources may use these terms to mean different things, or different terms to mean the same thing (concurrent vs parallel, task vs coroutine vs green thread vs thread vs process). That’s understandable, because they may not be dealing with the same contexts (Julia, BLAS, and the OS each have their own “threads”, and I wish I understood it better). I know people have mentioned that the manual needs to be updated to recommend @spawn over @async, but I think it generally needs to be rewritten to be more informative to people starting from scratch. As much as it helps, making experiments like this to discover how concurrency works is time-consuming, error-prone, and doesn’t clarify which behaviors are actually reliable.
I agree with what Benny said (I’ve also expressed my opinion about julia’s documentation many times: it is not as effective as its code).
I don’t have enough knowledge about what counts as CPU-intensive work, which is why I devised the sin/cos one in my first post in this topic.
Now I speculate that an easier (perhaps the easiest) piece of CPU-intensive work could be
function cpu_intensive_sleep(Δt)
    t0 = time()
    while time() - t0 < Δt
    end
end
Am I correct?
I then ran a further test using the newly proposed cpu_intensive_sleep function above.
That reaches a different concept: busy waiting, basically making the processor do otherwise pointless work until a condition is met. The risk is that the compiler may decide to optimize everything away for a trivial result, but you can at least time it on your machine. If you just want to halt a Julia process or task without yielding to another task, Libc.systemsleep lets eligible operating systems do that exact job without spending more energy on the work. However, a busy processor is less liable to be interrupted by other processes, and staying in the same process could be more precise than sleeping and waking, especially when considering hardware timer precision. For concurrency, such precision is overkill; sleeping tasks still have to wait for an available thread after their requested time has passed, and that delay is a small price for our tasks taking turns on threads over time.
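To make the contrast concrete, a sketch assuming a single thread as in these tests (sleep_comparison is a made-up name; with more threads the spawned task could run immediately): sleep yields the thread to other tasks, while Libc.systemsleep blocks the whole thread without yielding.
function sleep_comparison()
    t = @spawn println("task ran")
    Libc.systemsleep(1)     # blocks the whole thread; the task gets no turn
    before = istaskdone(t)  # false: systemsleep never yielded
    sleep(1)                # sleep yields; the task now runs and prints
    after = istaskdone(t)   # true
    before, after
end

sleep_comparison()  # prints "task ran" only after the systemsleep; returns (false, true)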
It took me some time to understand what this means. Let me restate it, maybe as:
Julia uses cooperative scheduling: each thread will not be pre-empted automatically by the scheduler. Therefore, if needed, you should let each thread yield by itself.
(Well, the wording “pre-empt” is quite obscure for a noob at first. In my understanding, a similar verb could be “obstruct” (okay?).)
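If I test my understanding with a tiny sketch (takes_turns is a made-up name; with @async both the task and the loop stay on one thread, and they interleave only because each side volunteers the thread via yield()):
function takes_turns()
    order = String[]
    t = @async for i in 1:3
        push!(order, "task $i")
        yield()             # voluntarily hand the thread back
    end
    for i in 1:3
        push!(order, "main $i")
        yield()             # without this, the task gets no turn until wait()
    end
    wait(t)
    order
end

takes_turns()  # ["main 1", "task 1", "main 2", "task 2", "main 3", "task 3"]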
Well, the word “thread” is even more tricky and complex. Can anyone summarize how many meanings there are for the word “thread”? There’s OS thread, julia thread, green thread… is there a thread in the hardware sense?? The concepts are very complicated.
There is a thread in the hardware sense. It’s often called hyperthread. The idea is that two instruction streams are run simultaneously on the same cpu core. They share some resources, e.g. a multiply can be done in one of the threads simultaneously with an xor in the other thread. However, hardware is anyway very parallel nowadays. Even a single instruction stream (e.g. a julia program) can be executed partly in parallel and out of order. Both forks in an if-then-else can be executed simultaneously, awaiting evaluation of the if-condition, then the irrelevant one is discarded. Etc. etc.
An OS-thread is the unit of execution of the OS. Unless running in a real-time OS, OS-threads can be pre-empted by the OS, i.e. put forcefully to sleep, and the OS will continue with another thread, or with its own urgent business (e.g. serving a physical interrupt). An OS will receive a physical timer-interrupt every 0.01 or 0.001 second, loop through the running threads, and might pause one thread and continue another. That’s “time sharing”.
A julia thread is, at least for now, an OS-thread, which the OS controls. However, julia controls which julia task runs in the thread. But julia can’t forcefully pause a task and put another task to run on the thread. Rather, the running task must call yield() at some point. This will, at least conceptually, put the task in some julia queue, and some other (or the same) task can be started on the thread. A yield() is done inside julia io/sleep/lock routines, and can also be done manually in loops which do not call into any such julia routines (typically a tight loop).
Julia threads are persistent throughout the execution of a julia program, but julia tasks are created on the fly and queued for execution on some thread.
“Green thread” has been used for tasks created by @async, which can only run on the main thread, and then only when nobody else uses it. Typically it can be used for logging purposes when the main thread does various IO things, e.g. to report progress.
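For instance, something like this sketch (made-up names; the logger’s println and sleep calls yield, so the main thread’s IO work and the logger take turns):
done = Ref(false)
logger = @async while !done[]
    println("still working…")  # println and sleep both yield the thread
    sleep(1)
end
# ... the main thread does its various IO things here ...
done[] = true
wait(logger)   # the logger exits on its next wake-up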
In C++20 (probably elsewhere too), there’s a concept of a “stackless thread” or “coroutine”, which is a very lightweight asynchronous animal with limited functionality, but with some uses.
When programming parallel algorithms in julia, it’s tasks which are important. You can spawn millions of them, and the julia scheduler will start one on each thread. When started, the task will run until it’s finished, or at some point it calls yield() (explicitly in user code, or implicitly by library functions).
The julia way of doing tasks was inspired by an extension to C called “cilk”, conceived at MIT in the 1990s (Cilk - Wikipedia). It’s unfortunate that the word “thread” is used in julia contexts; it’s not really about threads. In julia, threads are a static resource, used for running tasks.
The spawn keyword, when preceding a function call (spawn f(x)), indicates that the function call (f(x)) can safely run in parallel with the statements following it in the calling function.
This quote from the Cilk Wikipedia link above corresponds squarely to my idea above.
This pattern means the same in julia, but no spawning pattern guarantees concurrent execution. It depends on subtle details of julia’s scheduler. Scheduling workloads of unknown size is difficult. E.g. in linux a lot of work has gone into various scheduling policies, both for threads and io.
Julia has one scheduling policy whose details admittedly are a bit obscure (just like the typical OS-thread scheduler, be it in windows, macos or elsewhere). I sometimes use the pattern you’ve tried, i.e.
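t = @spawn background_work()  # sketch with placeholder functions: one spawned task,
r = foreground_work()         # the rest of the work kept in the calling function
wait(t)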
with success, whereas in other cases it’s something like
r = sum(t -> fetch(t)::Int, [@spawn work(i) for i in alli])
where all the work is spawned. In neither case is there any guarantee that certain tasks actually run concurrently. The julia scheduler does the best it can; sometimes its strategy isn’t optimal. Just like any scheduler, which in reality attempts to solve the knapsack problem with unknown sizes. Batch systems on supercomputers have the same problem.
I just want to mention that the number of threads can be specified on the julia command line or by some function calls, and depending on your julia version there are different defaults, e.g. for interactive threads.
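For example (a sketch; the exact defaults and flags depend on your version, and the N,M form for an extra interactive pool needs julia 1.9 or newer):
# on the command line:
#   julia --threads 4            start with 4 threads
#   julia -t auto                let julia pick a count
#   julia -t 4,1                 4 default threads + 1 interactive thread (1.9+)
#   JULIA_NUM_THREADS=4 julia    via the environment variable

julia> Threads.nthreads()        # query the count from inside julia
4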