One thread concurrency: realized in REPL, but failed in a function

I did a test on the concurrency using @spawn, and find what the title suggests. I don’t know where is the twist. Here are my code

import Base.Threads.@spawn
Threads.nthreads() # use only one thread in this test

s(x) = sin(π * x/2);
c(x) = cos(π * x/2);
function cpu_intensive_work(N)
    a = 0.5                                                                                                          
    for i = 1:Int(N)
        a = a < 0.5 ? s(a) : c(a)
    end
    a
end
function cpu_eased_work()
    sleep(5)
end
function test()
    wtask = @spawn cpu_eased_work()
    t0 = time()
    ret = cpu_intensive_work(3e8) # about 4.5 seconds
    Δt1 = time() - t0
    wait(wtask)
    Δt2 = time() - t0 - Δt1
    Δt1, Δt2, ret
end

# First, we run this directly in REPL, the desired concurrency is achieved as expected
wtask = @spawn cpu_eased_work()
t0 = time();
ret = cpu_intensive_work(3e8);
Δt1 = time() - t0 # 4.722
wait(wtask)
Δt2 = time() - t0 - Δt1 # 0.258
# [comment] We can infer that concurrency happens since only 5 seconds is consumed

# Second, we run this inside a function, the concurrency is failed (why?)
test() # Notice the returned values: Δt1 = 4.436000108718872, Δt2 = 5.007999897003174
# [comment] Δt2 alone takes 5 seconds!

How to rewrite my test() function so that concurrency happens, just as the 6-line REPL experiment written above?

julia> function test2()
           t0 = time()
           wtask = @spawn cpu_eased_work()
           wtask2 = @spawn cpu_eased_work()
           ret = cpu_intensive_work(3e8)
           wait(wtask)
           wait(wtask2)
           ret, time() - t0
       end

julia> test2()
(0.4768897454219627, 9.448176860809326)

it’s interesting multiple sleep() don’t stack up more time, so it looks like sleep() and sleep() can concurrent, but sleep and compute are sequential

The typical approach is to @spawn all the tasks, not leave work in the originating function call. To not cheat, I wait/fetch for the tasks to complete before leaving the originating function call. The order of which tasks to wait/fetch matters for some reason, and @sync is how to get around that.

julia> @time test() # reproduces example
  9.782188 seconds (857 allocations: 44.219 KiB, 0.05% compilation time)
(4.766000032424927, 5.019000053405762, 0.5316954226664747)

julia> function test2()
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           wait(wtask)
           fetch(wtask2)
       end
test2 (generic function with 1 method)

julia> @time test2()
  9.754124 seconds (5.46 k allocations: 275.234 KiB, 0.11% compilation time)
0.5316954226664747

julia> function test3()
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           ret = fetch(wtask2)
           wait(wtask)
           ret
       end
test3 (generic function with 1 method)

julia> @time test3()
  5.014386 seconds (5.46 k allocations: 275.047 KiB, 0.21% compilation time)
0.5316954226664747

julia> function test4()
       @sync begin # introduces let block!!
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           fetch(wtask2)
       end
       end
test4 (generic function with 1 method)

julia> @time test4()
  5.011570 seconds (5.47 k allocations: 275.469 KiB, 0.21% compilation time)
0.5316954226664747

Why? Is this stipulated somewhere? I don’t think this is a reasonable requirement—I think the coding style in my test() makes sense for being concise—and it can be realized in REPL (thereby raising the question of inconsistency as the title suggests).

I see, as shown in your test2() vs test3(). But I don’t think this is a reasonable behavior (is it?).
I restate this point to make it easier to be aware of

function test5(wait_cpu_task_first)
    sleep_task = @spawn cpu_eased_work() # sleep(5)
    cpu_task = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
    if wait_cpu_task_first
        wait(cpu_task)
        wait(sleep_task)
    else
        wait(sleep_task)
        wait(cpu_task)
    end
end

julia> @time test5(true)
  5.015835 seconds (5.45 k allocations: 274.453 KiB, 0.23% compilation time)

julia> @time test5(false)
  9.436463 seconds (13 allocations: 1.250 KiB)

The thing you’re missing, and which I still don’t really understand because I don’t do concurrency all that much, is that tasks don’t automatically begin execution at @spawn. They’re scheduled by the task manager and are thus considered to be running concurrently, but they only really begin if they reach a thread. So the 2 possible orders of the 2 tasks are:
a.

  1. The scheduler puts cpu_eased_work() on the thread. sleep(5) sets the start time and yields to the scheduler (many of Julia’s I/O methods are designed to do this).
  2. Now the scheduler puts cpu_intensive_work(3e8) on the thread, and it runs to completion for ~4.5 seconds.
  3. The scheduler puts the preexisting cpu_eased_work() back on the thread, and it only lasts for ~0.5 seconds, a total of ~5 seconds since the start time. The total time of the 2 tasks is thus ~5 seconds.

b.

  1. The scheduler puts cpu_intensive_work(3e8) on the thread, and it runs to completion for ~4.5 seconds.
  2. The scheduler puts cpu_eased_work() on the thread. sleep(5) sets the start time and yields to the scheduler. Whatever the scheduler does next, this task ends 5 or more seconds later. The total time of the 2 tasks is thus 9.5 or more seconds.

In the original test with only one wtask, it scheduled the task then continued onward to run cpu_intensive_work(3e8), only yielding to the task at wait(wtask) for effectively the same timeline as (b). The same thing would occur in a let or begin block; on the other hand, evaluating lines as separate expressions in the global scope did actually yield at @spawn.

I’ll amend the previous comment too, but I was wrong about @sync handling it for us. We get the slower scenario (b) if the @spawn statements are swapped:

julia> function test4ii()
       @sync begin
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           wtask = @spawn cpu_eased_work() # sleep(5)
           fetch(wtask2)
       end
       end
test4ii (generic function with 1 method)

julia> @time test4ii()
  9.789642 seconds (5.47 k allocations: 275.734 KiB, 0.12% compilation time)
0.5316954226664747

Switching the @spawn lines doesn’t change test2 but it does slow down test3. As you demonstrate, switching which task to wait/fetch first also matters, which I inadverdently do in test2 vs test3. I don’t really know what the scheduler is doing, but maybe there is some principle.