I ran a test of concurrency using @spawn and found what the title suggests. I don’t know where the twist is. Here is my code:
import Base.Threads.@spawn
Threads.nthreads() # use only one thread in this test
s(x) = sin(π * x/2);
c(x) = cos(π * x/2);
function cpu_intensive_work(N)
    a = 0.5
    for i = 1:Int(N)
        a = a < 0.5 ? s(a) : c(a)
    end
    a
end
function cpu_eased_work()
    sleep(5)
end
function test()
    wtask = @spawn cpu_eased_work()
    t0 = time()
    ret = cpu_intensive_work(3e8) # about 4.5 seconds
    Δt1 = time() - t0
    wait(wtask)
    Δt2 = time() - t0 - Δt1
    Δt1, Δt2, ret
end
# First, we run this directly in the REPL; the desired concurrency is achieved as expected
wtask = @spawn cpu_eased_work()
t0 = time();
ret = cpu_intensive_work(3e8);
Δt1 = time() - t0 # 4.722
wait(wtask)
Δt2 = time() - t0 - Δt1 # 0.258
# [comment] We can infer that concurrency happens, since only about 5 seconds elapse in total
# Second, we run this inside a function, and the concurrency fails (why?)
test() # Notice the returned values: Δt1 = 4.436000108718872, Δt2 = 5.007999897003174
# [comment] Δt2 alone takes 5 seconds!
How can I rewrite my test() function so that concurrency happens, just as in the 6-line REPL experiment above?
The typical approach is to @spawn all the tasks rather than leaving work in the originating function call. To avoid cheating, I wait/fetch the tasks to completion before leaving the originating function call. The order in which the tasks are waited/fetched matters for some reason, and @sync is how to get around that.
julia> @time test() # reproduces example
9.782188 seconds (857 allocations: 44.219 KiB, 0.05% compilation time)
(4.766000032424927, 5.019000053405762, 0.5316954226664747)
julia> function test2()
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           wait(wtask)
           fetch(wtask2)
       end
test2 (generic function with 1 method)
julia> @time test2()
9.754124 seconds (5.46 k allocations: 275.234 KiB, 0.11% compilation time)
0.5316954226664747
julia> function test3()
           wtask = @spawn cpu_eased_work() # sleep(5)
           wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
           ret = fetch(wtask2)
           wait(wtask)
           ret
       end
test3 (generic function with 1 method)
julia> @time test3()
5.014386 seconds (5.46 k allocations: 275.047 KiB, 0.21% compilation time)
0.5316954226664747
julia> function test4()
           @sync begin # introduces let block!!
               wtask = @spawn cpu_eased_work() # sleep(5)
               wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
               fetch(wtask2)
           end
       end
test4 (generic function with 1 method)
julia> @time test4()
5.011570 seconds (5.47 k allocations: 275.469 KiB, 0.21% compilation time)
0.5316954226664747
Why? Is this stipulated somewhere? I don’t think this is a reasonable requirement: the coding style in my test() makes sense for being concise, and it works in the REPL (thereby raising the question of inconsistency, as the title suggests).
I see, as shown by your test2() vs. test3(). But I don’t think this is reasonable behavior (is it?).
I restate this point to make it easier to see:
function test5(wait_cpu_task_first)
    sleep_task = @spawn cpu_eased_work() # sleep(5)
    cpu_task = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
    if wait_cpu_task_first
        wait(cpu_task)
        wait(sleep_task)
    else
        wait(sleep_task)
        wait(cpu_task)
    end
end
julia> @time test5(true)
5.015835 seconds (5.45 k allocations: 274.453 KiB, 0.23% compilation time)
julia> @time test5(false)
9.436463 seconds (13 allocations: 1.250 KiB)
The thing you’re missing, and which I still don’t fully understand because I don’t do concurrency all that much, is that tasks don’t automatically begin executing at @spawn. They’re scheduled by the task manager and are thus considered to be running concurrently, but they only actually begin once they reach a thread. So the two possible orders of the two tasks are:
a.
The scheduler puts cpu_eased_work() on the thread. sleep(5) sets the start time and yields to the scheduler (many of Julia’s I/O methods are designed to do this).
Now the scheduler puts cpu_intensive_work(3e8) on the thread, and it runs to completion for ~4.5 seconds.
The scheduler puts the preexisting cpu_eased_work() back on the thread, and it only lasts for ~0.5 seconds, a total of ~5 seconds since the start time. The total time of the 2 tasks is thus ~5 seconds.
b.
The scheduler puts cpu_intensive_work(3e8) on the thread, and it runs to completion for ~4.5 seconds.
The scheduler puts cpu_eased_work() on the thread. sleep(5) sets the start time and yields to the scheduler. Whatever the scheduler does next, this task ends 5 or more seconds later. The total time of the 2 tasks is thus 9.5 or more seconds.
In the original test with only one wtask, it scheduled the task then continued onward to run cpu_intensive_work(3e8), only yielding to the task at wait(wtask) for effectively the same timeline as (b). The same thing would occur in a let or begin block; on the other hand, evaluating lines as separate expressions in the global scope did actually yield at @spawn.
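Building on that explanation, one possible fix (my own sketch, not from the thread) is to call `yield()` immediately after `@spawn` inside the function. That hands control to the scheduler so the spawned task can run up to its first suspension point, i.e. `sleep` can start its timer, before the CPU-bound loop monopolizes the thread. The function `test_yield` below is a hypothetical variant of the original `test()`, parameterized so smaller workloads can be tried too:

```julia
import Base.Threads.@spawn

s(x) = sin(π * x / 2)
c(x) = cos(π * x / 2)

function cpu_intensive_work(N)
    a = 0.5
    for i = 1:Int(N)
        a = a < 0.5 ? s(a) : c(a)
    end
    a
end

# Hypothetical fix for the original test(): yield() right after @spawn lets
# the spawned task reach the thread and start its sleep timer before the
# CPU-bound work begins.
function test_yield(N = 3e8, pause = 5)
    wtask = @spawn sleep(pause)
    yield()                     # give the scheduler a chance to start wtask now
    t0 = time()
    ret = cpu_intensive_work(N)
    Δt1 = time() - t0
    wait(wtask)
    Δt2 = time() - t0 - Δt1
    Δt1, Δt2, ret
end
```

If this reasoning is right, `Δt1 + Δt2` should come out near `max(cpu time, pause)` rather than their sum, matching the REPL timeline (scenario a) instead of scenario (b).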
I’ll amend the previous comment too, but I was wrong about @sync handling it for us. We get the slower scenario (b) if the @spawn statements are swapped:
julia> function test4ii()
           @sync begin
               wtask2 = @spawn cpu_intensive_work(3e8) # about 4.5 seconds
               wtask = @spawn cpu_eased_work() # sleep(5)
               fetch(wtask2)
           end
       end
test4ii (generic function with 1 method)
julia> @time test4ii()
9.789642 seconds (5.47 k allocations: 275.734 KiB, 0.12% compilation time)
0.5316954226664747
Switching the @spawn lines doesn’t change test2, but it does slow down test3. As you demonstrate, switching which task to wait/fetch first also matters, which I inadvertently did in test2 vs. test3. I don’t really know what the scheduler is doing, but maybe there is some principle.
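One way to sidestep the ordering sensitivity altogether (a sketch of my own, assuming a cooperative CPU task is acceptable) is to make the CPU-bound loop itself call `yield()` periodically. The scheduler then gets control during the loop, not only at its boundaries, so the sleep task can start its timer regardless of which order the tasks are spawned or waited on. The names `cpu_cooperative_work` and `test_cooperative` below are hypothetical:

```julia
import Base.Threads.@spawn

s(x) = sin(π * x / 2)
c(x) = cos(π * x / 2)

# Same loop as cpu_intensive_work, but yielding every `stride` iterations so
# other scheduled tasks (e.g. the sleep timer) get a chance to start.
function cpu_cooperative_work(N; stride = 10_000_000)
    a = 0.5
    for i = 1:Int(N)
        a = a < 0.5 ? s(a) : c(a)
        i % stride == 0 && yield()   # hand the thread back to the scheduler
    end
    a
end

# Deliberately uses the slow ordering from test5(false): the CPU task is
# spawned first and the sleep task is waited first, yet the periodic yields
# should keep the total time near max(cpu time, pause).
function test_cooperative(N = 3e8, pause = 5)
    cpu_task = @spawn cpu_cooperative_work(N)
    sleep_task = @spawn sleep(pause)
    wait(sleep_task)
    fetch(cpu_task)
end
```

The trade-off is a small overhead per `yield()` call, so `stride` should be large enough that the loop still spends almost all of its time on real work.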