What triggered the present experiment was my failure to find a reasonable solution for the issue raised in this topic. It turned out that the issue was unrelated to the HTTP.jl package and went deeper.

Now, on to the experiment (which requires `nthreads() > 2`).
Important note: to avoid “what if” scenarios about resource allocation to the running tasks when using 1-2 threads, I suggest running this with `-t 3` or above. Also, for this MWE it makes no difference whether the tasks are spawned on `:interactive` or `:default`. Variations like `-t 12,6` or `-t 12` also do not impact the showcased issue - so if you run the MWE with at least `-t 3`, I think we can be on the same page when discussing the issue.
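For reference, assuming the MWE is saved as `mwe.jl` (a filename I am picking here for illustration), an invocation satisfying the constraint above could look like:

```shell
# at least 3 threads in the default pool
julia -t 3 mwe.jl

# or with an explicit interactive,default split (Julia 1.9+)
julia -t 12,6 mwe.jl
```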
```julia
using Base.Threads, Dates

@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))

# busy-spin on the calling thread: c rounds of s seconds of
# pure computation, optionally yielding between rounds
function mainspin(s, c; withyield=false)
    @info "mainspin executing on thread $(threadid())"
    ms = Millisecond(s * 1000)
    r = 0.0
    counter = 0
    while counter < c
        t = now()
        while (now() - t) < ms
            r += rand()
        end
        counter += 1
        if withyield
            yield()
        end
    end
    return r
end

# a task that only sleeps: c rounds of sleep(s)
function sleeper(s, c, id)
    @info "sleeper $id executing on thread $(threadid())"
    counter = 0
    @time "sleeper $id" while counter < c
        sleep(s)
        counter += 1
    end
end

# a task doing real work: c rounds of s seconds of computation,
# yielding after each round
function realworker(s, c, id)
    @info "worker $id executing on thread $(threadid())"
    ms = Millisecond(s * 1000)
    r = 0.0
    counter = 0
    @time "worker $id" while counter < c
        t = now()
        while (now() - t) < ms
            r += rand()
        end
        counter += 1
        yield()
    end
    return r
end

begin
    # phase 1
    @info "phase 1: control run - no main thread work"

    # yields 10 times (total time ~10 seconds)
    task1 = @spawn realworker(1.02, 10, 1) # 1.02 because weird @time printing overlap
    # yields 10 times (total time ~10 seconds)
    task11 = @spawn sleeper(1, 10, 1)

    wait(task1)
    wait(task11)
    # we waited for task1 and task11 to finish

    # phase 2
    println(repeat("*", 40))
    @info "phase 2: main thread work without yield"

    # yields 10 times (total time ~10 seconds)
    task2 = @spawn realworker(1, 10, 2)
    # yields 10 times (total time ~10 seconds)
    task22 = @spawn sleeper(1, 10, 2)

    # we wait 5 seconds before starting to spin
    # the main thread
    sleep(5)

    # now we start spinning
    # while task2 and task22 are already running
    mainspin(1, 7)

    # main thread work does not impact the real worker
    # total time ~10 seconds
    wait(task2)

    # main thread work heavily impacting the sleeper
    # total time >15 seconds
    wait(task22)

    # phase 3
    println(repeat("*", 40))
    @info "phase 3: main thread work with yield"

    # yields 10 times (total time ~10 seconds)
    task3 = @spawn realworker(1, 10, 3)
    # yields 10 times (total time ~10 seconds)
    task33 = @spawn sleeper(1, 10, 3)

    # we wait 5 seconds before starting to spin
    # the main thread
    sleep(5)

    # now we start spinning with yield
    # while task3 and task33 are running
    mainspin(1, 7, withyield=true)

    # main thread work does not impact
    # the real worker - total time ~10 seconds
    wait(task3)

    # main thread work (with yield) still impacting the sleeper
    # but we consistently get better times (total time ~13 seconds)
    wait(task33)
end
```
Outcome:
```text
[ Info: started with 6 threads
[ Info: interactive threadpool: 6
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: sleeper 1 executing on thread 12
[ Info: worker 1 executing on thread 8
sleeper 1: 10.021080 seconds (52.62 k allocations: 3.531 MiB)
worker 1: 10.199292 seconds (266 allocations: 12.484 KiB, 0.22% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: worker 2 executing on thread 7
[ Info: sleeper 2 executing on thread 10
[ Info: mainspin executing on thread 1
worker 2: 9.999581 seconds (48.15 k allocations: 3.303 MiB, 0.31% compilation time)
sleeper 2: 16.951009 seconds (13.87 k allocations: 957.833 KiB, 0.18% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: worker 3 executing on thread 8
[ Info: sleeper 3 executing on thread 12
[ Info: mainspin executing on thread 1
worker 3: 9.999368 seconds (36.94 k allocations: 2.611 MiB, 0.12% compilation time)
sleeper 3: 13.989446 seconds (2.67 k allocations: 249.790 KiB, 0.08% compilation time)
```
If the task does real work, the overall time needed to finish is as expected (real parallelism). However, if `sleep` is used and a task is expected to sleep N seconds, the N value is extended by a factor dependent on the main thread's work. I advise running the MWE for more details.
Here is the `sleep` function's documentation:

> Block the current task for a specified number of seconds. The minimum sleep time is 1 millisecond or input of `0.001`.
Reality? The description should read approximately like this:

> Block the current task for the specified number of seconds - but add a penalty depending on the work done by the main thread.

That means that if we are doing heavy lifting on the main thread while a task using `sleep` is running… that task is essentially halted.
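One way to sidestep the timer machinery altogether is a yield-polling sleep, in the spirit of the `realworker` above (which the MWE shows is unaffected by main thread work). This is a sketch, and `busysleep` is my own name, not a Base function:

```julia
# Hypothetical workaround: poll the wall clock and yield, instead of
# relying on a timer serviced by the main thread's event loop.
function busysleep(s)
    deadline = time() + s
    while time() < deadline
        yield()  # let other tasks scheduled on this thread run
    end
end
```

The obvious trade-off is that this keeps the worker thread busy while “sleeping”, trading CPU for wakeup reliability.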
For the results above, I used the following Julia versions: 1.8.5 (with small changes), 1.9.2, and 1.10.0-alpha1 - and ran the experiment with various interactive/regular thread combinations.
The initial topic was likely too complex, and it turns out the issue is not one of communication between tasks: work on the main thread seems to translate into the halting of running tasks that call `sleep()`.
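To a first approximation, `sleep(s)` boils down to waiting on a `Timer`, whose firing depends on the libuv event loop getting serviced - which is why spinning the main thread delays the wakeup. A hand-rolled version (my simplified sketch, `timersleep` is my name) behaves the same way as `sleep` in the MWE:

```julia
# Simplified view of what sleep(s) does: create a libuv-backed Timer
# and block the current task until it fires. The timer callback only
# runs when the event loop gets a chance to run.
function timersleep(s)
    t = Timer(s)
    try
        wait(t)
    finally
        close(t)
    end
end
```

Swapping `sleep(s)` for `timersleep(s)` inside `sleeper` should reproduce the same penalty, supporting the event-loop explanation.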
Thank you, @Benny, for adding more clarity to the MWE.
P.S. I added multiple MWEs in the thread discussion in reply to various hypotheses trying to explain the behavior. Some of them showcase the issue by keeping the main thread free and comparing the `worker` vs. `sleeper` behavior in a context with multiple spawned tasks (the real-world equivalent would be a non-blocking HTTP.jl server that spawns a new task for each new connection). An HTTP.jl-related MWE can also be found in the current discussion thread.