It seems that the `sleep` function behavior when used in tasks running on differā¦ent threads is heavily impacted by work taking place on the main thread.
Given the documentation of `sleep` function you would expect that `@time sleep(1)` will result in approximately `1-second` being reported by the `@time` evaluator. In the same way, you would expect that the following will take about 10-seconds in total:
```julia
@time for _ in 1:10
sleep(1)
end
# Usually + MWE control (task1): 10 seconds +/- small value
# MWE context (current issue): 16+ seconds (can be altered by the period main thread is busy)
```
However, the above expectation is violated (**in a manner not explainable by the expected/usual variability** **in the** `sleep` and/or `@time` **behavior**) when `sleep` is called from tasks running on different threads and any of the following two conditions are met:
1. there is some work performed by the main thread, or `Libc.systemsleep(n)` is called on the main thread (showcased in my MWE)
2. the main thread is _free_, but many (~1000) spawned tasks call `sleep` concurrently (increasing the number of available threads results in worse results).
To give a clear example:
```julia
function sleeper(s, c, id)
@info "sleeper $id executing on thread $(threadid()) "
counter = 0
@time "sleeper $id" while counter < c
sleep(s)
counter += 1
end
end
@spawn sleeper(1, 10, 1)
```
Output like this would look normal:
```
sleeper 1: 10.021883 seconds (53.10 k allocations: 3.561 MiB)
```
However, the function `sleeper(1, 10, 1)` running in tasks spawned on different threads than main, can output values like the one below (when main thread is busy):
```
sleeper 1: 16.963197 seconds (13.87 k allocations: 957.833 KiB, 0.13% compilation time)
```
Or if you make the main really busy for a longer period of time:
```
sleeper 1: 29.944492 seconds (13.87 k allocations: 957.802 KiB, 0.06% compilation time)
```
And this is not about some weird print to `stdout` competition between tasks (we also have a `control` function, `realworker`, not using `sleep` to confirm my statement).
All this might make more sense if you run the MWE provided below.
Important note: the current issue is to be interpreted in the context of Julia being started with multiple threads. For consistency, let's assume `julia -t 6,3 script.jl`.
### Main MWE follows:
```julia
using Base.Threads, Dates
@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))
function mainspin(s, c; withyield=false)
@info "mainspin executing on thread $(threadid()) "
ms = Millisecond(s * 1000)
r = 0.0
counter = 0
while counter < c
t = now()
while (now() - t) < ms
r += rand()
end
counter += 1
if withyield
yield()
end
end
end
function sleeper(s, c, id)
@info "sleeper $id executing on thread $(threadid()) "
counter = 0
@time "sleeper $id" while counter < c
sleep(s)
counter += 1
end
end
function realworker(s, c, id)
@info "worker $id executing on thread $(threadid()) "
ms = Millisecond(s * 1000)
r = 0.0
counter = 0
@time "worker $id" while counter < c
t = now()
while (now() - t) < ms
r += rand()
end
counter += 1
yield()
end
end
begin
# phase 1
@info "phase 1: control run - no main thread work"
# yields 10 times (total time ~10 seconds)
task1 = @spawn realworker(1.02, 10, 1) # 1.02 because weird @time printing overlap
# yields 10 times (total time ~10 seconds)
task11 = @spawn sleeper(1, 10, 1)
wait(task1)
wait(task11)
# we waited for task1 and task11 to finish
# phase 2
println(repeat("*", 40))
@info "phase 2: main thread work without yield"
# yields 10 times (total time ~10 seconds)
task2 = @spawn realworker(1, 10, 2)
# yields 10 times (total time ~10 seconds)
task22 = @spawn sleeper(1, 10, 2)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning
# while task2 and task22 are already running
mainspin(1, 7)
# main thread work does not impact the real worker
# total time ~10 seconds
wait(task2)
# main thread work heavily impacting the sleeper
# total time >15 seconds
wait(task22)
# phase 3
println(repeat("*", 40))
@info "phase 3: main thread work with yield"
# yields 10 times (total time ~10 seconds)
task3 = @spawn realworker(1, 10, 3)
# yields 10 times (total time ~10 seconds)
task33 = @spawn sleeper(1, 10, 3)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning with yield
# while task3 and task33 are running
mainspin(1, 7, withyield=true)
# main thread work does not impact
# the real worker - total time ~10 seconds
wait(task3)
# main thread work (with yield) still impacting the sleeper
# but we consistently get better times (total time ~13 seconds)
wait(task33)
end
```
Output with `julia -t 6,3 script.jl`:
```
[ Info: started with 6 threads
[ Info: interactive threadpool: 3
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 4
[ Info: sleeper 1 executing on thread 8
sleeper 1: 10.020176 seconds (51 allocations: 1.438 KiB)
worker 1: 10.199500 seconds (53.62 k allocations: 3.704 MiB, 0.36% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: worker 2 executing on thread 7
[ Info: sleeper 2 executing on thread 5
[ Info: mainspin executing on thread 1
worker 2: 9.999656 seconds (47.22 k allocations: 3.243 MiB, 0.24% compilation time)
sleeper 2: 16.974156 seconds (13.87 k allocations: 957.802 KiB, 0.14% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: worker 3 executing on thread 9
[ Info: sleeper 3 executing on thread 5
[ Info: mainspin executing on thread 1
worker 3: 9.999733 seconds (37.05 k allocations: 2.618 MiB, 0.23% compilation time)
sleeper 3: 13.979589 seconds (2.67 k allocations: 249.790 KiB, 0.16% compilation time)
```
It would be speculative on my part to derive hard conclusions about the root cause of this issue. There is a discourse topic that I open and some users are proposing some hypotheses - [feel free to check it out](https://discourse.julialang.org/t/bug-in-sleep-function-main-thread-work-affecting-sleep-duration-on-running-tasks/101911?u=algunion).
The underlying cause of the above issue seems to extend to various packages (for example, non-blocking listen from `HTTP.jl` responsivity is highly dependent on the main threat not doing work - and that is not caused by the direct `sleep` usage). Essential to mention that `HTTP.serve!/HTTP.listen!` is spawned to the `:interactive` threadpool.
#### Libc.systemsleep MWE version
<details>
<summary>The `Libc.systemsleep` version of the above MWE</summary>
```julia
using Base.Threads, Dates
@info "started with $(nthreads()) threads"
@info "interactive threadpool: $(nthreads(:interactive))"
@info "default threadpool: $(nthreads(:default))"
println(repeat("*", 40))
function mainspin(s, c; withyield=false)
@info "mainspin executing on thread $(threadid()) "
r = 0.0
counter = 0
while counter < c
Libc.systemsleep(s)
counter += 1
if withyield
yield()
end
end
end
function sleeper(s, c, id)
@info "sleeper $id executing on thread $(threadid()) "
counter = 0
@time "sleeper $id" while counter < c
sleep(s)
counter += 1
end
end
function libcworker(s, c, id)
@info "worker $id executing on thread $(threadid()) "
r = 0.0
counter = 0
@time "worker $id" while counter < c
Libc.systemsleep(s)
counter += 1
yield()
end
end
begin
# phase 1
@info "phase 1: control run - no main thread work"
# yields 10 times (total time ~10 seconds)
task1 = @spawn libcworker(1.02, 10, 1) # 1.02 because weird @time printing overlap
# yields 10 times (total time ~10 seconds)
task11 = @spawn sleeper(1, 10, 1)
wait(task1)
wait(task11)
# we waited for task1 and task11 to finish
# phase 2
println(repeat("*", 40))
@info "phase 2: main thread work without yield"
# yields 10 times (total time ~10 seconds)
task2 = @spawn libcworker(1, 10, 2)
# yields 10 times (total time ~10 seconds)
task22 = @spawn sleeper(1, 10, 2)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning
# while task2 and task22 are already running
mainspin(1, 7)
# main thread work does not impact the real worker
# total time ~10 seconds
wait(task2)
# main thread work heavily impacting the sleeper
# total time >15 seconds
wait(task22)
# phase 3
println(repeat("*", 40))
@info "phase 3: main thread work with yield"
# yields 10 times (total time ~10 seconds)
task3 = @spawn libcworker(1, 10, 3)
# yields 10 times (total time ~10 seconds)
task33 = @spawn sleeper(1, 10, 3)
# we wait 5 seconds before starting to spin
# the main thread
sleep(5)
# now we start spinning with yield
# while task3 and task33 are running
mainspin(1, 7, withyield=true)
# main thread work does not impact
# the real worker - total time ~10 seconds
wait(task3)
# main thread work (with yield) still impacting the sleeper
# but we consistently get better times (total time ~13 seconds)
wait(task33)
end
```
Output:
```
[ Info: started with 6 threads
[ Info: interactive threadpool: 3
[ Info: default threadpool: 6
****************************************
[ Info: phase 1: control run - no main thread work
[ Info: worker 1 executing on thread 4
[ Info: sleeper 1 executing on thread 5
sleeper 1: 10.021097 seconds (50 allocations: 1.406 KiB)
worker 1: 10.202331 seconds (7.16 k allocations: 527.908 KiB, 0.89% compilation time)
****************************************
[ Info: phase 2: main thread work without yield
[ Info: sleeper 2 executing on thread 8
[ Info: worker 2 executing on thread 6
[ Info: mainspin executing on thread 1
worker 2: 10.001909 seconds (12.27 k allocations: 864.121 KiB, 0.36% compilation time)
sleeper 2: 17.009272 seconds (28.83 k allocations: 1.998 MiB, 0.21% compilation time)
****************************************
[ Info: phase 3: main thread work with yield
[ Info: sleeper 3 executing on thread 5
[ Info: worker 3 executing on thread 7
[ Info: mainspin executing on thread 1
worker 3: 10.001954 seconds (2.62 k allocations: 246.360 KiB, 0.28% compilation time)
sleeper 3: 13.026204 seconds (30.55 k allocations: 2.179 MiB, 0.22% compilation time)
```
</details>