Threads.@spawn: idle threads when printing

When I run multiple tasks with Threads.@spawn and those tasks do some printing, threads can become idle. I would like help understanding why, and I am looking for workarounds. Here is an example, assuming Threads.nthreads() == 8:

julia> using Random

julia> function work(n, k, doprint=true)
           x = Xoshiro(n)
           s = 0.0
           u = (k÷10) * rand(x, 1:10)
           for i =1:k
               for _=1:5000000
                   s += rand(x)
               end
               if doprint && i == u
                   println("partially done: $n")
                   flush(stdout)
               end
           end
           s
       end
work (generic function with 2 methods)

julia> @time @sync for i=1:8
           Threads.@spawn work(i, 1000)
       end
partially done: 2
partially done: 1
partially done: 3
partially done: 4
partially done: 7
partially done: 8
partially done: 5
partially done: 6
 10.564103 seconds (1.71 k allocations: 99.959 KiB, 0.02% compilation time)

julia> @time @sync for i=1:8
           Threads.@spawn work(i, 1000, false)
       end
  5.945297 seconds (867 allocations: 52.394 KiB, 0.04% compilation time)

I thought this was because printing “yields”, and the scheduler is not always available because its thread is busy with another task. But replacing the printing with a plain yield() runs as fast as not printing at all, whereas replacing it with sleep(0.01) also leads to idle threads.
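
A minimal sketch of that comparison, in case it helps to reproduce it. The names `work_with` and `run_all` and the `inner` keyword are hypothetical helpers introduced here for illustration. The loop is the same as in `work`, but the mid-run action is passed in as a function so that printing, yield(), sleep(0.01), and doing nothing can be timed with identical code:

```julia
using Random

# Same loop as `work`, but the action performed once mid-run is an argument.
# `inner` is the inner iteration count (5_000_000 in the original example).
function work_with(action, n, k; inner=5_000_000)
    x = Xoshiro(n)
    s = 0.0
    u = (k ÷ 10) * rand(x, 1:10)   # iteration at which `action` fires (needs k >= 10)
    for i in 1:k
        for _ in 1:inner
            s += rand(x)
        end
        i == u && action(n)
    end
    return s
end

# Spawn one task per index and wait for all of them.
function run_all(action; ntasks=Threads.nthreads(), k=1000)
    @sync for i in 1:ntasks
        Threads.@spawn work_with(action, i, k)
    end
end

# Variants to time at the REPL:
# @time run_all(n -> (println("partially done: $n"); flush(stdout)))
# @time run_all(_ -> yield())
# @time run_all(_ -> sleep(0.01))
# @time run_all(_ -> nothing)
```

Passing the action first also allows do-block syntax, e.g. `run_all() do n; println(n); end`.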

One workaround seems to be to use ThreadPools.@tspawnat while leaving thread 1 free:

julia> using ThreadPools

julia> @time @sync for i=2:8
           @tspawnat i work(i, 1000)
       end
partially done: 2
[...]
  5.562996 seconds (1.64 k allocations: 96.764 KiB, 0.05% compilation time)

julia> @time @sync for i=1:7
           @tspawnat i work(i, 1000)
       end
partially done: 2
[...]
  9.414612 seconds (1.65 k allocations: 96.920 KiB, 0.03% compilation time)

But is this reliable, i.e. does leaving thread 1 free always help keep the other threads busy?
I would prefer a solution that does not require @tspawnat, since keeping all n OS threads busy would then probably mean starting julia with n+1 threads.
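
One candidate that stays with Threads.@spawn is to keep the worker tasks from printing at all: they queue their messages in a buffered Channel, and the messages are printed after the @sync block. This is only a sketch. The helper name `run_with_deferred_printing` and the `report` callback are hypothetical, and whether this actually avoids the idle threads is an assumption that has not been verified here:

```julia
# Hypothetical helper: runs `f(i, report)` for i in 1:ntasks, where
# `report(msg)` queues a message instead of printing it from the worker.
function run_with_deferred_printing(f, ntasks)
    # Unbounded buffer, so put! does not have to wait for a reader.
    msgs = Channel{String}(Inf)
    @sync for i in 1:ntasks
        Threads.@spawn f(i, msg -> put!(msgs, msg))
    end
    close(msgs)                 # already buffered messages can still be read
    collected = collect(msgs)   # drain the channel
    foreach(println, collected)
    return collected
end
```

The obvious drawback is that the messages only appear once all tasks have finished, so this does not give live progress output.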

Also: how does Threads.@threads avoid the problem?

julia> @time Threads.@threads for i=1:8
           work(i, 1000)
       end
partially done: 2
[...]
  5.622538 seconds (42.80 k allocations: 2.271 MiB, 0.57% compilation time)
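
To see which tasks are the ones that stall, rather than only the total time, each task can report when it finished and on which thread it started and ended. A rough sketch, with `timed_tasks` as a hypothetical helper name and `f` standing in for a call such as `i -> work(i, 1000)`:

```julia
# Hypothetical diagnostic: run `f(i)` in one spawned task per index and
# return, per task, the thread ids seen at start and end plus the finish time.
function timed_tasks(f, ntasks)
    t0 = time()
    tasks = map(1:ntasks) do i
        Threads.@spawn begin
            tid_start = Threads.threadid()
            f(i)
            (index = i,
             tid_start = tid_start,
             tid_end = Threads.threadid(),   # may differ if the task migrated
             finished_at = time() - t0)      # seconds since all tasks were launched
        end
    end
    return fetch.(tasks)
end

# Example at the REPL:
# foreach(println, timed_tasks(i -> work(i, 1000), 8))
```

If some threads really sit idle, the expectation (an assumption, not a measured result) is that a few tasks show a much larger finished_at than the rest.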