@HanD, this one was particularly hard to sell as a legitimate bug/issue, and there were quite a few what-ifs that needed addressing.
Many of the threading-related issues here on Discourse tell much the same story: somebody is not yet familiar enough with the relationship between `@spawn` and threads, and runs into problems just by doing non-recommended stuff. So I think it was to be expected that people would initially try to explain away my issue based on that kind of experience.
I mean, at first, even I was almost sure that I must be doing something wrong (I even hoped that I was), so I was the first one who needed convincing. If you look at the number of edits, you can see that I started with the communication-between-tasks story, which eventually evolved into the final issue that you can see here.
But it seems there is hope - maybe it will be picked up soon and labeled as important enough to be addressed. One of the first answers (by @vchuravy) is the following:
> Julia uses a libuv based event-loop under the hood. Processing certain things like Timers/IO depend on the event-loop being run regularly. (`sleep` uses a libuv timer under the hood.)
>
> Looking at `jl_process_events` it seems like the event loop is only run from `tid == 0` or when `_threadedregion` is set. `jl_enter_threaded_region` is only called from `threading_run` (which is the base function for `@threads`).
>
> I am unsure why we still have this mechanism instead of allowing any thread to run the libuv event loop.
This is not just another bug that is encountered in some niche scenarios. Although it is rarely experienced at the extreme level you did, the responsiveness of the `:interactive` threadpool (when using Timer/IO) is heavily impacted, and the issue gets worse with the number of available threads and spawned tasks. The `:default` threadpool is impacted at least in the same manner, but I am focusing on `:interactive` because that is where your fast-yielding and/or short-lived tasks run, and you want them to be responsive, not battling for a lock on the main thread. It is one of those scenarios where allocating more computational resources actually decreases the overall responsiveness of all tasks while the number of tasks stays constant (and you just want them to `sleep`).
I am very curious how the HTTP.jl benchmarks will look when run on a Julia version with this bug/issue fixed.