Hanging in ijl_gc_collect() when using adopted threads

When using the MT mode of Geant4.jl, the main thread hangs in an infinite loop in ijl_gc_collect(). I am using the latest 1.9 version of Julia on macOS.
The way it works is as follows:

  • The main thread (Julia REPL) calls C++ code that creates a number of worker threads. These call back into Julia and are all adopted into the Julia thread pool (I explicitly call jl_adopt_thread(), but that is probably not necessary since the callback is done through cfunction). So far, so good. Work is performed as expected.
  • After the run is completed, the adopted worker threads are put on wait (std::cv::wait(...)) and control is returned to the main thread (REPL). I do call jl_yield() at the end of the run and before entering the wait. The threads are waiting for a new run to eventually be started again.
  • At this point, in an unpredictable manner, typically when generating some output, the REPL hangs in an infinite loop in ijl_gc_collect() between addresses +256 and +264. My guess is that it is looping around:
jl_gc_wait_for_the_world(gc_all_tls_states, gc_n_threads);
JL_PROBE_GC_STOP_THE_WORLD();

Is there anything I can do to avoid this hang? Many thanks in advance.

@maleadt do you think this problem I am having could be related/fixed by https://github.com/JuliaLang/julia/pull/49934? I am not really fit to understand the internals of Julia threading and GC.

No, that would manifest as a segfault when adopting. What might be happening here (as per my limited understanding of that part of Julia) is that when a thread starts GC, it waits for all other Julia threads to reach a safepoint. Your newly adopted threads however are not at a safepoint, yet they are blocked in std::cv::wait, causing other threads to hang when attempting to enter GC. You probably want to enter a GC safe region during that wait (by calling jl_gc_safe_enter), so that GC can run during it.
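
To make the suggestion concrete, here is a minimal sketch of wrapping the condition-variable wait in a GC-safe region. It is not taken from Geant4.jl. So that it compiles without julia.h, the runtime calls are replaced by stand-ins (fake_gc_safe_enter, fake_gc_safe_leave, and the state values are hypothetical). In real code I believe you would call jl_gc_safe_enter(ptls) before the wait and jl_gc_safe_leave(ptls, state) after it, with ptls taken on the adopted thread (for example from jl_current_task->ptls). GcSafeRegion, RunGate and wait_for_next_run are names invented for this illustration.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Stand-ins for the Julia runtime (assumption: real code would use
// jl_gc_safe_enter(ptls) / jl_gc_safe_leave(ptls, state) from julia.h).
constexpr std::int8_t kGcUnsafe = 0;  // hypothetical value: thread may touch Julia objects
constexpr std::int8_t kGcSafe = 2;    // hypothetical value: GC may run without this thread
thread_local std::int8_t fake_gc_state = kGcUnsafe;

std::int8_t fake_gc_safe_enter() {
    std::int8_t old_state = fake_gc_state;
    fake_gc_state = kGcSafe;
    return old_state;  // the caller must hand this back when leaving
}

void fake_gc_safe_leave(std::int8_t old_state) { fake_gc_state = old_state; }

// RAII guard: the thread is marked GC-safe for the lifetime of the object.
class GcSafeRegion {
public:
    GcSafeRegion() : old_state_(fake_gc_safe_enter()) {}
    ~GcSafeRegion() { fake_gc_safe_leave(old_state_); }
    GcSafeRegion(const GcSafeRegion&) = delete;
    GcSafeRegion& operator=(const GcSafeRegion&) = delete;

private:
    std::int8_t old_state_;
};

// Shared state the workers block on between runs.
struct RunGate {
    std::mutex m;
    std::condition_variable cv;
    bool next_run_requested = false;
};

// Called by an adopted worker at the end of a run.
// Returns the GC state seen inside the region, only so the demo can check it.
std::int8_t wait_for_next_run(RunGate& gate) {
    std::int8_t state_during_wait;
    {
        GcSafeRegion safe;  // enter the GC-safe region before blocking
        std::unique_lock<std::mutex> lock(gate.m);
        gate.cv.wait(lock, [&] { return gate.next_run_requested; });
        state_during_wait = fake_gc_state;
        // No Julia objects may be touched inside this scope.
    }  // lock is released first, then the guard leaves the GC-safe region
    return state_during_wait;
}

// Small driver: one worker waits, the main thread requests the next run.
bool demo_wait_cycle() {
    RunGate gate;
    std::int8_t during = -1;
    std::int8_t after = -1;
    std::thread worker([&] {
        during = wait_for_next_run(gate);
        after = fake_gc_state;
    });
    {
        std::lock_guard<std::mutex> lock(gate.m);
        gate.next_run_requested = true;
    }
    gate.cv.notify_all();
    worker.join();
    return during == kGcSafe && after == kGcUnsafe;
}
```

The guard is declared before the lock so that the mutex is released before the thread leaves the GC-safe region. In the real runtime, leaving the region may itself have to wait for a collection that is in progress, and it seems safer not to hold the mutex while that happens.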