Finalizer triggers "task switch not allowed" when used under multi-threaded GC

Hi there:
I’m working on parameter identification using ModelingToolkit-based ODE models.
I construct a symbolic system using ModelingToolkit.jl and build a custom loss function,
which is optimized under multithreaded parallelism (using Threads.@threads).

I encountered a GC finalizer error when using ModelingToolkit.jl under multithreaded computation:

error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/rtutils.c:43
ijl_switch at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/task.c:635
try_yieldto at ./task.jl:948
wait at ./task.jl:1022
#wait#733 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
slowlock at ./lock.jl:157
lock at ./lock.jl:147 [inlined]
lock at ./lock.jl:230
lock at /home/user/.julia/packages/WeakValueDicts/dxukx/src/WeakValueDicts.jl:88 [inlined]
delete! at /home/user/.julia/packages/WeakValueDicts/dxukx/src/WeakValueDicts.jl:184 [inlined]
#1 at /home/user/.julia/packages/WeakValueDicts/dxukx/src/WeakValueDicts.jl:32
unknown function (ip: 0x14d6c3ba737c)
#11 at /home/user/.julia/packages/WeakValueDicts/dxukx/src/WeakValueDicts.jl:97
unknown function (ip: 0x14d6c3ba7242)
run_finalizer at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:303
jl_gc_run_finalizers_in_list at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:393
run_finalizers at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:439
run_finalizers at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:420 [inlined]
ijl_gc_collect at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:3915
maybe_collect at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:926 [inlined]
jl_gc_pool_alloc_inner at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:1319
ijl_gc_pool_alloc_instrumented at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/gc.c:1377
ImmutableDict at ./dict.jl:774 [inlined]
macro expansion at /home/user/.julia/packages/Setfield/ZezIj/src/sugar.jl:198 [inlined]
setmetadata at /home/user/.julia/packages/SymbolicUtils/aooYZ/src/types.jl:964
toparam at /home/user/.julia/packages/ModelingToolkit/weYw6/src/parameters.jl:55
unknown function (ip: 0x14d6581375f2)
Initial at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/abstractsystem.jl:523
unknown function (ip: 0x14d633323842)
jl_apply at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_apply at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/builtins.c:831
default_toterm at /home/user/.julia/packages/ModelingToolkit/weYw6/src/variables.jl:221
check_index_map at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/index_cache.jl:482
parameter_index at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/index_cache.jl:429
is_parameter at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/index_cache.jl:418 [inlined]
is_parameter at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/abstractsystem.jl:176
unknown function (ip: 0x14d633328e96)
build_operating_point! at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/problem_utils.jl:609
unknown function (ip: 0x14d63333ca76)
#generate_initializesystem_timevarying#1229 at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/nonlinear/initializesystem.jl:85
generate_initializesystem_timevarying at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/nonlinear/initializesystem.jl:46 [inlined]
#generate_initializesystem#1228 at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/nonlinear/initializesystem.jl:35 [inlined]
generate_initializesystem at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/nonlinear/initializesystem.jl:32 [inlined]
#_#1086 at /home/user/.julia/packages/ModelingToolkit/weYw6/src/problems/initializationproblem.jl:52
InitializationProblem at /home/user/.julia/packages/ModelingToolkit/weYw6/src/problems/initializationproblem.jl:20 [inlined]
#_#1085 at ./none:0 [inlined]
InitializationProblem at ./none:0
unknown function (ip: 0x14d63334b7ea)
#maybe_build_initialization_problem#912 at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/problem_utils.jl:1118
maybe_build_initialization_problem at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/problem_utils.jl:1106
unknown function (ip: 0x14d63333f600)
remake_initialization_data at /home/user/.julia/packages/ModelingToolkit/weYw6/src/systems/nonlinear/initializesystem.jl:662
unknown function (ip: 0x14d63333947e)
#remake#771 at /home/user/.julia/packages/SciMLBase/LZvKA/src/remake.jl:237
unknown function (ip: 0x14d63331af2d)
remake at /home/user/.julia/packages/SciMLBase/LZvKA/src/remake.jl:214 [inlined]
loss at ./REPL[64]:4
#26 at ./REPL[76]:1 [inlined]
macro expansion at ./REPL[69]:25 [inlined]
#112#threadsfor_fun#13 at ./threadingconstructs.jl:253
#112#threadsfor_fun at ./threadingconstructs.jl:220 [inlined]
#1 at ./threadingconstructs.jl:154
unknown function (ip: 0x14d6333129ef)
jl_apply at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
start_task at /cache/build/tester-amdci4-12/julialang/julia-release-1-dot-11/src/task.c:1202

After tracing the dependency chain:

  • ModelingToolkit.jl -> SymbolicUtils.jl -> WeakValueDicts.jl

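It appears that WeakValueDicts registers a finalizer that performs a lock() call.
In Julia, a task switch from inside a GC finalizer is not allowed. When the lock is already held, lock() has to wait (slowlock -> wait in the trace above), and that attempted task switch results in this error.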

Besides waiting for an upstream fix, are there current best practices or safe coding patterns to:

  1. Avoid triggering GC finalizers under threaded workloads?

  2. Work with ModelingToolkit + SymbolicUtils in a thread-safe way?

Thanks for any guidance!

Yeah, it appears WeakValueDicts.jl is wrong: the finalizer code in src/WeakValueDicts.jl (twavv/WeakValueDicts.jl on GitHub, at commit 5eee2b6efea1e82ddbbce7bea35ca92c32ba265f) is a typical TOCTOU (time-of-check to time-of-use) race. It must use trylock.
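
To illustrate the trylock point, here is a minimal sketch of a finalizer that never waits on a lock. It is not a copy of the package's code: the names Handle, Registry, REG, register! and release! are made up for this example. The idea is that a finalizer may only take the lock if it can do so without blocking, and otherwise re-registers itself so that cleanup is retried at a later GC.

```julia
# Hypothetical miniature of a cache with finalizer-based cleanup.
# Handle must be mutable, since finalizers can only be attached to mutable objects.
mutable struct Handle
    id::Int
end

struct Registry
    lock::ReentrantLock
    live::Dict{Int,Bool}
end

const REG = Registry(ReentrantLock(), Dict{Int,Bool}())

# Finalizer-safe cleanup: never calls lock(), which could wait and task-switch.
function release!(h::Handle)
    if trylock(REG.lock)
        # Lock acquired without waiting: remove the entry now.
        try
            delete!(REG.live, h.id)
        finally
            unlock(REG.lock)
        end
    else
        # Lock is busy: re-register and try again at a later collection.
        finalizer(release!, h)
    end
    return nothing
end

# Regular code is allowed to block on the lock; only the finalizer is not.
function register!(h::Handle)
    lock(REG.lock) do
        REG.live[h.id] = true
    end
    finalizer(release!, h)
    return h
end

h = register!(Handle(1))
release!(h)   # called by hand here only to exercise the uncontended path
```

A check such as `islocked(l)` followed by `lock(l)` does not give the same guarantee, because another task can take the lock between the check and the call. That gap is the time-of-check to time-of-use window.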

Getting that fixed might be hard: Nobody has touched that repo for 5 years, and the author’s public GitHub activity suggests that they aren’t doing a lot of Julia at the moment.

But still, you should file an issue and maybe a PR – if travis (twavv) is still active that can get it fixed, or the orphanage people can take it over, or SymbolicUtils will need to create a fork under their control and use that.

(Also, that is embarrassing for the Julia ecosystem: WeakValueDict should not be an individual package; its functionality is too small to pay for the software supply chain hassle. The responsibility lies on the ModelingToolkit / SymbolicUtils side for keeping their upstream lean.)

If you’re not a package author but rather a normal user (i.e. you control your entire downstream), then you can fix it via e.g.

julia> using WeakValueDicts; a=WeakValueDict{Any,Any}(); t=typeof(a.finalizer).name.Typeofwrapper.parameters[1]; function (x::t)(k,v)
       wvd = x.wvd
       f = wvd.finalizer
       if trylock(wvd.lock)
           # lock acquired without waiting: safe to delete the entry now
           try delete!(wvd.ht, k) finally unlock(wvd.lock) end
       else
           # lock is busy: re-register the finalizer on v and retry at a later GC
           finalizer( (v) -> f(k,v), v )
       end
       end

[edit: monkey-patch code-snippet was wrong]

Thank you for your tips!
I also found that passing the ODE initial values and parameters as a Vector{Float64} instead of Pair{Num, Type} may be a good way to avoid this issue, as long as the old API of passing parameters as a vector is not deprecated.

# Build the base ODE problem
const base_prob = ODEProblem(sys, u0, (tspan[1], tspan[end]), p)

# Update the parameters inside the loss function
function loss(p::AbstractVector{T}) where {T<:Real}
    # Symbolic pairs: induces the GC finalizer error when the loss is evaluated in parallel
    #p_remake = SVector{27}(ntuple(i -> param_syms[i] => p[i], 27))::SVector{27, Pair{Num, Type}}

    # Plain vector of parameter values instead
    p_remake = SVector{27}(p)
    prob = remake(base_prob; p = p_remake, build_initializeprob = Val(false))
...
end
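
For context, the threaded evaluation around such a loss can be sketched without any ModelingToolkit objects. In this sketch toy_loss is only a stand-in for the real loss above, and evaluate_all and candidates are names made up for the illustration. The one assumption carried over from the post is that the loss takes a plain numeric vector.

```julia
using Base.Threads: @threads

# Stand-in for the real loss: takes a plain vector of parameter values.
toy_loss(p::AbstractVector{<:Real}) = sum(abs2, p .- 1.0)

# Evaluate the loss for every candidate parameter vector in parallel.
# Each iteration writes to its own slot, so no locking is needed here.
function evaluate_all(candidates::Vector{Vector{Float64}})
    losses = Vector{Float64}(undef, length(candidates))
    @threads for i in eachindex(candidates)
        losses[i] = toy_loss(candidates[i])
    end
    return losses
end

candidates = [fill(Float64(k), 3) for k in 0:4]
losses = evaluate_all(candidates)
```

Whether this avoids the finalizer error in a real setup depends on whether the loss body still creates symbolic objects on the worker threads. The sketch only shows the loop structure.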

If you have no downstream (i.e. you want to compute stuff, not write a package for other people to compute stuff), then I really recommend using the monkey-patch until the issue gets fixed somewhere upstream, instead of contorting your code so that it maybe doesn’t trigger the bug.

Once upstream fixes the thing, removing the monkey-patch is very easy, while weird workarounds are likely to stay on as technical debt forever.

That being said, if your code change is good anyway and coincidentally manages to not trigger the race condition, all the better for you.

Can you update your packages? I believe MTK has stopped using this package (and switched to JuliaCollections/WeakCacheSets.jl).
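
One way to check this after updating is to look for the package in the resolved dependencies of the active environment. The following is a small sketch, with in_manifest being a helper name made up here. It uses Pkg.dependencies(), which lists both direct and indirect dependencies.

```julia
using Pkg

# Returns true if a package with this name appears anywhere in the
# resolved dependency graph of the active environment.
function in_manifest(name::AbstractString)
    return any(info -> info.name == name, values(Pkg.dependencies()))
end

# After running Pkg.update() in the project, this should report false
# if nothing in the environment depends on WeakValueDicts any more.
still_present = in_manifest("WeakValueDicts")
```

If it still reports true, something in the environment may be holding ModelingToolkit or SymbolicUtils back at an older version.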