How to debug an error in "type inference"

I have encountered a very weird issue, and I need some help on how to diagnose it. It somehow seems to happen between my code being run (which kind of seems to work), but before the output is returned. I.e. the issue doesn’t seem to directly happen in the code itself.

At the core of it I have a function my_func which stalls (i.e. never completes) under certain circumstances. I modify my function so that it looks like this:

function my_func()
    println("Function started")

    # Here the function actually does its stuff.

    println("Almost finished")
    return "This is the output"
end

When it works I get an expected output:

julia> my_func()
Function started
Almost finished
"This is the output"

However, sometimes I just get stuck at:

julia> my_func()

i.e. the function have stalled (usually it runs under a second, but here I can wait hours and it won’t complete). However, if I press enough ctrl+C I get this:

julia> my_func()
WARNING: Force throwing a SIGINT
Internal error: during type inference of
my_func()
Encountered unexpected error in runtime:
InterruptException()
segv_handler at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/signals-unix.c:356
Function started
Almost finished
"This is the output"

which includes the actual full output. So the function seems to be able to run everything, but somehow it gets stuck in this “type inference” step. Now my question is, what is this, and how do I actually go about figuring out why it gets stuck here?

More details of when this actually appears

The circumstances for this are actually super weird. Essentially it involves three packages:

  1. ModelingToolkit.jl (MTK), which (1) Implements a structure ODESystem.
  2. StructuralIdentifiability.jl (SI), which (1) implements a function assess_identifiability and (2) contains an extension for SI + MTK which implements an overload to assess_identifiability when it is called with an MTK ODESystem input.
  3. Catlayst.jl, which (1) Implements a structure ReactionSystem, and (2) contains an extension for Catalyst + SI which implements an overload to assess_identifiability for when it is called with a Catalyst ReactionSystem input. This overload internally calls the SI+MTK extensions overload of assess_identifiability.

Next, my code basically looks like this:


using ModelingToolkit,  StructuralIdentifiability

function my_func()
    println("Function started")

    osys = make_ODESystem() # Creates an MTK `ODESystem`.
    StructuralIdentifiability.assess_identifiability(osys) # Calls the SI + MTK extensions `assess_identifiability` on the created `ODESystem`.

    println("Almost finished")
    return "This is the output"
end

my_func() # This call completes without problems.
using Catalyst
my_func() # This call stalls.

So essentially, the presence of Catalyst (and its SI + Catalyst extension, and the extensions assess_identifiability overload) somehow causes the original assess_identifiability call to stall.

Some additional notes:

  • The assess_identifiability call in my_func never passes the assess_identifiability call implemented in the Catalyst+SI extension.
  • Removing the assess_identifiability call within the Catalyst+SI extension’s assess_identifiability overload and everything is fine.
  • If I just call osys = make_ODESystem(); StructuralIdentifiability.assess_identifiability(osys); directly in code (i.e. not enclosing them in my_func and calling that) then everything is fine.

To me I just don’t see how the presence of code that is never actually touched can affect some other code so severally (i.e. I don’t even have the hint of a possible mechanism for this). However, maybe some more experienced people might know? Here is the related issue: ModelingToolkit extension completely freezing Julia · Issue #360 · SciML/StructuralIdentifiability.jl · GitHub

1 Like

I don’t know any of the packages involved, so I can’t offer a solution. Just a comment on this. When a package is loaded (via using or import), its methods are added to julia. This may in certain cases change which method instances are called by your program, even if you don’t access the package directly. The obvious case is if the loaded package defines something like
Base.exp(x::Float64) = (sleep(3600); return 0.0). It will cause all your calls to exp to wait for an hour, before returning 0.0. There are more subtle cases than this.

One of the things julia does when compiling your method is to try to figure out the types of all the variables in your code (and in the code you’re calling). This is necessary to produce efficient code. This is called, you guessed it, type inference. There appears to be some unfortunate things happening when using Catalyst together with the other packages, which causes the extreme tardiness in type inference. I have no idea what it can be. Someone familiar with the internals of those packages will have to look into it. Sometimes such problems can be alleviated by loading the packages in a different order (Like, using Catalyst first).

2 Likes

Thanks for the reply. In this case I am wrote the Catalyst package, and the StructuralIdentifiability author have also failed to understand it, which makes things even more of a mess!

Your example is nice, but in this case, I know the newly introduced function is not actually called. You are probably right about the type inference, unfortunately I really haven’t found much about it (or how to debug it).

You can attach to the process with GDB, do a Ctrl-C and then look through the call stack and try see where it is stuck.

has some general tips for using GDB with julia.

2 Likes

And you may take a look at @snoop_inference in SnoopCompile.jl. And you get some information if you start julia with julia --trace-compile=stderr. (See julia --help-hidden).

1 Like

Thanks a lot! I have tried the various things, but still a bit uncertain about how to interpret things, or exactly where things go wrong.

Using julia --trace-compile=stderr

Using `julia --trace-compile=stderr`

When I start Julia with --trace-compile=stderr I get this output when I run (and interrupt) the critical part:

output
julia> my_func()
precompile(Tuple{typeof(Base.hash), Bool})
precompile(Tuple{typeof(Base.getproperty), Base.FilteringRF{Catalyst.var"#74#75", Base.MappingRF{typeof(Catalyst.combinatoric_ratelaws), Base.BottomRF{typeof(Base.:(|))}}}, Symbol})
precompile(Tuple{typeof(Base.getproperty), Base.MappingRF{typeof(Catalyst.combinatoric_ratelaws), Base.BottomRF{typeof(Base.:(|))}}, Symbol})
precompile(Tuple{typeof(Base.getproperty), Base.BottomRF{typeof(Base.:(|))}, Symbol})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, Catalyst.Reaction{S, T} where T where S, Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{typeof(Core.checked_dims), Int64, Int64})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, Catalyst.ReactionComplex{Int64}, Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, Array{Pair{Int64, Int64}, 1}, Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{Type{Base.Generator{I, F} where F where I}, Graphs.SimpleGraphs.var"#1#3"{Int64}, Base.UnitRange{Int64}})
precompile(Tuple{typeof(Base.getproperty), Base.Generator{Base.UnitRange{Int64}, Graphs.SimpleGraphs.var"#1#3"{Int64}}, Symbol})
precompile(Tuple{typeof(Base._similar_shape), Base.Generator{Base.UnitRange{Int64}, Graphs.SimpleGraphs.var"#1#3"{Int64}}, Base.HasShape{1}})
precompile(Tuple{Type{Base.Generator{I, F} where F where I}, Graphs.SimpleGraphs.var"#2#4"{Int64}, Base.UnitRange{Int64}})
precompile(Tuple{typeof(Base.getproperty), Base.Generator{Base.UnitRange{Int64}, Graphs.SimpleGraphs.var"#2#4"{Int64}}, Symbol})
precompile(Tuple{typeof(Base._similar_shape), Base.Generator{Base.UnitRange{Int64}, Graphs.SimpleGraphs.var"#2#4"{Int64}}, Base.HasShape{1}})
precompile(Tuple{typeof(Base.Broadcast.broadcastable), Base.OneTo{Int64}})
precompile(Tuple{Base.var"##s128#279", Vararg{Any, 5}})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, Catalyst.ReactionSystem{V} where V<:(Catalyst.NetworkProperties{I, V} where V<:SymbolicUtils.BasicSymbolic{Real} where I<:Integer), Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{Type{Signed}, UInt64})
precompile(Tuple{Base.var"#325#329"{Tuple{Int64}}, Int64})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, Array{Pair{Any, Any}, 1}, Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{Type{NamedTuple{(:private,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{Type{NamedTuple{(:namespace,), T} where T<:Tuple}, Tuple{Bool}})
precompile(Tuple{Type{SymbolicUtils.Rewriters.Chain}, Tuple{typeof(Catalyst.diff_2_zero)}, Bool})
precompile(Tuple{typeof(SymbolicUtils.Rewriters.Prewalk), SymbolicUtils.Rewriters.Chain})
precompile(Tuple{Type{SymbolicUtils.Rewriters.Fixpoint{C} where C}, SymbolicUtils.Rewriters.Walk{:pre, SymbolicUtils.Rewriters.Chain, typeof(TermInterface.maketerm), false}})
^C^C^C^C^CWARNING: Force throwing a SIGINT
Internal error: during type inference of
my_func()
Encountered unexpected error in runtime:
InterruptException()
segv_handler at /cache/build/builder-demeter6-3/julialang/julia-release-1-dot-11/src/signals-unix.c:356
precompile(Tuple{AbstractAlgebra.var"#40#41"{AbstractAlgebra.WeakValueDict{Tuple{AbstractAlgebra.Ring, Array{Symbol, 1}, Symbol, Int64}, AbstractAlgebra.Ring}}, AbstractAlgebra.Generic.MPolyRing{Nemo.QQMPolyRingElem}})
precompile(Tuple{AbstractAlgebra.var"#40#41"{AbstractAlgebra.WeakValueDict{AbstractAlgebra.Ring, AbstractAlgebra.Ring}}, AbstractAlgebra.Generic.FracField{Nemo.QQMPolyRingElem}})
Function started
precompile(Tuple{typeof(ModelingToolkit.collect_ivs_from_nested_operator!), Base.Set{Any}, SymbolicUtils.BasicSymbolic{Real}, Type})
precompile(Tuple{typeof(ModelingToolkit.structural_simplify), ModelingToolkit.ODESystem})
precompile(Tuple{ModelingToolkit.var"##__structural_simplify#1439", Bool, Base.Pairs{Symbol, Bool, NTuple{4, Symbol}, NamedTuple{(:allow_symbolic, :allow_parameter, :conservative, :fully_determined), NTuple{4, Bool}}}, typeof(ModelingToolkit.__structural_simplify), ModelingToolkit.ODESystem, Nothing})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, ModelingToolkit.ConnectionSet, Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{typeof(Core.memoryref), GenericMemory{:not_atomic, ModelingToolkit.ConnectionElement, Core.AddrSpace{Core}(0x00)}})
precompile(Tuple{typeof(ConstructionBase.setproperties), ModelingToolkit.ODESystem, NamedTuple{(:systems,), Tuple{Array{Any, 1}}}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:checks,), Tuple{Bool}}, Type{ModelingToolkit.ODESystem}, UInt64, Array{Symbolics.Equation, 1}, SymbolicUtils.BasicSymbolic{Real}, Array{SymbolicUtils.BasicSymbolic{Real}, 1}, Array{SymbolicUtils.BasicSymbolic{Real}, 1}, Nothing, Base.Dict{Any, Any}, Array{Any, 1}, Array{Symbolics.Equation, 1}, Base.RefValue{Array{Symbolics.Num, 1}}, Base.RefValue{Any}, Base.RefValue{Any}, Base.RefValue{Array{Symbolics.Num, 2}}, Base.RefValue{Array{Symbolics.Num, 2}}, Symbol, String, Array{Any, 1}, Base.Dict{Any, Any}, Base.Dict{Any, Any}, Nothing, Nothing, Array{Symbolics.Equation, 1}, Nothing, Nothing, Nothing, Array{ModelingToolkit.SymbolicContinuousCallback, 1}, Array{ModelingToolkit.SymbolicDiscreteCallback, 1}, Array{Symbolics.Equation, 1}, Nothing, Nothing, Bool, Array{Any, 1}, Nothing, Nothing, Bool, Vararg{Nothing, 5}})
precompile(Tuple{typeof(Base.filter), ModelingToolkit.var"#488#503"{Nothing}, Array{ModelingToolkit.ConnectionSet, 1}})
precompile(Tuple{typeof(ModelingToolkit.domain_defaults), ModelingToolkit.ODESystem, Array{ModelingToolkit.ConnectionSet, 1}})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:simplify, :allow_symbolic, :allow_parameter, :conservative, :fully_determined), NTuple{5, Bool}}, typeof(ModelingToolkit.structural_simplify!), ModelingToolkit.TearingState{ModelingToolkit.ODESystem}, Nothing})
precompile(Tuple{ModelingToolkit.var"##complete#284", Bool, Bool, typeof(ModelingToolkit.complete), ModelingToolkit.ODESystem})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:split,), Tuple{Bool}}, typeof(ModelingToolkit.complete), ModelingToolkit.ODESystem})
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:checks,), Tuple{Bool}}, Type{ModelingToolkit.ODESystem}, UInt64, Array{Symbolics.Equation, 1}, SymbolicUtils.BasicSymbolic{Real}, Array{Any, 1}, Array{Any, 1}, Nothing, Base.Dict{Any, Any}, Array{Any, 1}, Array{Symbolics.Equation, 1}, Base.RefValue{Array{Symbolics.Num, 1}}, Base.RefValue{Any}, Base.RefValue{Any}, Base.RefValue{Array{Symbolics.Num, 2}}, Base.RefValue{Array{Symbolics.Num, 2}}, Symbol, String, Array{Any, 1}, Base.Dict{Any, Any}, Base.Dict{Any, Any}, Nothing, Nothing, Array{Symbolics.Equation, 1}, ModelingToolkit.Schedule, Nothing, Nothing, Array{ModelingToolkit.SymbolicContinuousCallback, 1}, Array{ModelingToolkit.SymbolicDiscreteCallback, 1}, Array{Symbolics.Equation, 1}, Nothing, Nothing, Bool, Array{Any, 1}, ModelingToolkit.TearingState{ModelingToolkit.ODESystem}, ModelingToolkit.Substitutions, Bool, Nothing, Nothing, Nothing, Nothing, ModelingToolkit.ODESystem})
precompile(Tuple{Type{NamedTuple{(:measured_quantities, :funcs_to_check), T} where T<:Tuple}, Tuple{Array{Symbolics.Num, 1}, Array{Symbolics.Num, 1}}})
┌ Info: System parsed into X' = -X*d + p
└ y1 = X
[ Info: Assessing local identifiability
[ Info: Assessing global identifiability
[ Info: Computing IO-equations
[ Info: Computed IO-equations in 0.000545494 seconds
[ Info: Computing Wronskians
[ Info: Computed Wronskians in 0.00050596 seconds
[ Info: Dimensions of the Wronskians [3]
[ Info: Ranks of the Wronskians computed in 7.912e-6 seconds
[ Info: Global identifiability assessed in 0.002211476 seconds
Almost finished
"This is the output"

I.e.

precompile(Tuple{Type{SymbolicUtils.Rewriters.Fixpoint{C} where C}, SymbolicUtils.Rewriters.Walk{:pre, SymbolicUtils.Rewriters.Chain, typeof(TermInterface.maketerm), false}})

is the last thing to be printed after Julia stalls, and

precompile(Tuple{AbstractAlgebra.var"#40#41"{AbstractAlgebra.WeakValueDict{Tuple{AbstractAlgebra.Ring, Array{Symbol, 1}, Symbol, Int64}, AbstractAlgebra.Ring}}, AbstractAlgebra.Generic.MPolyRing{Nemo.QQMPolyRingElem}})

is the first thing to be printed after I intrupt using ctrl + C. Does it mean that one of these two (which) is the precompilation culprit?

Using @snoop_inference

Using `@snoop_inference`

I tried to @snoop_inference the call (i.e. running using Catalyst directly and then

julia> tinf = @snoop_inference my_func();

I cannot really see anything special in it though

julia> tinf
InferenceTimingNode: 0.000105/0.000105 on Core.Compiler.Timings.ROOT() with 0 direct children

julia> staleinstances(tinf)
SnoopCompileCore.InferenceTiming[]

julia> print_tree(tinf, maxdepth=5)
InferenceTimingNode: 0.000105/0.000105 on Core.Compiler.Timings.ROOT() with 0 direct children

Not sure if the lack of info is due to be having to ctrl+C the process though. I.e. when U try the same without having loaded Catalyst (i.e. so I can run the function without stalling) there is quite a lot of output in the tinf.

using GDB

using GDB

I have to be honest, I just haven’t manage to get this to work. I cannot start gdb in Julia:

julia> gdb
ERROR: UndefVarError: `gdb` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
 [1] top-level scope
   @ :0
 [2] top-level scope
   @ none:1

I can start gdb directly in my terminal. Unfortunatley, I cannot actually start Julia from here:

my_dir$ gdb
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) julia
Undefined command: "julia".  Try "help".

I am still also a bit fundamentally confused how it is possible for Julia to actually complete stuff after I have interrupted things.

  • I.e. it is doing something that is required to run my code. However, it cannot complete this task.
  • If I interrupt whatever this is, it is actually able to run my program.
  • The only explanation is that, at the time I Julia have stalled, the program is already completed. If it that is the case, why is Julia bothering doing something?

(this probably requires me finding someone who knows stuff and can explain it to me in person though)

gdb is start from the command line:

gdb julia
...
...
(gdb) run
...
...
julia> 

The run command can take the arguments you normally start julia with, e.g. run -t 4 --startup=no etc.
When you interrupt the program you are returned to the (gdb) prompt, and can type where, to obtain a stack dump. gdb is very low level.

The list you get from --trace-compile says which precompilations have been started, but they run in parallel, so it’s not necessarily the last one which hangs. This may also be the reason things continue after interrupt, it might be the case that only one of the threads are interrupted, and then releases some resource which the others need. You can try starting julia with -t 1 to run serially (I think that works for precompilations too).

Have you tried running with julia 1.10.8?

I’m in deep water here, someone with more knowledge about the internals of julia compilation should have a look at this.