How to debug a non-deterministic StackOverflowError?

I’m trying to implement the Rhea C++ constraint solving algorithm in Julia as a learning project. I’m simply translating the original source as far as I’m able into something that makes sense in Julia.

I have implemented part of the unit tests and at the current state they fail with a StackOverflowError, but not every time. As the inputs are completely deterministic, I don’t know how that can happen.

As the code is already a couple hundred lines long and I’m not sure where the problem might be, I know it will be complicated to get help here, because I can’t post a non-working minimal example.

So first of all here’s the stacktrace:

(Rhea) pkg> test Rhea
   Testing Rhea
 Resolving package versions...
Test Summary: | Pass  Total
strength test |   11     11
Test Summary: | Pass  Total
variable test |   16     16
Test Summary:        | Pass  Total
linear expressions 1 |   13     13
Test Summary:        | Pass  Total
linear expressions 3 |    5      5
linear equation test: Error During Test at /Users/juliuskrumbiegel/dev/julia/Rhea/test/runtests.jl:116
  Test threw exception
  Expression: is_satisfied(x == y - 1)
  StackOverflowError:
  Stacktrace:
   [1] Dict(::Pair{variable{Float64},Float64}) at ./dict.jl:124
   [2] expression{variable{Float64}}(::variable{Float64}, ::Float64, ::Float64) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:14
   [3] expression{variable{Float64}}(::variable{Float64}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:12
   [4] ==(::variable{Float64}, ::variable{Float64}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/constraint.jl:52
   [5] isequal(::variable{Float64}, ::variable{Float64}) at ./operators.jl:123
   [6] ht_keyindex(::Dict{variable{Float64},Float64}, ::variable{Float64}) at ./dict.jl:290
   [7] haskey(::Dict{variable{Float64},Float64}, ::variable{Float64}) at ./dict.jl:546
   [8] sub!(::expression{variable{Float64}}, ::Rhea.term{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:153
   [9] sub!(::expression{variable{Float64}}, ::expression{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:147
   [10] -(::expression{variable{Float64}}, ::expression{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/linear_expression.jl:43
   [11] ==(::expression{variable{Float64}}, ::expression{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/constraint.jl:45
   ... (the last 8 lines are repeated 9998 more times)
   [79996] ==(::variable{Float64}, ::variable{Float64}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/constraint.jl:52
ERROR: Package Rhea errored during testing

So there seems to be a loop between the ==(::variable{Float64}, ::variable{Float64}) statements at [4] and [11]. That line from [4] looks like this:

==(v::variable, v2::variable) = linear_expression(v) == linear_expression(v2)

which should then lead to another function that I defined

==(first::linear_expression, second::linear_expression) = constraint(first - second, eq)

but this doesn’t appear next in the stack trace, which I don’t understand.
variable is defined like this:

mutable struct variable{T}
    p::T
end

expression like this:

mutable struct expression{T}
    constant::Float64
    terms::Dict{T, Float64}
end

and linear_expression like this:

const linear_expression = expression{variable{Float64}}

The code has basically lots of redefinitions of operators, so that something like

x = variable(3.0)
y = variable(4.0)
x == y

should not compare x and y but create a linear expression.
I’ll leave it at that for now because it is a lot already, I would be grateful for input how to understand what might be going on here if it’s impossible to do something with the code snippets. Thank you!

I would try making

or one of its callers print its arguments (eg @show, @info), so that I could isolate a deterministic example.

I’ve put @show into an inner constructor of expression and it prints

new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))

before the error, so it goes back and forth between two values. I guess that’s a good hint already but I can’t yet narrow it down further. Is it a problem to use mutable structs as keys in a Dict? I wonder if that could have something to do with it. At least when I tested it briefly,

mutable struct some
    a::Float64
end

a = some(2.0)
b = some(2.0)
c = a
d = Dict(a => "stored value")
d[a] # works as wanted
d[b] # KeyError as wanted
d[c] # works as wanted
a.a = 3.0
d[a] # still works as wanted

If you happen to mutate them after inserting as keys, yes.

If your algorithm allows (immutable) structs, try to organize it with those. This may have performance benefits, and usually leads to much cleaner code.

Are you sure? I just read that the hash function for mutable structs operates on objectid, which should be fine as the address of the variables shouldn’t change, that’s kind of their “identity”

I thought that for mutable structs, the hash was based on memory address, not contents, in which case it wouldn’t matter if you change them.

Don’t rely on that, that’s an implementation detail. And you can get into trouble, eg

julia> mutable struct Foo
       a::Int
       end

julia> d = Dict()
Dict{Any,Any} with 0 entries

julia> a = Foo(1)
Foo(1)

julia> d[a] = "test"
"test"

julia> a.a = 4
4

julia> d
Dict{Any,Any} with 1 entry:
  Foo(4) => "test"

julia> d[a]
"test"

julia> d[Foo(4)]
ERROR: KeyError: key Foo(4) not found
Stacktrace:
 [1] getindex(::Dict{Any,Any}, ::Foo) at ./dict.jl:477
 [2] top-level scope at REPL[36]:1

The thing that still doesn’t make sense to me is, how can the error appear only sometimes if I have exactly the same testset every time? Many times it runs fine

What other data structure could I use in Julia? The c++ source uses unordered_map to keep track of many std::make_shared<float_variable>s. I thought the Dict could play the role of the unordered map and the mutable structs act as shared variables

I think your best option is figuring out a way that does not mutate keys.

I think I understand what caused the bug, not sure how to solve it exactly, yet. But I think it should explain why the error happens only sometimes.

I’ve overloaded the == operator for type variable and variable to construct a linear_expression. The linear expression creates a dict when it’s created and tries to add both variable to it. For that it uses haskey which calls ht_keyindex. In that function theres this part:

if !isslotmissing(h,index) && (key === keys[index] || isequal(key,keys[index]))
            return index
        end

and I think this then calls isequal but only sometimes, which probably depends on the memory address of the current object. Maybe with some hashes of memory addresses it needs to do the whole expression and with some it doesn’t.

isequal itself is defined as:

isequal(x, y) = x == y

so that gets me right back to where I started, “comparing” the two variables and creating a linear_expression from them.
So the solution should be to either avoid overloading == which would not be great as this makes for much more readable constraint creation. Or I overload isequal again to actually compare object identity so isequal(v::variable, v2::variable) = v === v2

aha the comments of Base.isequal say:

`isequal` is the comparison function used by hash tables (`Dict`). `isequal(x,y)` must imply
that `hash(x) == hash(y)`.

This typically means that types for which a custom `==` or `isequal` method exists must
implement a corresponding `hash` method (and vice versa). Collections typically implement
`isequal` by calling `isequal` recursively on all contents.

I may be off but what you wanted is not Base.IdDict?

1 Like

haha I wouldn’t have known to look for that, but seems like what I just implemented myself, only better probably. thanks

Sorry for arriving late, XD.