How to debug a non-deterministic StackOverflowError?

jules · May 29, 2019, 8:19am

I’m trying to implement the Rhea C++ constraint solving algorithm in Julia as a learning project. I’m simply translating the original source as far as I’m able into something that makes sense in Julia.

I have implemented part of the unit tests and at the current state they fail with a StackOverflowError, but not every time. As the inputs are completely deterministic, I don’t know how that can happen.

As the code is already a couple hundred lines long and I’m not sure where the problem might be, I know it will be complicated to get help here, because I can’t post a non-working minimal example.

So first of all here’s the stacktrace:

(Rhea) pkg> test Rhea
   Testing Rhea
 Resolving package versions...
Test Summary: | Pass  Total
strength test |   11     11
Test Summary: | Pass  Total
variable test |   16     16
Test Summary:        | Pass  Total
linear expressions 1 |   13     13
Test Summary:        | Pass  Total
linear expressions 3 |    5      5
linear equation test: Error During Test at /Users/juliuskrumbiegel/dev/julia/Rhea/test/runtests.jl:116
  Test threw exception
  Expression: is_satisfied(x == y - 1)
  StackOverflowError:
  Stacktrace:
   [1] Dict(::Pair{variable{Float64},Float64}) at ./dict.jl:124
   [2] expression{variable{Float64}}(::variable{Float64}, ::Float64, ::Float64) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:14
   [3] expression{variable{Float64}}(::variable{Float64}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:12
   [4] ==(::variable{Float64}, ::variable{Float64}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/constraint.jl:52
   [5] isequal(::variable{Float64}, ::variable{Float64}) at ./operators.jl:123
   [6] ht_keyindex(::Dict{variable{Float64},Float64}, ::variable{Float64}) at ./dict.jl:290
   [7] haskey(::Dict{variable{Float64},Float64}, ::variable{Float64}) at ./dict.jl:546
   [8] sub!(::expression{variable{Float64}}, ::Rhea.term{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:153
   [9] sub!(::expression{variable{Float64}}, ::expression{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/expression.jl:147
   [10] -(::expression{variable{Float64}}, ::expression{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/linear_expression.jl:43
   [11] ==(::expression{variable{Float64}}, ::expression{variable{Float64}}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/constraint.jl:45
   ... (the last 8 lines are repeated 9998 more times)
   [79996] ==(::variable{Float64}, ::variable{Float64}) at /Users/juliuskrumbiegel/dev/julia/Rhea/src/constraint.jl:52
ERROR: Package Rhea errored during testing

So there seems to be a loop between the ==(::variable{Float64}, ::variable{Float64}) statements at [4] and [11]. That line from [4] looks like this:

==(v::variable, v2::variable) = linear_expression(v) == linear_expression(v2)

which should then lead to another function that I defined

==(first::linear_expression, second::linear_expression) = constraint(first - second, eq)

but this doesn’t appear next in the stack trace, which I don’t understand.
variable is defined like this:

mutable struct variable{T}
    p::T
end

expression like this:

mutable struct expression{T}
    constant::Float64
    terms::Dict{T, Float64}
end

and linear_expression like this:

const linear_expression = expression{variable{Float64}}

The code has basically lots of redefinitions of operators, so that something like

x = variable(3.0)
y = variable(4.0)
x == y

should not compare x and y but create a linear expression.
I’ll leave it at that for now because it is a lot already, I would be grateful for input how to understand what might be going on here if it’s impossible to do something with the code snippets. Thank you!

Tamas_Papp · May 29, 2019, 8:38am

I would try making

or one of its callers print its arguments (eg @show, @info), so that I could isolate a deterministic example.

jules · May 29, 2019, 9:24am

I’ve put @show into an inner constructor of expression and it prints

new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(2.0)=>1.0))
new{T}(c, t) = expression{variable{Float64}}(0.0, Dict(variable{Float64}(3.0)=>1.0))

before the error, so it goes back and forth between two values. I guess that’s a good hint already but I can’t yet narrow it down further. Is it a problem to use mutable structs as keys in a Dict? I wonder if that could have something to do with it. At least when I tested it briefly,

mutable struct some
    a::Float64
end

a = some(2.0)
b = some(2.0)
c = a
d = Dict(a => "stored value")
d[a] # works as wanted
d[b] # KeyError as wanted
d[c] # works as wanted
a.a = 3.0
d[a] # still works as wanted

Tamas_Papp · May 29, 2019, 9:32am

If you happen to mutate them after inserting as keys, yes.

If your algorithm allows (immutable) structs, try to organize it with those. This may have performance benefits, and usually leads to much cleaner code.

jules · May 29, 2019, 9:34am

Are you sure? I just read that the hash function for mutable structs operates on objectid, which should be fine as the address of the variables shouldn’t change, that’s kind of their “identity”

e3c6 · May 29, 2019, 9:34am

I thought that for mutable structs, the hash was based on memory address, not contents, in which case it wouldn’t matter if you change them.

Tamas_Papp · May 29, 2019, 9:49am

Don’t rely on that, that’s an implementation detail. And you can get into trouble, eg

julia> mutable struct Foo
       a::Int
       end

julia> d = Dict()
Dict{Any,Any} with 0 entries

julia> a = Foo(1)
Foo(1)

julia> d[a] = "test"
"test"

julia> a.a = 4
4

julia> d
Dict{Any,Any} with 1 entry:
  Foo(4) => "test"

julia> d[a]
"test"

julia> d[Foo(4)]
ERROR: KeyError: key Foo(4) not found
Stacktrace:
 [1] getindex(::Dict{Any,Any}, ::Foo) at ./dict.jl:477
 [2] top-level scope at REPL[36]:1

jules · May 29, 2019, 10:27am

The thing that still doesn’t make sense to me is, how can the error appear only sometimes if I have exactly the same testset every time? Many times it runs fine

Tamas_Papp · May 29, 2019, 11:08am

jules · May 29, 2019, 12:18pm

What other data structure could I use in Julia? The c++ source uses unordered_map to keep track of many std::make_shared<float_variable>s. I thought the Dict could play the role of the unordered map and the mutable structs act as shared variables

Tamas_Papp · May 29, 2019, 12:27pm

I think your best option is figuring out a way that does not mutate keys.

jules · May 29, 2019, 12:42pm

I think I understand what caused the bug, not sure how to solve it exactly, yet. But I think it should explain why the error happens only sometimes.

I’ve overloaded the == operator for type variable and variable to construct a linear_expression. The linear expression creates a dict when it’s created and tries to add both variable to it. For that it uses haskey which calls ht_keyindex. In that function theres this part:

if !isslotmissing(h,index) && (key === keys[index] || isequal(key,keys[index]))
            return index
        end

and I think this then calls isequal but only sometimes, which probably depends on the memory address of the current object. Maybe with some hashes of memory addresses it needs to do the whole expression and with some it doesn’t.

isequal itself is defined as:

isequal(x, y) = x == y

so that gets me right back to where I started, “comparing” the two variables and creating a linear_expression from them.
So the solution should be to either avoid overloading == which would not be great as this makes for much more readable constraint creation. Or I overload isequal again to actually compare object identity so isequal(v::variable, v2::variable) = v === v2

jules · May 29, 2019, 12:47pm

aha the comments of Base.isequal say:

`isequal` is the comparison function used by hash tables (`Dict`). `isequal(x,y)` must imply
that `hash(x) == hash(y)`.

This typically means that types for which a custom `==` or `isequal` method exists must
implement a corresponding `hash` method (and vice versa). Collections typically implement
`isequal` by calling `isequal` recursively on all contents.

Henrique_Becker · May 29, 2019, 1:33pm

I may be off but what you wanted is not Base.IdDict?

jules · May 29, 2019, 1:38pm

haha I wouldn’t have known to look for that, but seems like what I just implemented myself, only better probably. thanks

Henrique_Becker · May 29, 2019, 2:09pm

Sorry for arriving late, XD.

Topic		Replies	Views
Is there a bug in `Dict`? General Usage bug , dictionary	17	2420	November 7, 2017
Overnight a ERROR: StackOverflowError: in unchanged code Optimization (Mathematical) jump	3	389	November 19, 2020
Dict equality with composite type value New to Julia question	9	575	October 27, 2019
REPL crashes with custom pretty printing and Dictionaries General Usage pretty-printing	8	268	September 23, 2022
Julia gets stuck on recursive function General Usage	11	765	March 31, 2019

How to debug a non-deterministic StackOverflowError?

Related topics