Why is displaying a stack trace with values difficult?

Inspecting the stack immediately after an error occurs would be fantastic for debugging. Right now, JuliaDebug/Debugger.jl can do it, but it's slow. And of course @Keno, of all people, gave it a shot with JuliaDebug/Gallium.jl and threw in the towel, so I have to assume that it is very difficult. I still don't understand why, though, and I'd like to.

Julia already provides backtrace and stacktrace, which walk the stack and return its frames. For each StackFrame, pointer is the frame's instruction pointer (as returned by backtrace), and linfo is, when available, that frame's MethodInstance.
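For reference, here is a minimal sketch of what those APIs hand back after an error. The functions inner and outer are made up for illustration; the StackFrame fields used (func, file, line, pointer, linfo) exist in current Julia, but note that none of them holds argument values.

```julia
# Hypothetical functions, used only to produce an error with a known call chain.
inner(x) = error("boom")
outer(x) = inner(x) + 1

# Capture the frames of the error currently being handled.
frames = try
    outer(42)
catch
    stacktrace(catch_backtrace())
end

for fr in frames
    # pointer is the frame's instruction pointer; linfo is a MethodInstance when
    # available (otherwise e.g. nothing). There is no field for argument values.
    println(fr.func, " at ", fr.file, ":", fr.line,
            "  ip=0x", string(fr.pointer, base = 16),
            "  linfo::", typeof(fr.linfo))
end
```

Everything needed to locate each frame is there; what is missing is any record of where that frame's arguments live.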

How far is that from being able to recover the arguments that were passed? I understand that with Julia's aggressive inlining a lot is lost or optimized away, but many frames (e.g., at least those reached through dynamic dispatch) should still be recoverable, if only from the calling convention.

If LLVM does a poor job of emitting debug information (variable locations) for some function f, couldn't you call f with a sentinel argument value such as 1.23456 and see where that bit pattern ends up? Does that even make sense?
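As a toy illustration of the sentinel idea, the sketch below scans a byte buffer for the 8-byte pattern of 1.23456. The buffer is a hypothetical stand-in for a captured region of stack memory, and find_sentinel is a made-up helper. Even in a real setting, a value that only ever lives in a register would never be found this way, and a match could be a coincidence.

```julia
# Hypothetical helper: every 0-based byte offset in buf where the native-endian
# bytes of `sentinel` occur.
function find_sentinel(buf::Vector{UInt8}, sentinel::Float64)
    pat = reinterpret(UInt8, [sentinel])   # the 8 bytes of the Float64
    n = length(pat)
    hits = Int[]
    for i in 1:(length(buf) - n + 1)
        if @view(buf[i:i+n-1]) == pat
            push!(hits, i - 1)             # 0-based, like an offset from a frame base
        end
    end
    return hits
end

# Toy "stack snapshot": 64 zero bytes with the sentinel written at offset 24.
buf = zeros(UInt8, 64)
buf[25:32] .= reinterpret(UInt8, [1.23456])
find_sentinel(buf, 1.23456)  # → [24]
```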

It seems to me that the difficulty is that extracting variable information means diving deep into the assembly emitted by LLVM, over which Julia has no direct control, and navigating arbitrary assembly without debug information is dire. So, basically, Julia knows nothing until the function returns a usable Julia value.

That's a really practical way of doing things, but implementing this strategy without modifying the context is equally challenging: how do you perform such experiments, for example with print, without changing what you are observing?

The problem, as I understand it, is that when displaying a stack trace, finding the actual values of the arguments somewhere on the stack is hard. They may or may not actually be on the stack in a decipherable form. Let's look at the function I defined in a PR yesterday:

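# Inverse of an odd n modulo 2^(8*sizeof(T)), via Newton iteration
function invmod(n::T) where {T<:Base.BitInteger}
    isodd(n) || throw(DomainError(n, "Argument must be odd."))
    x = (3*n ⊻ 2) % T   # initial guess, correct in the low 5 bits
    y = (1 - n*x) % T   # error term: n*x == 1 - y
    for _ = 1:trailing_zeros(2*sizeof(T))
        x *= y + true   # each step doubles the number of correct bits
        y *= y
    end
    return x
end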

Suppose this is called in a context where we don't care about n after the call to invmod(n), and the error is thrown from somewhere after x and y have been initialized. At that point there's no reason to keep n anywhere in memory, since we no longer need it. So how can you recover the value of n to display it? The DWARF standard for debugging information is supposed to help address this, but it ends up being complex and brittle, and it turns out that the JIT compiler we use doesn't really even try to emit DWARF information correctly.

Another thing to consider: what if an array argument has been modified by the time a stack trace needs to be shown? You can't very well copy every array argument on the way in, so you'd end up displaying the modified contents of the array, which might be more confusing than helpful. Maybe that's OK, but it's certainly a sharp edge that people would have to learn about if we displayed values in stack traces.
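A small illustration of that sharp edge, using a made-up function scale_to_unit_sum!: by the time the error is thrown, the caller's array no longer holds what was passed in.

```julia
# Hypothetical function that mutates its argument before discovering a problem.
function scale_to_unit_sum!(v::Vector{Float64})
    v ./= sum(v)                                  # mutates the caller's array in place
    any(isnan, v) && error("NaN after scaling")   # raised only after the mutation
    return v
end

v = [0.0, 0.0]
try
    scale_to_unit_sum!(v)
catch err
    # A stack trace that displayed "the argument" here would show the mutated array.
    println("caught: ", sprint(showerror, err), "; v is now ", v)
end
# v is now [NaN, NaN], not the [0.0, 0.0] that was passed in
```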

The above example does suggest an idea to me, though. If we forget about being in a debugger, where you can set a breakpoint at an arbitrary line of code, then we could maybe improve on this. Outside of a debugger we only get a stack trace at an error point, so if functions kept all of their arguments alive until their error branches, we could display those values in stack traces.
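That idea is about what the compiler would keep alive; as a rough source-level approximation only, a wrapper can hold on to a call's arguments and attach them to any error it raises. CallError and call_with_args are hypothetical names, not an existing Julia API.

```julia
# Hypothetical exception type carrying the arguments of the failing call.
struct CallError <: Exception
    func::Any
    args::Tuple
    cause::Any
end

function Base.showerror(io::IO, e::CallError)
    print(io, "error in ", e.func, e.args, ": ")
    showerror(io, e.cause)
end

# Call f(args...); if it throws, rethrow with the argument values attached.
# The args tuple stays alive for the whole call, which is the cost of the idea.
function call_with_args(f, args...)
    try
        return f(args...)
    catch err
        throw(CallError(f, args, err))
    end
end
```

Because args holds references, an array argument would still show its mutated contents, and keeping every argument alive is exactly what would stop the compiler from discarding values like n early.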

That's also true of all debuggers out there, whether Python's or Debugger.jl. Maybe a stack trace with values would be misleading; I was referring more to the ability to inspect the stack the way Debugger.jl does.

I think it's OK not to have all of the arguments visible. Common Lisp's debugger often showed <value unknown> (depending on the optimization and debug-info levels). IIRC it sometimes even showed garbage values, though that's clearly not great.
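A display along those lines could separate recovered values from missing ones explicitly. In this sketch (format_frame is a hypothetical helper, not part of Julia's stack trace printing), each argument slot holds Some(value) if it was recovered and nothing if it was optimized away, so a genuinely recovered nothing is not confused with an unknown value.

```julia
# Hypothetical formatter: each slot is name => Some(value) (recovered)
# or name => nothing (optimized away / not recoverable).
function format_frame(func::Symbol, args::AbstractVector{<:Pair{Symbol}})
    parts = map(args) do (name, slot)
        val = slot === nothing ? "<value unknown>" : repr(something(slot))
        string(name, " = ", val)
    end
    return string(func, "(", join(parts, ", "), ")")
end

format_frame(:f, Pair{Symbol,Any}[:x => Some(3), :y => nothing])
# → "f(x = 3, y = <value unknown>)"
```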

Thank you for the explanation!