Very long stack trace

I am running a SciML-type program called RUDE. The code generates an error with a stack trace almost 1000 lines long; I attach the trace here. Could somebody give me a sense of what the error is about? Perhaps an “out of memory” error? If so, could the software that produces the trace shorten it appropriately? I have no suggestions on how to do that, though. Thanks.

Stack trace (1000 lines long)

This is an uncaught Enzyme error at the LLVM level. Could you share the program that generates this error as an issue for Enzyme.jl? That is what would be required to turn the LLVM assertion into a Julia-side error.


Ok. On the GitHub repository I assume? Or here?


Done.

I created a new project and added the Zygote.jl package. Checking the TOML files, I find that version 0.6.51 is loaded, yet the latest version is 0.10.x. Why would this be? Thanks.

If you tell it to add the latest, what does it say?
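For reference, a minimal way to do that (the version numbers below are only placeholders) is to ask Pkg for the newer release explicitly; if the resolver cannot satisfy it, the error message lists the packages whose compat bounds are holding the old version back:

    # Julia Pkg REPL (press `]`); version numbers are placeholders
    pkg> status Zygote        # shows the currently resolved version
    pkg> add Zygote@0.6       # requesting a specific release; a resolver error here
                              # names the packages whose [compat] bounds block the update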

BTW, I forgot to post that the fix for your case is simply to not let it try Enzyme at all and to add sensealg = ReverseDiffVJP() (or sensealg = ReverseDiffVJP(true)) to the solve call. But we should get this assertion fixed in Enzyme anyway.
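A minimal sketch of what that can look like, assuming the loss function contains an ODE solve (rhs!, u0, tspan, and θ stand in for the problem definition in the actual script; ReverseDiffVJP is shown here as the autojacvec of an adjoint sensealg, which is the commonly documented form):

    using OrdinaryDiffEq, SciMLSensitivity

    # Hypothetical sketch of the inner solve inside the loss function
    prob = ODEProblem(rhs!, u0, tspan, θ)
    sol  = solve(prob, Tsit5();
                 sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)),
                 saveat = 0.1)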

I removed the callback argument and ran the solve for five iterations. No errors.
My issues appear to be fixed at this time. Thanks @ChrisRackauckas for the help!
Have a nice Christmas!

Gordon

What did you run?

I ran the following (my loss now increases):

k = 1
loss_fn(θ) = loss_univ([θ; p_system], protocols[1:k], tspans[1:k], σ0, σ12_all, k)
cb_fun(θ, l) = callback(θ, l, protocols[1:k], tspans[1:k], σ0, σ12_all, k)
adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x, p) -> loss_fn(x), adtype)
optprob = Optimization.OptimizationProblem(optf, θi) # or get_parameter_values(nn_eqs)?
parameter_res = Optimization.solve(optprob, Optimisers.AMSGrad(), callback=cb_fun, sensealg=ReverseDiffVJP(true), maxiters=50)
global θi = parameter_res.minimizer  # update the parameters with the result of the solve

Everything works now. I tried with and without the sensealg you suggested, and both work. But of course, I now have the new Enzyme.jl package.
I need to figure out how to best use the output of the solve.

Cool. Enzyme will release its patch, and that should improve the errors.

Excellent! I am glad you were able to help.

Gordon

@ChrisRackauckas ,

I’ll continue the thread here. I was able to run Optimization.solve for several hundred iterations before, but now I get a strange error:

┌ Warning: dt(9.536743e-7) <= dtmin(9.536743e-7) at t=7.84356. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ SciMLBase ~/.julia/packages/SciMLBase/VKnrY/src/integrator_interface.jl:518
┌ Warning: Endpoints do not match. Return code: DtLessThanMin. Likely your time range is not a multiple of `saveat`. sol.t[end]: 7.84356, ts[end]: 12.0
└ @ SciMLSensitivity ~/.julia/packages/SciMLSensitivity/DInxI/src/concrete_solve.jl:1401
ERROR: DimensionMismatch: dimensions must match: a has dims (Base.OneTo(62),), b has dims (Base.OneTo(61),), mismatch at 1
Stacktrace:

If there were an error in my model specification, I could not have gotten the results I already have, so it must be an unstable solution. Given that I am using Tsit5, which uses a variable time step (it is in the RK family), what is happening?

What does “Endpoints do not match” mean? Could an instability cause that, even though this mismatch has not happened before? And then there is a dimension mismatch. Again, how is that possible, given that the code had been running for quite a few iterations before this occurrence?

To be clear, I am referring to the line:

    parameter_res = Optimization.solve(optprob, Optimisers.AdamW(), callback=cb_fun, sensealg = ReverseDiffVJP(true), allow_f_increases=false, maxiters=100)

In any case, perhaps I just have to sleep on it and experiment with other solvers and options.

This means that at the given parameters it was unable to solve all of the way. Given the dtmin, this looks to be because it’s using Float32 numbers. Float32 is not going to handle numerically difficult cases as well as Float64, so I’d recommend trying Float64 and seeing if that fixes it.
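A sketch of what switching to Float64 can look like (rhs! stands in for the actual ODE function and θi for the trainable parameters; the names are assumptions, not from the code above):

    # Hypothetical sketch: promote the state, parameters, and time span to
    # Float64 before building the problem, so the adaptive solver has more
    # precision to work with.
    u0_64   = Float64.(u0)
    θ_64    = Float64.(θi)
    tspan64 = (0.0, 12.0)        # Float64 literals for the time span
    prob    = ODEProblem(rhs!, u0_64, tspan64, θ_64)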

Any simulation can go unstable depending on the parameters. Optimisers.AdamW is a stochastic optimizer, so you’ll get different parameters each time.
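Not from the code above, but one common defensive pattern (a sketch only; prob, ts, and data are placeholders) is to check the return code inside the loss and return a large value when the solve aborts early, so a bad parameter draw does not turn into a shape mismatch:

    # Hypothetical sketch: bail out with an infinite loss when the solver did
    # not reach the end of the time span, instead of comparing arrays of
    # different lengths.
    function guarded_loss(θ)
        sol = solve(prob, Tsit5(); p = θ, saveat = ts,
                    sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)))
        SciMLBase.successful_retcode(sol) || return Inf
        return sum(abs2, Array(sol) .- data)
    end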

I figured out the problem. I have some function closures that refer to global variables, and a section of the code below overwrites those global variables, which created the problem. Thanks!
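In case it helps anyone hitting the same thing, one way to make a closure like loss_fn above immune to that (a sketch; exactly which globals were overwritten is not shown in this thread) is to bind local copies in a let block, so later reassignment or in-place mutation of the globals does not change what the closure sees:

    # Hypothetical sketch: capture local copies so that code further down that
    # does `global protocols = ...` (or mutates them in place) cannot affect
    # this closure.
    loss_fn = let protocols = copy(protocols), tspans = copy(tspans), k = k
        θ -> loss_univ([θ; p_system], protocols[1:k], tspans[1:k], σ0, σ12_all, k)
    end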