Very long stack trace

I am running a SciML-type program called RUDE. The code generates an error with a stack trace almost 1000 lines long; I attach the trace here. Could somebody give me a sense of what the error is about? Perhaps an “out of memory” error? If so, could the software that produces the trace shorten it appropriately? I have no suggestions on how to do that, though. Thanks.

Stack trace (1000 lines long)

This is an uncaught Enzyme error at the LLVM level. Could you share the program that generates this error as an issue for Enzyme.jl? That is what would be required to turn the LLVM assertion into a Julia-side error.


Ok. On the GitHub repository I assume? Or here?


Done.

I created a new project and added the Zygote.jl package. Checking the TOML files, I find that version 0.6.51 is loaded, yet the latest version is 0.10.x. Why would this be? Thanks.

If you tell it to add the latest, what does it say?
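For reference, a minimal way to do that (the version numbers below are only placeholders) is to ask Pkg for the newer release explicitly; if the resolver cannot satisfy it, the error message lists the packages whose compat bounds are holding the old version back:

    # Julia Pkg REPL (press `]`); version numbers are placeholders
    pkg> status Zygote        # shows the currently resolved version
    pkg> add Zygote@0.6       # requesting a specific release; a resolver error here
                              # names the packages whose [compat] bounds block the update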

BTW, I forgot to post that the fix for your case is simply to not let it try Enzyme at all and to add sensealg = ReverseDiffVJP() (or sensealg = ReverseDiffVJP(true)) to the solve call. But we should get this assertion fixed in Enzyme anyway.
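A minimal sketch of what that can look like, assuming the loss function contains an ODE solve (rhs!, u0, tspan, and θ stand in for the problem definition in the actual script; ReverseDiffVJP is shown here as the autojacvec of an adjoint sensealg, which is the commonly documented form):

    using OrdinaryDiffEq, SciMLSensitivity

    # Hypothetical sketch of the inner solve inside the loss function
    prob = ODEProblem(rhs!, u0, tspan, θ)
    sol  = solve(prob, Tsit5();
                 sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)),
                 saveat = 0.1)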

I removed the callback argument and ran the solve for five iterations. No errors.
My issues appear to be fixed at this time. Thanks @ChrisRackauckas for the help!
Have a nice Christmas!

Gordon

What did you run?

I ran the following (my loss now increases):

k = 1
loss_fn(θ) = loss_univ([θ; p_system], protocols[1:k], tspans[1:k], σ0, σ12_all, k)
cb_fun(θ, l) = callback(θ, l, protocols[1:k], tspans[1:k], σ0, σ12_all, k)
adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x, p) -> loss_fn(x), adtype)
optprob = Optimization.OptimizationProblem(optf, θi) # or get_parameter_values(nn_eqs)?
parameter_res = Optimization.solve(optprob, Optimisers.AMSGrad(), callback=cb_fun, sensealg=ReverseDiffVJP(true), maxiters=50)
global θi = parameter_res.minimizer  # update the parameters with the result of the solve

Everything works now. I tried with and without the sensealg you suggested, and both work. But of course, I now have the new Enzyme.jl package.
I need to figure out how to best use the output of the solve.

Cool. Enzyme will release its patch, and that should improve the errors.

Excellent! I am glad you were able to help.

Gordon

@ChrisRackauckas ,

I’ll continue the thread here. I was able to run Optimization.solve for several hundred iterations before, but now I get a strange error:

┌ Warning: dt(9.536743e-7) <= dtmin(9.536743e-7) at t=7.84356. Aborting. There is either an error in your model specification or the true solution is unstable.
└ @ SciMLBase ~/.julia/packages/SciMLBase/VKnrY/src/integrator_interface.jl:518
┌ Warning: Endpoints do not match. Return code: DtLessThanMin. Likely your time range is not a multiple of `saveat`. sol.t[end]: 7.84356, ts[end]: 12.0
└ @ SciMLSensitivity ~/.julia/packages/SciMLSensitivity/DInxI/src/concrete_solve.jl:1401
ERROR: DimensionMismatch: dimensions must match: a has dims (Base.OneTo(62),), b has dims (Base.OneTo(61),), mismatch at 1
Stacktrace:

If there were an error in my model specification, I could not have gotten the results I already have, so it must be an unstable solution. Given that I am using Tsit5, which uses a variable time step (it is in the RK family), what is happening?

What does “Endpoints do not match” mean? Could an instability cause that, even though this mismatch has not happened before? And then there is a dimension mismatch. Again, how is that possible, given that the code had been running for quite a few iterations before this occurrence?

To be clear, I am referring to the line:

    parameter_res = Optimization.solve(optprob, Optimisers.AdamW(), callback=cb_fun, sensealg = ReverseDiffVJP(true), allow_f_increases=false, maxiters=100)

In any case, perhaps I just have to sleep on it and experiment with other solvers and options.

This means that at the given parameters it was unable to solve all of the way. Given the dtmin, this looks to be because it’s using Float32 numbers. Float32 is not going to handle numerically difficult cases as well as Float64, so I’d recommend trying Float64 and seeing if that fixes it.
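A sketch of what switching to Float64 can look like (rhs! stands in for the actual ODE function and θi for the trainable parameters; the names are assumptions, not from the code above):

    # Hypothetical sketch: promote the state, parameters, and time span to
    # Float64 before building the problem, so the adaptive solver has more
    # precision to work with.
    u0_64   = Float64.(u0)
    θ_64    = Float64.(θi)
    tspan64 = (0.0, 12.0)        # Float64 literals for the time span
    prob    = ODEProblem(rhs!, u0_64, tspan64, θ_64)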

Any simulation can go unstable depending on the parameters. Optimisers.AdamW is a stochastic optimizer, so you’ll get different parameters each time.
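Not from the code above, but one common defensive pattern (a sketch only; prob, ts, and data are placeholders) is to check the return code inside the loss and return a large value when the solve aborts early, so a bad parameter draw does not turn into a shape mismatch:

    # Hypothetical sketch: bail out with an infinite loss when the solver did
    # not reach the end of the time span, instead of comparing arrays of
    # different lengths.
    function guarded_loss(θ)
        sol = solve(prob, Tsit5(); p = θ, saveat = ts,
                    sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)))
        SciMLBase.successful_retcode(sol) || return Inf
        return sum(abs2, Array(sol) .- data)
    end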

I figured out the problem. I have some function closures that refer to global variables, and a section of the code below overwrites those global variables, which created the problem. Thanks!
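In case it helps anyone hitting the same thing, one way to make a closure like loss_fn above immune to that (a sketch; exactly which globals were overwritten is not shown in this thread) is to bind local copies in a let block, so later reassignment or in-place mutation of the globals does not change what the closure sees:

    # Hypothetical sketch: capture local copies so that code further down that
    # does `global protocols = ...` (or mutates them in place) cannot affect
    # this closure.
    loss_fn = let protocols = copy(protocols), tspans = copy(tspans), k = k
        θ -> loss_univ([θ; p_system], protocols[1:k], tspans[1:k], σ0, σ12_all, k)
    end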