Error relating to Zygote

@ChrisRackauckas

I am getting the following error (which I did not get before):

┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs

What generates this error? Does it basically mean that the source code cannot be differentiated? How can this be debugged when using a UODE? Thanks.

As a result, the code has slowed to a crawl. Unfortunately, the code is too complex to show here, and if I simplify it, the error will likely go away. A simple example that demonstrates when the error happens might help me.
For reference, the error occurs in the following location (there is no traceback into my code):

┌ Warning: FastChain is being deprecated in favor of Lux.jl. Lux.jl uses functions with explicit parameters f(u,p) like FastChain, but is fully featured and documented machine learning library. See the Lux.jl documentation for more details.
└ @ DiffEqFlux ~/.julia/packages/DiffEqFlux/jHIee/src/fast_layers.jl:9
┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs
└ @ SciMLSensitivity ~/.julia/packages/SciMLSensitivity/6YVpi/src/concrete_solve.jl:115
callback_static, iter: 1, loss: 67.55578186035156
┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs
└ @ SciMLSensitivity ~/.julia/packages/SciMLSensitivity/6YVpi/src/concrete_solve.jl:115
callback_static, iter: 2, loss: 44.50825299438902
┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs
└ @ SciMLSensitivity ~/.julia/packages/SciMLSensitivity/6YVpi/src/concrete_solve.jl:115

The error likely occurs in the following code section (I doubt this will help you, but just in case …):

        for k = range(start_at, length(protocols), step=1)
            # Loss function closure
            loss_fn(θ) = loss_univ([θ; p_system], protocols[1:k], tspans[1:k], σ0, σ12_all, k, dct)
            # Callback function closure
            # k are the trajectories (1:8) 
            cb_fun(θ, loss) = callback(θ, loss, protocols[1:k], tspans[1:k], σ0, σ12_all, k, iter)
            adtype = Optimization.AutoZygote()
            optf = Optimization.OptimizationFunction((x,p)->loss_fn(x),  adtype)
            optprob = Optimization.OptimizationProblem(optf, θi)  # get_parameter_values(nn_eqs)) # nn_eqs???
            parameter_res = Optimization.solve(optprob, Optimisers.AdamW(), callback=cb_fun, sensealg = ReverseDiffVJP(true), allow_f_increases=false, maxiters=dct[:maxiters])
            θi = parameter_res.u
            push!(out_files, "tbnn_k=" * string(k))
            #@save "tbnn.bson" θi
            @save out_files[end] θi
        end
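
(A side note on placement: the sensealg keyword is normally consumed by the ODE solve call inside the loss, not by Optimization.solve, which cannot forward it into the loss_univ closure. A hypothetical sketch, since loss_univ is not shown here:)

    # Hypothetical sketch of the ODE solve inside loss_univ; `prob` and `ts` are illustrative.
    # The adjoint/VJP choice is read here:
    sol = solve(prob, Tsit5(); saveat = ts,
                sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)))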

Any help is appreciated.

@ChrisRackauckas

I worked on simplifying my code to identify the provenance of the warning discussed in my original post, namely,

┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs

Among other things, I set the return values of the differential equations and neural networks to zero until the warning disappeared, and identified the line at which it appeared. Here is a code snippet:

function xxx(...)
    # Run the integrity basis through a neural network
    model_inputs = [λ1;λ2;λ3;λ4;λ5;λ6;λ7;λ8;λ9]
    g1,g2,g3,g4,g5,g6,g7,g8,g9 = model_univ(model_inputs, model_weights)  # model_univ not found
    return 0., 0., 0., 0., 0., 0.   # Generates warning. 
end
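
(As an aside, one way to check whether the network call itself is AD-compatible, independently of the ODE solve, is to differentiate it in isolation. A sketch, assuming model_univ, model_inputs, and model_weights as above:)

    # If this errors, the network call itself breaks reverse-mode AD;
    # if it succeeds, the problem lies elsewhere in the ODE function.
    grad = Zygote.gradient(w -> sum(model_univ(model_inputs, w)), model_weights)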

Here is what is interesting. As written, I get the warning. However, if I comment out the line

    g1,g2,g3,g4,g5,g6,g7,g8,g9 = model_univ(model_inputs, model_weights) 

the warning disappears. How is that possible? The output of the neural network should not enter the adjoint calculation at all.
Any help is appreciated. Thanks!

It would be due to the definition of your ODE doing something that’s not compatible with automatic differentiation. I would need to see how that function is defined to help.

Hi Chris,

I came to the same conclusion. I simplified the ODE by simply returning an array of zeros.
I can certainly share, but the code is long. I have been trying to put together an MWE, but I am getting weird results. I could provide the entire code as an attachment, but that seems unproductive.
I will continue and try to work this out and post any findings. Thanks for replying.

Is it possible, as in C++, that some kind of memory error would affect some other part of the code? Unlikely given the array bounds checks, but still, I have to ask.

@ChrisRackauckas

Before providing the source (I don’t think you want the full source :) ), here is a simple experiment:

for o in dct_params[:ω0]
    for g in dct_params[:γ0]
        global run += 1
        dct[:datetime] = now()
        dct[:run] = run
        #dct[:ω] = o   # changed on 2023-02-25_11:12. Will warnings appear. Yes. They appear (if both dct[:omega] and dct[:T] are changed at the same time
        #dct[:ω0] = o
        dct[:γ] = g
        #dct[:T] = (2. * π / o * dct[:Ncycles]  # changed on 2023-02-25_11:12. Will warnings appear. (change denominator to o, and comment dct[:omega]
        #dct[:T] = (2. * π / dct[:ω0]) * dct[:Ncycles]
        figure = single_run(dct)
        push!(dicts, deepcopy(dct))
        fig_file_name = "plot_" * "run$(dct[:run]).pdf"
        savefig(figure, fig_file_name)
    end
end

This is the main loop over simulations. I store each simulation's parameters in a dictionary, push a copy onto a list after each run, and will later convert that list to a data frame and save it for future analysis. Notice the two lines:

        #dct[:ω0] = o
        #dct[:T] = (2. * π / o * dct[:Ncycles]  

These dictionary values are NOT used in the function single_run or anywhere else in my code. If the lines are commented out, I DO NOT get the warning we are discussing. If the lines are left in, I do. In C++, this would suggest a memory error corrupting some other part of the code. I did not think that was possible in Julia, since the language does not expose raw pointers.

I could use NamedTuples instead, but then how do I assign the different values from the dct_params loop, which iterates over the parameters of the problem? I would have to modify an immutable construct (a NamedTuple).
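
(For what it's worth, a non-mutating update is possible with merge, which returns a new NamedTuple rather than modifying the old one. A small sketch with illustrative field names, not my actual code:)

    base = (ω0 = 1.0, γ0 = 1.0, Ncycles = 12.0)
    # merge returns a *new* NamedTuple with the listed fields replaced or added,
    # so the loop body can simply rebind the result instead of mutating anything.
    run_params = merge(base, (ω0 = 0.1, T = 2π / 0.1 * base.Ncycles))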

Is that in the ODE definition?

It is not in the ODE definition; it is in my main script. At the top of the script, I have some include statements.
The full top-level script:

using DiffEqFlux, Flux, Optim, DifferentialEquations, LinearAlgebra, OrdinaryDiffEq, DelimitedFiles
using Optimization, OptimizationOptimisers, OptimizationOptimJL
using Zygote
using Plots
using DataFrames, CSV, YAML
using BSON: @save, @load
using Dates
#using PyPlot

# Need a global function
include("rude_functions.jl")
include("rude_impl.jl")

dct_params = Dict()
# There are 8 protocols for each value of ω0 and γ0. So if you have 4 pairs (ω0, γ0),
# you will have 8 protocols for each pair. (That is not what Sachin's problem proposes).
# So perhaps one has to generalize how the protocols are stored in the main program to run
# a wider variety of tests. Ask questions, Alex.
dct_params[:ω0] = [0.1, 1.0]
dct_params[:ω0] = [0.1]
dct_params[:γ0] = [0.1]
dct_params[:ω0] = [1.0] # original rude.jl
dct_params[:γ0] = [1.0] # original rude.jl

# Create a Dictionary with all parameters
dct = Dict()
# Set to lower number to run faster and perhaps get less accurate results
dct[:maxiters] = 200  # 200 was number in original code
# Set to 12 once the program works. More cycles, more oscillations
dct[:Ncycles] = 12.  # set to 1 while debugging
dct[:ω] = 1f0
# set to 1 to run faster. Set to 8 for more accurate results
dct[:nb_protocols] = 8
dct[:skip_factor] = 50
dct[:dct_giesekus] = Dict()
gie = dct[:dct_giesekus]
gie[:η0] = 1
gie[:α] = 0.8 # a change to this value propagates correctly to dct[:dct_giesekus][α]
gie[:τ] = 1.0
# Pay attention to references (Julia's version of pointers, but not a memory address). I am working with dictionaries of dictionaries
print(dct[:dct_giesekus][:α])
println(keys(dct)|>collect)

dct[:dct_NN] = Dict()
dNN = dct[:dct_NN]
dNN[:nb_hid_layers] = 2
dNN[:in_layer] = 9
dNN[:out_layer] = 9
dNN[:hid_layer] = 32
dNN[:act] = tanh
# ============== END DICTIONARY DEFINITIONS ==================

# Not a good idea to leave global variables lying around
# Write dictionary to a database
dicts = []
run=0
for o in dct_params[:ω0]
    for g in dct_params[:γ0]
        global run += 1
        dct[:datetime] = now()
        dct[:run] = run
        #dct[:ω] = o   # changed on 2023-02-25_11:12. Will warnings appear. Yes. They appear (if both dct[:omega] and dct[:T] are changed at the same time
        #dct[:ω0] = o
        dct[:γ] = g
        #dct[:T] = (2. * π / o * dct[:Ncycles]  # changed on 2023-02-25_11:12. Will warnings appear. (change denominator to o, and comment dct[:omega]
        #dct[:T] = (2. * π / dct[:ω0]) * dct[:Ncycles]
        figure = single_run(dct)
        # deepcopy will make sure that the results is different than dct
        # Without it, the dictionaries saved in the list will all be the same
        push!(dicts, deepcopy(dct))
        fig_file_name = "plot_" * "run$(dct[:run]).pdf"
        savefig(figure, fig_file_name)
    end
end

I downloaded code (rude.jl) from GitHub and, after a few changes, was able to run it. But it was in a single file. To run several cases overnight without paying the Julia startup cost each time, I did what I do in Python: create a high-level script, put the variables in dictionaries (I have similar issues with NamedTuples), and call a function single_run, passing the global variables to it through a dictionary. Yes, dictionaries are inefficient, but they are not used in the time-consuming sections of the code.
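
(The post-processing step I mean is just something like the following sketch, assuming a DataFrames.jl version that accepts a vector of Dicts; the output file name is made up:)

    # Each entry of `dicts` is one run's parameter dictionary.
    df = DataFrame(dicts)
    CSV.write("runs.csv", df)   # hypothetical output file name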

I am solving a UODE where the neural network is a tensor-basis network (the NN generates the coefficients of the tensor basis, and the resulting summation is a term in the differential equation). As I wrote in a previous post, in a different section of the code I have the lines:

function xxx(...)
    # Run the integrity basis through a neural network
    model_inputs = [λ1;λ2;λ3;λ4;λ5;λ6;λ7;λ8;λ9]
    g1,g2,g3,g4,g5,g6,g7,g8,g9 = model_univ(model_inputs, model_weights)  # model_univ not found
    return 0., 0., 0., 0., 0., 0.   # Generates warning. 
end

If I comment out the call to the neural network model_univ, the warning disappears. With the call, the warning is present. Even if there are errors in the neural network, how can that impact anything, given that g1,g2,g3... are not used?

If the network is using dual numbers, then of course the culprit is the neural model, which is defined as follows with FastChain from DiffEqFlux (even though Lux is now recommended, this should still work):

    model_univ = FastChain(FastDense(dNN[:in_layer], hid, act),
                        FastDense(hid, hid, act),
                        FastDense(hid, dNN[:out_layer]))
    dct[:model_univ] = model_univ

You can see how the dictionary is used.
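
(Since the deprecation warning above points to Lux.jl, the explicit-parameter equivalent would look roughly like the following sketch; this is not what I am currently running:)

    using Lux, Random

    model_univ = Lux.Chain(Lux.Dense(dNN[:in_layer] => dNN[:hid_layer], dNN[:act]),
                           Lux.Dense(dNN[:hid_layer] => dNN[:hid_layer], dNN[:act]),
                           Lux.Dense(dNN[:hid_layer] => dNN[:out_layer]))
    rng = Random.default_rng()
    ps, st = Lux.setup(rng, model_univ)   # explicit parameters and state
    # called as: y, st = model_univ(x, ps, st)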

@ChrisRackauckas,

I fixed the error. It had nothing to do with any of my solvers. Instead, it came from inconsistencies in my parameter types: some were Float32 and some Float64. I was not paying attention to 1f0 versus 1.0. In my estimation, there are rules missing in the built-in derivatives for the adjoint or derivative calculations, unless this behavior is by design; but in that case, it would be nice to be told that there is a type inconsistency.
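
(For anyone hitting the same thing, the kind of mismatch I mean looks roughly like the following sketch; the names are illustrative, not my actual code:)

    # Float32 network weights mixed with Float64 constants in the same ODE:
    θ = DiffEqFlux.initial_params(model_univ)   # Float32 by default for FastChain
    p_system = [1.0, 0.8, 1.0]                  # Float64 literals
    # One fix is to promote everything to a single element type up front,
    p_system32 = Float32.(p_system)
    # or to write the constants with Float32 literals (1f0) from the start.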

Cheers,

Gordon

Huh it should’ve said exactly that in the stacktrace. Can you share what the full thing was?

There was no stack trace, just the warning I listed in the previous message:

┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs

with the result that the code ran more than 100x slower, perhaps even more.

See if this would have helped you:

I wonder if it’s too noisy of a solution to always throw the VJP trial stack traces by default. But then again, if we tell people how to turn it off then it’s likely okay. Otherwise the only way to debug this is to forcibly do sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP()) and then you’d get the stack trace, but that makes it much harder to handle the automated algorithm failures.


function sum_of_solution(u0, p)
    _prob = remake(prob, u0 = u0, p = p)
    sum(solve(_prob, Tsit5(), reltol = 1e-6, abstol = 1e-6, saveat = 0.1))
end

du01, dp1 = Zygote.gradient(sum_of_solution, u0, p)
┌ Warning: EnzymeVJP tried and failed in the automated AD choice algorithm with the following error. (To turn off this printing, add `verbose = false` to the `solve` call)
└ @ SciMLSensitivity ~/.julia/dev/SciMLSensitivity/src/concrete_solve.jl:19
ErrorException("Return type inferred to be Union{}. Giving up.")
┌ Warning: ReverseDiffVJP tried and failed in the automated AD choice algorithm with the following error. (To turn off this printing, add `verbose = false` to the `solve` call)
└ @ SciMLSensitivity ~/.julia/dev/SciMLSensitivity/src/concrete_solve.jl:50
MethodError(*, ([-2.0], [1.5, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 0x00000000000083ac)
┌ Warning: Reverse-Mode AD VJP choices all failed. Falling back to numerical VJPs
└ @ SciMLSensitivity ~/.julia/dev/SciMLSensitivity/src/concrete_solve.jl:135
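
(For reference, the older workaround mentioned above would look like this: a sketch reusing prob, u0, and p from the snippet.)

    # Force a ReverseDiff-based adjoint instead of the automatic choice, so a
    # failure throws with a full stack trace rather than silently falling back.
    function sum_of_solution_forced(u0, p)
        _prob = remake(prob, u0 = u0, p = p)
        sum(solve(_prob, Tsit5(), reltol = 1e-6, abstol = 1e-6, saveat = 0.1,
                  sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true))))
    end

    du01, dp1 = Zygote.gradient(sum_of_solution_forced, u0, p)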

Thank you, Chris. I will try this out. More new stuff to learn! It is never-ending.

Gordon

I’ll push a bunch of stack trace improvements in a bit. I have 8 PRs lined up because I had a flight where the wifi was broken… Otherwise they would be up by now.

I updated the libraries and ran the code. All went well. I interrupted the code and got a long stack trace related to pullbacks (I must have interrupted it during an adjoint computation). So all that is left to do is recreate my previous issue and see what happens.

The new stacktrace releases weren’t out. I just triggered them now. In about an hour if you update you should get them and it should print some much more informative stuff.
