Problem with Zygote gradient

Hi all,

I am working with Julia 1.7. Zygote fails to compute the AD gradient in the code below (the loss function itself evaluates fine). The error message is:

ERROR: LoadError: Need an adjoint for constructor DiffEqArray{Float64, 2, Vector{Vector{Float64}}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Vector{Float64}}. Gradient is of type VectorOfArray{Float64, 2, Vector{Vector{Float64}}}

I guess this has something to do with how I select states from the solution inside the loss function, but since the loss function itself works, is there another way I should be doing this?

Thanks for help,

S.

function syst_AM2(state,theta,t)
    # unpack parameters and states
    Y_x1,Y_x2,mu1_max,mu2_max,Ks1,Ks2,KI,mu0 = theta;
    X1,X2,S0,S1,S2,CH4 = state
    # reaction rates: first-order in S0 (r0), Monod (r1), Haldane with substrate inhibition (r2)
    r0=mu0*S0;
    r1=mu1_max*S1*X1/(S1+Ks1);
    r2=mu2_max*S2*X2/(S2+Ks2+(S2^2/KI));
    # time derivatives of [X1, X2, S0, S1, S2, CH4]
    [Y_x1*r1, Y_x2*r2, -r0, -r1+r0, (1-Y_x1)*r1-r2, (1-Y_x2)*r2]
end

function loss(theta)
    prob = ODEProblem(syst_AM2,state0,(t_CH4[1],t_CH4[end]),theta)
    sol = solve(prob, Rodas5())
    state = sol(t_CH4)
    S2 = state[5,:]
    CH4 = state[6,:]
    deb_CH4 = CH4[2:end]-CH4[1:end-1]
    e1 = CH4[1:end-1]-Exp_CH4
    e2 = deb_CH4-Exp_deb_CH4
    e3 = S2[1:end-1]-Exp_AGV
    sum(e1.^2)+sum(e2.^2)+sum(e3.^2)
end

using Plots, DifferentialEquations, DiffEqSensitivity, Zygote

H=0.39;   
R=0.195; 
Surf=pi*R^2; 
Exp_CH4 = [0,0.146695751,0.452562719,0.85245185,1.390508363,1.88574014,2.333176518,2.79329662,3.417909766,4.299654989,6.515952412,10.29002283,13.99900971,16.46767139,18.77271153,21.23159231,23.82660177,26.81067853,30.00135498,32.59923757,34.18509462,35.5158513,36.74890559,37.86754592,38.79872368,39.69511074,40.56481051,41.3865614,42.13182338,42.84323204,43.49045536];

t_CH4=0.0:size(Exp_CH4)[1]; # time (days)
Exp_AGV = [0.6307493,1.8081079,3.0002554,3.4371393,4.1710245,4.306228,4.4414315,4.576635,5.0225553,5.2529314,4.6852442,3.6354049,3.1327115,2.6300181,2.1273247,1.9948686,1.8574543,1.5407059,1.2099629,1.1136956,1.0174283,0.9211824,0.9121245,0.9195297,0.8298519,0.9236508,0.9041827,0.8846932,0.8652251,0.8372571,0.808326];

Exp_deb_CH4 = 0*Exp_CH4;
Exp_deb_CH4[2:end]= Exp_CH4[2:end]-Exp_CH4[1:end-1];

state0=[0.283861214/(H*Surf), 0.094620405/(H*Surf), 5.180333444/(H*Surf), 0/(H*Surf), 0.029385961/(H*Surf), 0]; 
theta_c0 = [0.25, 0.43, 0.5, 3.1, 21, 23, 4, 0.08]


loss(theta_c0)
Zygote.gradient(loss,theta_c0)

That’s not the error it should be throwing :sweat_smile:. I’ll look into that. What it should be telling you is that the sol(t_CH4) post-solution interpolation is not compatible with adjoint techniques. The solution here is… well, you don’t need to do that. Just do state = solve(prob, Rodas5(), saveat=t_CH4) and you’re good.
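
For concreteness, a sketch of the loss from the first post with just that one change (state0 and the data arrays as defined above):

function loss(theta)
    prob = ODEProblem(syst_AM2,state0,(t_CH4[1],t_CH4[end]),theta)
    state = solve(prob, Rodas5(), saveat=t_CH4)   # save at the data times, no post-solve interpolation
    S2 = state[5,:]
    CH4 = state[6,:]
    deb_CH4 = CH4[2:end]-CH4[1:end-1]
    e1 = CH4[1:end-1]-Exp_CH4
    e2 = deb_CH4-Exp_deb_CH4
    e3 = S2[1:end-1]-Exp_AGV
    sum(e1.^2)+sum(e2.^2)+sum(e3.^2)
end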

Hi Chris,

Thank you. What package would you then use to fit the data (now that the loss function is AD-compatible)?

S.

Depends on what you’re doing, but DiffEqFlux is the package with all of the fitting utilities.

https://diffeqflux.sciml.ai/dev/

I generally recommend using the multiple shooting tooling.

https://diffeqflux.sciml.ai/dev/examples/multiple_shooting/
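
As a rough sketch of what a multiple-shooting loss looks like with DiffEqFlux (following the multiple_shoot doc example linked above; ode_data, tsteps, group_size and continuity_term are placeholders taken from that example, and it assumes full-state data at the save times, which is not quite the situation here):

using DiffEqFlux

group_size = 5           # number of save points per shooting segment
continuity_term = 200.0  # weight of the segment-matching penalty

# per-segment data mismatch
segment_loss(data, pred) = sum(abs2, data .- pred)

function loss_multiple_shooting(p)
    # splits tsteps into groups, solves prob on each segment, and adds a penalty
    # forcing consecutive segments to match at their junctions
    multiple_shoot(p, ode_data, tsteps, prob, segment_loss, Rodas5(), group_size;
                   continuity_term = continuity_term)
end

result = DiffEqFlux.sciml_train(loss_multiple_shooting, theta_c0, ADAM(0.05), maxiters = 300)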

OK, I will experiment with DiffEqFlux. By the way, does this “multiple shooting” have something to do with the multiple shooting method used to solve the direct+adjoint system in optimal control?

S.

Yes, it’s the same thing. Optimal control is just a parameter estimation / inverse problem, so it’s the same machinery. The “adjoint system” is the same thing reverse-mode AD is doing: it generates the adjoint system code and then solves it for the gradient.
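
In code terms (just a sketch, not from this thread): the adjoint that gets solved can be selected explicitly through the sensealg keyword that DiffEqSensitivity adds to solve, for example:

using DiffEqSensitivity  # InterpolatingAdjoint, ReverseDiffVJP, ...

function loss_adjoint(theta)
    prob = ODEProblem(syst_AM2, state0, (t_CH4[1], t_CH4[end]), theta)
    # explicitly request a continuous adjoint; Zygote's reverse pass through
    # solve then integrates this adjoint ODE to obtain d(loss)/d(theta)
    sol = solve(prob, Rodas5(), saveat = t_CH4,
                sensealg = InterpolatingAdjoint(autojacvec = ReverseDiffVJP(true)))
    CH4 = sol[6,:]
    sum(abs2, CH4[1:end-1] .- Exp_CH4)
end

Zygote.gradient(loss_adjoint, theta_c0)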

Here’s optimal control in the docs:

https://diffeqflux.sciml.ai/dev/examples/optimal_control/

OK, thanks. But I have another problem: bounds are not honored by sciml_train. The updated script follows, and this error is thrown at the end of the optimization:

4.455452613111223
4.454489350679541
4.453604660656758
4.452731382256159
4.452731382256159
ERROR: LoadError: ArgumentError: Initial x[(8,)]=0.07975926992249756 is outside of [0.04, 0.04]

S.

function syst_AM2(state,theta,t) 
    Y_x1,Y_x2,mu1_max,mu2_max,Ks1,Ks2,KI,mu0,S00,S10,S20,coeff = theta;
    X1,X2,S0,S1,S2,CH4 = state
    r0=mu0*S0;
    r1=mu1_max*S1*X1/(S1+Ks1);
    r2=mu2_max*S2*X2/(S2+Ks2+(S2^2/KI));
    [Y_x1*r1, Y_x2*r2, -r0, -r1+r0, (1-Y_x1)*r1-r2, (1-Y_x2)*r2]
end

function loss(theta)
    state0 = [X10;X20;theta[9:11];0];
    coeff = theta[12];
    prob = ODEProblem(syst_AM2,state0,(t_CH4[1],t_CH4[end]),theta)
    state = solve(prob, Rodas5(),saveat=t_CH4)
    S2 = state[5,:]
    CH4 = state[6,:]
    deb_CH4 = CH4[2:end]-CH4[1:end-1]
    e1 = CH4[1:end-1]-Exp_CH4*1
    e2 = (deb_CH4-Exp_deb_CH4)*4
    e3 = (S2[1:end-1]-coeff*Exp_AGV)*1
    (sum(e1.^2)+sum(e2.^2)+sum(e3.^2))/length(t_CH4);
end

function mycallback(p, l, pred)
    display(l)
    return false
end

using Plots, DifferentialEquations, DiffEqSensitivity, Zygote,  DiffEqFlux

H=0.39;   
R=0.195; 
Surf=pi*R^2; 
Exp_CH4 = [0,0.146695751,0.452562719,0.85245185,1.390508363,1.88574014,2.333176518,2.79329662,3.417909766,4.299654989,6.515952412,10.29002283,13.99900971,16.46767139,18.77271153,21.23159231,23.82660177,26.81067853,30.00135498,32.59923757,34.18509462,35.5158513,36.74890559,37.86754592,38.79872368,39.69511074,40.56481051,41.3865614,42.13182338,42.84323204,43.49045536];

t_CH4=0.0:size(Exp_CH4)[1]; # time (days)
Exp_AGV = [0.6307493,1.8081079,3.0002554,3.4371393,4.1710245,4.306228,4.4414315,4.576635,5.0225553,5.2529314,4.6852442,3.6354049,3.1327115,2.6300181,2.1273247,1.9948686,1.8574543,1.5407059,1.2099629,1.1136956,1.0174283,0.9211824,0.9121245,0.9195297,0.8298519,0.9236508,0.9041827,0.8846932,0.8652251,0.8372571,0.808326];

Exp_deb_CH4 = 0*Exp_CH4;
Exp_deb_CH4[2:end]= Exp_CH4[2:end]-Exp_CH4[1:end-1];

X10 = 0.283861214/(H*Surf);   
X20 = 0.094620405/(H*Surf);
S00 = 5.180333444/(H*Surf);
S10 = 0/(H*Surf); 
S20 = 0.029385961/(H*Surf); 

theta_c0 = [0.25, 0.43, 0.5, 3.1, 21.0, 23.0, 4.0, 0.04, S00, S10, S20, 1.0]

cstr = [
0 1;    #Y_X1
0 1;    #Y_X2
0 Inf; #mu1max
0 Inf; #mu2max
1e-3 Inf; #Ks1
1e-3 Inf; #Ks2
1e-3 Inf; #KI
0.04  0.04; # mu0
0  Inf; # S00
0  Inf; # S10
0  Inf; # S20
0  Inf]; # AGV coefficient


result = DiffEqFlux.sciml_train(loss, theta_c0, cb=mycallback, lower_bounds=cstr[:,1],upper_bounds=cstr[:,2])

Which optimizer? Flux optimizers will ignore them.
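
For example (untested sketch): with sciml_train the bounds only bite if the optimizer itself is box-constrained, e.g. Optim's Fminbox. Note also that a parameter fixed by lower == upper, like mu0 here, is probably better removed from theta than written as a degenerate bound.

using Optim  # BFGS, Fminbox

# box-constrained BFGS so lower_bounds / upper_bounds are actually enforced
result = DiffEqFlux.sciml_train(loss, theta_c0, Fminbox(BFGS());
                                cb = mycallback,
                                lower_bounds = cstr[:,1],
                                upper_bounds = cstr[:,2])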

I tried to use GalacticOptim.jl directly. I replaced

result = DiffEqFlux.sciml_train(loss, theta_c0, cb=mycallback, lower_bounds=cstr[:,1],upper_bounds=cstr[:,2])

by

fun = OptimizationFunction(loss, GalacticOptim.AutoForwardDiff());
prob = GalacticOptim.OptimizationProblem(fun, theta_c0,lcons = cstr[:,1], ucons = cstr[:,2]);
sol = solve(prob, IPNewton())

but I have the following error:

ERROR: LoadError: Use OptimizationFunction to pass the derivatives or automatically generate them with one of the autodiff backends

S.
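
(Side note, untested: in GalacticOptim, lcons / ucons are for general constraint functions attached to the OptimizationFunction, and the objective is expected to take two arguments (x, p); plain box bounds would instead go through the lb / ub keywords of OptimizationProblem together with a box-constrained optimizer, roughly:)

using GalacticOptim, Optim

# wrap the one-argument loss into the (x, p) form GalacticOptim expects
fun = OptimizationFunction((x, p) -> loss(x), GalacticOptim.AutoForwardDiff())
# box bounds via lb / ub, solved with a box-constrained Optim method
prob = GalacticOptim.OptimizationProblem(fun, theta_c0; lb = cstr[:,1], ub = cstr[:,2])
sol = solve(prob, Fminbox(BFGS()))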

IIRC this got an issue and was solved?

Yes!

S.