Hi,

I’m trying to use Enzyme.jl to compute gradients for a nonlinear least squares objective function I’d like to optimize for data assimilation (to my understanding this is equivalent to what is known as weak 4DVar). The objective function discretizes a differential equation using Simpson-Hermite quadrature, and is defined in the following code:

```julia
module SimpsonHermite

"""
    simpson_residual(xi, xmi, xip, p, fi, fmi, fip, dt, vf, Rf)

Weighted residual of the Simpson-rule collocation constraint over one timestep:
`Rf .* (x_{i+1} - x_i - (dt/6)*(vf_i + 4*vf_{i+1/2} + vf_{i+1}))`.

# Arguments
- `xi`, `xmi`, `xip`: state vectors at `t_i`, `t_{i+1/2}`, `t_{i+1}`.
- `p`: parameter vector passed through to `vf`.
- `fi`, `fmi`, `fip`: forcing (time-dependent parameters) at `t_i`, `t_{i+1/2}`, `t_{i+1}`.
- `dt`: timestep length.
- `vf`: vector field, called as `vf(u, p, force)`.
- `Rf`: per-coordinate residual weights.
"""
@inline function simpson_residual(xi, xmi, xip, p, fi, fmi, fip, dt, vf, Rf)
    return Rf .* (xip - xi - (dt / 6) * (vf(xi, p, fi) + 4 * vf(xmi, p, fmi) + vf(xip, p, fip)))
end

"""
    hermite_residual(xi, xmi, xip, p, fi, fmi, fip, dt, vf, Rf)

Weighted residual of the Hermite-interpolation constraint tying the midpoint
state to the endpoint states:
`Rf .* (x_{i+1/2} - ((x_i + x_{i+1})/2 + (dt/8)*(vf_i - vf_{i+1})))`.
Arguments are as in [`simpson_residual`](@ref).
"""
@inline function hermite_residual(xi, xmi, xip, p, fi, fmi, fip, dt, vf, Rf)
    return Rf .* (xmi - ((xi + xip) / 2 + (dt / 8) * (vf(xi, p, fi) - vf(xip, p, fip))))
end

"""
    gen_simpson_hermite_objective(vf, dt, frec, data, Rf, D, P) -> obj

Build the weak-constraint least-squares objective `obj(x)` for Simpson–Hermite
collocation of the ODE `du/dt = vf(u, p, force)`.

# Arguments
- `vf`: vector field, called as `vf(u, p, force)`.
- `dt`: vector of the `N` timestep lengths.
- `frec`: flat vector of length `(2N+1)*F` holding the forcing at every half-step.
- `data`: flat vector of length `(2N+1)*L` holding the observations at every half-step.
- `Rf`: per-coordinate residual weights.
- `D`: state dimension; `P`: number of model parameters.

Layout of `x` (length `(2N+1)*D + P`): the `2N+1` half-step-grid states,
flattened `D` at a time, followed by the `P` model parameters.

Returns `obj_data/((2N+1)*L) + obj_model/((2N+1)*D)`, the normalized sums of
squared data misfits and squared collocation residuals.

NOTE(review): the data misfit compares the FIRST `L` coordinates of each state
to the observations — this assumes the observed dimensions come first; confirm
against the caller's data layout.
"""
function gen_simpson_hermite_objective(vf, dt, frec, data, Rf, D, P)
    N = length(dt)                  # number of full timesteps
    F = length(frec) ÷ (2N + 1)     # forcing dimension per grid point
    L = length(data) ÷ (2N + 1)     # observed dimension per grid point
    function obj(x)
        length(x) == (2N + 1) * D + P ||
            throw(DimensionMismatch("expected state+parameter vector of length $((2N + 1) * D + P), got $(length(x))"))
        @views begin
            p = x[(2N + 1) * D + 1:(2N + 1) * D + P]
            # zero(eltype(x)) keeps the accumulation type-stable for AD/dual element types
            obj_model = zero(eltype(x))
            obj_data = zero(eltype(x))
            for i in 1:N
                # states and forcings at t_i, t_{i+1/2}, t_{i+1} (non-allocating views)
                xi  = x[2D * (i - 1) + 1:2D * (i - 1) + D]
                xmi = x[2D * (i - 1) + D + 1:2D * (i - 1) + 2D]
                xip = x[2D * (i - 1) + 2D + 1:2D * (i - 1) + 3D]
                fi  = frec[2F * (i - 1) + 1:2F * (i - 1) + F]
                fmi = frec[2F * (i - 1) + F + 1:2F * (i - 1) + 2F]
                fip = frec[2F * (i - 1) + 2F + 1:2F * (i - 1) + 3F]
                # squared Simpson and Hermite collocation residuals
                obj_model += sum(abs2, simpson_residual(xi, xmi, xip, p, fi, fmi, fip, dt[i], vf, Rf))
                obj_model += sum(abs2, hermite_residual(xi, xmi, xip, p, fi, fmi, fip, dt[i], vf, Rf))
                # squared data misfit at t_i and t_{i+1/2}
                obj_data += sum(abs2, xi[1:L] - data[2L * (i - 1) + 1:2L * (i - 1) + L])
                obj_data += sum(abs2, xmi[1:L] - data[2L * (i - 1) + L + 1:2L * (i - 1) + 2L])
            end
            # data misfit at the final grid point t_N
            obj_data += sum(abs2, x[2N * D + 1:2N * D + L] - data[2N * L + 1:2N * L + L])
        end
        return obj_data / ((2N + 1) * L) + obj_model / ((2N + 1) * D)
    end
    return obj
end

end # module
```

I then generate data from the Lorenz96 model, use this to build the objective, and compute the objective at a random point within box constraints of interest:

```julia
u0 = lb .+ (ub .- lb) .* rand(length(lb));  # random initialization inside the box [lb, ub] for the first step
step_obj = SimpsonHermite.gen_simpson_hermite_objective(lor96, dt, frec, data, Rf, D, P)  # frec, data, Rf, D, and P are all initialized to some constant values based on the generated Lorenz96 data
step_obj(u0)  # returns ~862.68
```

Now, I compute the gradient using Enzyme:

```julia
u0_enz = copy(u0);
bu0_enz = zeros(length(u0));
# NOTE(review): `step_obj` is a closure that captures the arrays `dt`, `frec`,
# `data`, and `Rf`. Passing it to `autodiff` without an activity annotation lets
# Enzyme accumulate adjoints into that captured storage, which is what corrupts
# `step_obj` between calls. Wrapping it in `Const` declares the captured data
# derivative-free (no shadow writes); `Active` marks the scalar return as the
# quantity being differentiated. Confirm against the installed Enzyme version.
Enzyme.autodiff(Reverse, Const(step_obj), Active, Duplicated(u0_enz, bu0_enz))
```

Surprisingly, if I now recompute the objective at `u0` by calling `step_obj`, I receive a different result!

```julia
step_obj(u0) # now returns 9.241999372228449e9
```

Each time I run the code block with the Enzyme autodiff call, `step_obj` seems to be modified once more, and eventually returns `Inf` or `NaN` results. Regenerating the objective function by rerunning the line `step_obj = SimpsonHermite.gen_simpson_hermite_objective(...)` does **not** fix this problem; only by re-including the `SimpsonHermite` module, and *then* rerunning the objective generation function, does the objective go back to returning correct values.

This is very strange, and I’m not sure if it’s a bug or if I’m doing something incorrect. Any help would be greatly appreciated!