Using the gradient with respect to the input in the loss function

I have attempted this for a few days but I cannot figure out how to do it. Basically, I am trying to adapt a Hamiltonian neural network (HNN), which requires the following loss function:

Loss = ||∂H/∂q - dq/dt||_2 + ||∂H/∂p + dp/dt||_2

p and q are the inputs to the model (a vector [q, p]) and H is the output of the model (a scalar). dq/dt and dp/dt are the targets (the y's), so those are not a problem. I need a way to compute the gradients dH/dp and dH/dq inside the loss, in a way that still lets me train the model without breaking the backward pass, using Flux and Zygote. I constantly get errors when attempting this; is there some way to do it? The errors differ, but they all tell me to:

Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

I am looking more for ideas on how to approach this than for actual solutions for now, so thanks for ANY tips!
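For illustration, the inner gradient on its own is easy to get; the real difficulty is differentiating an expression that already contains this gradient with respect to the parameters (gradient-of-gradient), which is what the linked Zygote limitations page is about. A minimal sketch, assuming a small hypothetical Flux model that maps one [q, p] vector to a scalar H:

using Flux, Zygote

# Hypothetical network: maps [q, p] to a scalar Hamiltonian H
model = Chain(Dense(2 => 64, tanh), Dense(64 => 1))

x = Float32[0.5, -0.3]                         # one sample [q, p]
dHdx = Zygote.gradient(z -> sum(model(z)), x)[1]
dHdq, dHdp = dHdx[1], dHdx[2]                  # the terms needed in the loss

Plugging dHdq and dHdp into the loss and then asking Zygote for the parameter gradient of that loss is the step that tends to fail; one common workaround is to take the inner derivative with forward mode (ForwardDiff), as in the Lux-based attempt later in this thread, so only the outer pass is reverse mode.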


I think this is related, ping @avikpal

Thank you very much. I have taken inspiration from their docs and have tried a few different ways to train the model, but none of them actually updates the parameters: the loss stays the same every epoch, and I have verified that the parameters indeed do not change. Originally it was "apply_gradients!", but that gave me an error saying "apply_gradients! not defined", so I dropped the "!", which worked. Here is what I have right now:
using Lux, ForwardDiff, Printf
using Lux.Experimental: apply_gradients

x = x_test[:, 1:5]  # 2x5
y = y_test[:, 1:5]  # 2x5

function loss_function1(model, ps, st, data)
    x = data[1]
    y = data[2]
    # Make it a stateful layer so ForwardDiff can differentiate it w.r.t. x
    smodel = Lux.Experimental.StatefulLuxLayer(model, ps, st)
    J = ForwardDiff.jacobian(smodel, x)  # batch Jacobian; columns alternate dH/dq, dH/dp

    y_x = y[1, :]
    y_v = y[2, :]
    gradients = sum(J, dims = 1)
    grad_1 = gradients[1:2:end]  # dH/dq per sample
    grad_2 = gradients[2:2:end]  # dH/dp per sample

    loss_vec = abs.(grad_1 .- y_x) .+ abs.(grad_2 .+ y_v)
    loss = sum(loss_vec) / length(y_x)
    return loss, st, (;)
end

function main(tstate::Lux.Experimental.TrainState, vjp, data, epochs)
    data = data .|> gpu_device()
    for epoch in 1:epochs
        grads, loss, stats, tstate = Lux.Experimental.compute_gradients(
            vjp, loss_function1, data, tstate)
        if epoch % 50 == 1 || epoch == epochs
            @printf "Epoch: %3d \t Loss: %.5g\n" epoch loss
        end
        tstate = Lux.Experimental.apply_gradients(tstate, grads)
    end
    return tstate
end

dev_cpu = cpu_device()
dev_gpu = gpu_device()

tstate = main(tstate, vjp_rule, (x, y), 250)
y_pred = dev_cpu(Lux.apply(tstate.model, dev_gpu(x), tstate.parameters, tstate.states)[1])

See DiffEqFlux.jl/src/hnn.jl at master · SciML/DiffEqFlux.jl · GitHub for Hamiltonian Neural Networks; I recently updated the code there to support nested AD nicely.
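For readers skimming the thread, the shape of that approach is roughly the following (a sketch, not the actual source; it assumes a Lux model with ps and st as in the snippet above, and takes the inner derivative with ForwardDiff over a stateful wrapper so the outer parameter gradient can stay reverse mode):

using ForwardDiff, Lux

smodel = Lux.Experimental.StatefulLuxLayer(model, ps, st)

# Inner derivative: dH/dx for the whole batch, taken in forward mode
dHdx(x) = ForwardDiff.gradient(x_ -> sum(smodel(x_)), x)

# Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq
function hamiltonian_field(x)
    g = dHdx(x)
    n = size(x, 1) ÷ 2
    return vcat(selectdim(g, 1, (n + 1):(2n)), -selectdim(g, 1, 1:n))
end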

Originally, it was “apply_gradients!”, but I got an error-message on this stating that “apply_gradients! not defined”

You are on an older Lux version in that case.

If parameters are not being updated, check the gradients. See [Nested AD] Incorrect gradient when taking a gradient over a gradient using StatefulLuxLayer · Issue #630 · LuxDL/Lux.jl · GitHub for some common mistakes and ways to verify gradients.
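Concretely, one quick check (a sketch reusing model, ps, st, x, y, and loss_function1 from the snippet above) is to ask Zygote for the parameter gradient directly and look at what comes back:

using Zygote

# If this comes back as `nothing` or all zeros, the optimiser step cannot
# change anything and the problem is in the loss / nested-AD setup,
# not in the training loop itself.
gs = Zygote.gradient(p -> first(loss_function1(model, p, st, (x, y))), ps)[1]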

Thank you very much for your help! My project, however, is to do this from scratch, though I have drawn inspiration from your GitHub. I still struggle, though: when I call OptimizationFunction, I get an error which I do not understand. Is this implementation on the right path? I know there are a few inconsistencies with the code, but right now I just want it to at least run through:

using Optimization
using Zygote, Lux, Optimisers, Random, Statistics
ad = Lux.Training.AutoZygote()
ps, st = Lux.setup(Xoshiro(0), model_pre)

# Symplectic gradient, adapted from DiffEqFlux's HamiltonianNN;
# __hamiltonian_forward is copied from there (gradient of the scalar H w.r.t. x)
function hnn(x, ps, st)
    model2 = Lux.Experimental.StatefulLuxLayer(model_pre, ps, st)
    H = __hamiltonian_forward(model2, x)
    n = size(x, 1) ÷ 2
    return vcat(selectdim(H, 1, (n + 1):(2n)), -selectdim(H, 1, 1:n)), model2.st
end

function loss_function2(ps, data, target)
    pred, st_ = hnn(data, ps, st)
    return mean(abs2, pred .- target), pred
end

opt_func = OptimizationFunction((ps, _, data, target) -> loss_function2(ps, data, target),
    AutoForwardDiff())
opt_prob = OptimizationProblem(opt_func, ps_c)

Error:

MethodError: no method matching (OptimizationFunction{true})(::var"#19#20", ::AutoForwardDiff{nothing, Nothing})
Closest candidates are:
  (OptimizationFunction{iip})(::Any) where iip at ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3583
  (OptimizationFunction{iip})(::Any, ::SciMLBase.AbstractADType; grad, hess, hv, cons, cons_j, cons_h, lag_h, hess_prototype, cons_jac_prototype, cons_hess_prototype, lag_hess_prototype, syms, paramsyms, observed, hess_colorvec, cons_jac_colorvec, cons_hess_colorvec, lag_hess_colorvec, expr, cons_expr, sys) where iip at ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3583

Stacktrace:
 [1] OptimizationFunction(::Function, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ SciMLBase ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3581
 [2] OptimizationFunction(::Function, ::Vararg{Any})
   @ SciMLBase ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3581
 [3] top-level scope
   @ In[48]:2
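For reference, the minibatching pattern documented for Optimization.jl looks roughly like the following (a sketch only; dataloader is assumed to yield (data, target) batches, and ps_c and loss_function2 are from the snippet above):

using Optimization, OptimizationOptimisers, Optimisers

# Objective in the (u, p, batch...) form used with a data iterator;
# it returns a scalar loss here.
optf = OptimizationFunction(
    (u, p, data, target) -> first(loss_function2(u, data, target)),
    Optimization.AutoZygote())
prob = OptimizationProblem(optf, ps_c)
res  = Optimization.solve(prob, Optimisers.Adam(0.01), dataloader)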

I got it to work now, but when I call res = Optimization.solve(opt_prob, opt, dataloader; callback), I get this error:

MethodError: objects of type Nothing are not callable

Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/OptimizationOptimisers/hZdKg/src/OptimizationOptimisers.jl:65 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/Optimization/fPVIW/src/utils.jl:41 [inlined]
 [3] __solve(cache::OptimizationCache{OptimizationFunction{true, SciMLBase.NoAD, var"#36#37", Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Optimization.ReInitCache{ComponentVector{Float32, Vector{Float32}, Tuple{Axis{(layer_1 = ViewAxis(1:600, Axis(weight = ViewAxis(1:400, ShapedAxis((200, 2), NamedTuple())), bias = ViewAxis(401:600, ShapedAxis((200, 1), NamedTuple())))), layer_2 = ViewAxis(601:40800, Axis(weight = ViewAxis(1:40000, ShapedAxis((200, 200), NamedTuple())), bias = ViewAxis(40001:40200, ShapedAxis((200, 1), NamedTuple())))), layer_3 = ViewAxis(40801:41001, Axis(weight = ViewAxis(1:200, ShapedAxis((1, 200), NamedTuple())), bias = ViewAxis(201:201, ShapedAxis((1, 1), NamedTuple())))))}}}, SciMLBase.NullParameters}, Nothing, Nothing, Nothing, Nothing, Nothing, Adam{Float32}, IterTools.NCycle{Base.Generator{UnitRange{Int64}, var"#40#41"}}, Bool, var"#23#25"})
   @ OptimizationOptimisers ~/.julia/packages/OptimizationOptimisers/hZdKg/src/OptimizationOptimisers.jl:63
 [4] solve!
   @ ~/.julia/packages/SciMLBase/QqtZA/src/solve.jl:160 [inlined]
 [5] #solve#540
   @ ~/.julia/packages/SciMLBase/QqtZA/src/solve.jl:81 [inlined]
 [6] top-level scope
   @ In[80]:2

I really appreciate your help, thank you 🙂