Using the gradient of wrt the input in the loss function

alekssek · May 30, 2024, 8:08pm

I have attempted this for a few days but I can not figure out how to do this. Basically I am attempting to adapt a hamiltonian neural network (hnn), which requires the following loss function: Loss = || dH/dq - dq/dt||_2 + || dH/dp + dp/dt||_2

p and q are the input to the model (a vector: [q,p]) and H is the output of the model (scalar). dq/dt and dp/dt are the y’s, so those are not a problem. I need a way to calculate the gradients dH/dp and dH/dq in a way that allows me to still train a model, without screwing up the backwards pass, using Flux and Zygote. I constantly get errors attempting this, but is there some way to do this? The errors I get are different but they all tell me to:

Refer to the Zygote documentation for fixes.
https://fluxml.ai/Zygote.jl/latest/limitations

I am more looking for ideas in ways to execute this, rather than actual solutions for now, thanks for ANY tips!

gdalle · May 30, 2024, 9:58pm

I think this is related, ping @avikpal

alekssek · May 31, 2024, 9:16am

Thank you very much, I have taken ispiration from their docs and have tried a few different ways to train the model, but none of them actually updates the parameters. For each epoch, the loss stays equal, and I’ve tested that indeed, the parameters do not change. Originally, it was “apply_gradients!”, but I got an error-message on this stating that “apply_gradients! not defined”, hence I had to get rid of the “!”, which worked. Here is what I have right now:
using Lux.Experimental: apply_gradients
x=x_test[:,1:5] #2x5
y=y_test[:,1:5] #2x5

function loss_function1(model, ps, st, data)
# Make it a stateful layer
x = data[1]
y = data[2]
smodel = Lux.Experimental.StatefulLuxLayer(model, ps, st)
#y_pred, st = model(x, ps, st)
J = ForwardDiff.jacobian(smodel, x) #dH/dq, dH/dp

y_x = y[1, :]
y_v = y[2, :]
gradients = sum(J, dims=1)
grad_1 = gradients[1:2:end]
grad_2 = gradients[2:2:end]

loss_vec = abs.(grad_1 .- y_x) .+ abs.(grad_2 .+ y_v)
loss = sum(loss_vec) / length(y_x)
return loss, st, (;)

end

function main(tstate::Lux.Experimental.TrainState, vjp, data, epochs)
data = data .|> gpu_device()
for epoch in 1:epochs
grads, loss, stats, tstate = Lux.Experimental.compute_gradients(
vjp, loss_function1, data, tstate)
if epoch % 50 == 1 || epoch == epochs
@printf “Epoch: %3d \t Loss: %.5g\n” epoch loss
end
tstate = Lux.Experimental.apply_gradients(tstate, grads)
end
return tstate
end

dev_cpu = cpu_device()
dev_gpu = gpu_device()

tstate = main(tstate, vjp_rule, (x, y), 250)
y_pred = dev_cpu(Lux.apply(tstate.model, dev_gpu(x), tstate.parameters, tstate.states)[1])

avikpal · June 1, 2024, 4:06am

See DiffEqFlux.jl/src/hnn.jl at master · SciML/DiffEqFlux.jl · GitHub for Hamiltonian Neural Networks, I recently updated the code there to support nested AD nicely.

Originally, it was “apply_gradients!”, but I got an error-message on this stating that “apply_gradients! not defined”

You are on an older Lux version in that case.

avikpal · June 1, 2024, 4:14am

If parameters are not being updated, check the gradients. Check this thread [Nested AD] Incorrect gradient when taking a gradient over a gradient using StatefulLuxLayer · Issue #630 · LuxDL/Lux.jl · GitHub for some common mistakes and ways to verify gradients

alekssek · June 2, 2024, 10:10am

Thank you very much for your help! My project is however to do this from scratch, but I have drawn inpiration from your github. I do still struggle though, when I call upon OptimizationFunction, I get an error which I do not understand, is this implementation on the right path? I know there are a few inconsistensies with the code, but right now, I just want it to at least run through:

using Optimization
using Zygote, Lux, Optimisers, Random, Statistics
ad=Lux.Training.AutoZygote()
ps, st = Lux.setup(Xoshiro(0), model_pre)

function hnn(ad, model, x, ps, st)#(hnn::HamiltonianNN{<:LuxCore.AbstractExplicitLayer})(x, ps, st)
    model2 = Lux.Experimental.StatefulLuxLayer(model_pre, ps, st)
    H = __hamiltonian_forward(model2, x)
    n = size(x, 1) ÷ 2
    return vcat(selectdim(H, 1, (n + 1):(2n)), -selectdim(H, 1, 1:n)), model2.st
end

function loss_function2(ps, data, target)

    pred, st_ = hnn(data, ps, st)
    return mean(abs2, pred .- target),pred#, st, ps#, pred
end

opt_func = OptimizationFunction((ps, _, data, target) -> loss_function2(ps, data, target),
    AutoForwardDiff())
opt_prob = OptimizationProblem(opt_func, ps_c)

Error:

MethodError: no method matching (OptimizationFunction{true})(::var"#19#20", ::AutoForwardDiff{nothing, Nothing})
Closest candidates are:
  (OptimizationFunction{iip})(::Any) where iip at ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3583
  (OptimizationFunction{iip})(::Any, ::SciMLBase.AbstractADType; grad, hess, hv, cons, cons_j, cons_h, lag_h, hess_prototype, cons_jac_prototype, cons_hess_prototype, lag_hess_prototype, syms, paramsyms, observed, hess_colorvec, cons_jac_colorvec, cons_hess_colorvec, lag_hess_colorvec, expr, cons_expr, sys) where iip at ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3583

Stacktrace:
 [1] OptimizationFunction(::Function, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ SciMLBase ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3581
 [2] OptimizationFunction(::Function, ::Vararg{Any})
   @ SciMLBase ~/.julia/packages/SciMLBase/QqtZA/src/scimlfunctions.jl:3581
 [3] top-level scope
   @ In[48]:2

alekssek · June 2, 2024, 10:33am

I got it to work now, but when I call: res = Optimization.solve(opt_prob, opt, dataloader; callback), i get error:

MethodError: objects of type Nothing are not callable

Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/OptimizationOptimisers/hZdKg/src/OptimizationOptimisers.jl:65 [inlined]
 [2] macro expansion
   @ ~/.julia/packages/Optimization/fPVIW/src/utils.jl:41 [inlined]
 [3] __solve(cache::OptimizationCache{OptimizationFunction{true, SciMLBase.NoAD, var"#36#37", Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Optimization.ReInitCache{ComponentVector{Float32, Vector{Float32}, Tuple{Axis{(layer_1 = ViewAxis(1:600, Axis(weight = ViewAxis(1:400, ShapedAxis((200, 2), NamedTuple())), bias = ViewAxis(401:600, ShapedAxis((200, 1), NamedTuple())))), layer_2 = ViewAxis(601:40800, Axis(weight = ViewAxis(1:40000, ShapedAxis((200, 200), NamedTuple())), bias = ViewAxis(40001:40200, ShapedAxis((200, 1), NamedTuple())))), layer_3 = ViewAxis(40801:41001, Axis(weight = ViewAxis(1:200, ShapedAxis((1, 200), NamedTuple())), bias = ViewAxis(201:201, ShapedAxis((1, 1), NamedTuple())))))}}}, SciMLBase.NullParameters}, Nothing, Nothing, Nothing, Nothing, Nothing, Adam{Float32}, IterTools.NCycle{Base.Generator{UnitRange{Int64}, var"#40#41"}}, Bool, var"#23#25"})
   @ OptimizationOptimisers ~/.julia/packages/OptimizationOptimisers/hZdKg/src/OptimizationOptimisers.jl:63
 [4] solve!
   @ ~/.julia/packages/SciMLBase/QqtZA/src/solve.jl:160 [inlined]
 [5] #solve#540
   @ ~/.julia/packages/SciMLBase/QqtZA/src/solve.jl:81 [inlined]
 [6] top-level scope
   @ In[80]:2

I really appreciate your help, thank you

Topic		Replies	Views
How to use gradient of neural network as the loss function? Machine Learning question	13	2739	March 23, 2021
Flux differentiation error Machine Learning zygote	19	1684	November 19, 2020
Flux loss: Gradient wrt input leads to empty gradient wrt parameters or to "can't differentiate foreigncall" Machine Learning flux , forwarddiff , diffeqflux	3	558	April 8, 2022
Loss functions that involve gradients Machine Learning	4	606	November 22, 2022
Nested and different AD methods altogether: How to add AD calculations inside my loss function when using neural differential equations? Machine Learning sciml , ad , neural-network , differentialequation	9	987	September 28, 2024

Using the gradient of wrt the input in the loss function

Related topics