Hello!
I am trying to incorporate a derivative term of the outputs with respect to one of the inputs into the loss of a neural network via Flux.jl, and I was hoping to get some help.
I looked at a few posts online, but most PINN-related questions seem to involve models with a single input (e.g. time) and multiple outputs (e.g. the states).
What I would like to do instead is train a model with multiple inputs and outputs, using a loss function that combines the MSE of the outputs with the sum of the derivatives (technically speaking, one column of a Jacobian?) of the outputs with respect to one of the inputs.
Below is some test code I've modified from this previous question, derived from DiffEqFlux.jl.
I am trying to train a model that takes in 7 inputs and returns 7 outputs, where the loss is the MSE of the 7 outputs plus the sum of the derivatives of the 7 outputs with respect to the 6th input.
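That is, for $N$ data points, the loss I'm after is

$$
L(p) = \mathrm{MSE}(\hat{y}, y) + \sum_{i=1}^{N} \sum_{k=1}^{7} \frac{\partial \hat{y}^{(i)}_k}{\partial x^{(i)}_6},
$$

where $\hat{y}^{(i)}$ is the network output for the $i$-th input sample $x^{(i)}$.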
I am still having dimension issues, and I am also unable to incorporate the MSE into the loss.
I’d appreciate any help!
using Flux
using ReverseDiff
# Create modified Flux NN type that carries the destructured parameters
struct ModNN{M, R, P}
    model::M
    re::R
    p::P
    function ModNN(model; p = nothing)
        _p, re = Flux.destructure(model)
        if p === nothing
            p = _p
        end
        return new{typeof(model), typeof(re), typeof(p)}(model, re, p)
    end
end
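(For reference, `Flux.destructure(model)` returns a flat vector of all the parameters plus a function `re` such that `re(p)` reconstructs the model, which is what should let me differentiate the loss with respect to the plain vector `p`.)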
# Function to compute the Jacobians of the NN outputs with respect to the inputs
function diffdestruct(re, p, xs)
    H = [Flux.jacobian(w -> re(p)(w), [xs[:, i]])[1] for i = 1:size(xs, 2)]
    return H
end
# Create an instance of the modified type
mod1 = ModNN(
Chain(Dense(7 => 32, relu), Dense(32 => 32, relu), Dense(32 => 7))
)
# Record parameter values to check that they're updating correctly
checkP = copy(mod1.p);
# sample dataset
N_data = 10
xtest = rand(7,N_data)
ytest = rand(7,N_data)
@show diffdestruct(mod1.re, mod1.p, xtest);
# Loss function: MSE of the outputs plus the sum of the output derivatives
loss(re, p, xs, ys) = sum(diffdestruct(re, p, xs)) + Flux.Losses.mse(hcat([re(p)([xs[:,i]]) for i = 1:size(xs,2)]...), ys)
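# NOTE: diffdestruct returns a Vector of 7×7 Jacobians, so sum() of it is a
# 7×7 matrix; I suspect this is why adding the scalar MSE term fails.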
# Get the gradients using ReverseDiff
gs = ReverseDiff.gradient(p -> loss(mod1.re, p, xtest, ytest), mod1.p)
opt = ADAM(0.01)
# Update the NN
Flux.Optimise.update!(opt,mod1.p,gs) # optimizer, model parameters, gradients
@show maximum(mod1.p .- checkP);
This yields the following error:
ERROR: LoadError: DimensionMismatch("matrix A has dimensions (32,7), vector B has length 1")
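If it helps narrow things down: I suspect the extra brackets in `[xs[:, i]]` are at least part of the problem, since they wrap each 7-element sample in a length-1 vector of vectors, which would explain why `Dense` complains about a "vector B of length 1". Below is a sketch of the loss I think I'm trying to write (the helper name `diffdestruct6` is just mine): it passes plain vectors to the model, keeps only the 6th column of each Jacobian so that the derivative penalty is a scalar, and feeds the whole batch matrix to the model for the MSE. I haven't verified it, and in particular I don't know whether `ReverseDiff.gradient` can differentiate through `Flux.jacobian` for the second-order term:

```julia
# Sum over samples of the derivatives of all 7 outputs w.r.t. the 6th input:
# column 6 of each 7×7 Jacobian, summed over its 7 entries.
function diffdestruct6(re, p, xs)
    return sum(sum(Flux.jacobian(w -> re(p)(w), xs[:, i])[1][:, 6]) for i = 1:size(xs, 2))
end

# MSE over the whole batch (Dense layers accept a matrix of column samples)
# plus the scalar derivative penalty.
loss(re, p, xs, ys) = Flux.Losses.mse(re(p)(xs), ys) + diffdestruct6(re, p, xs)

gs = ReverseDiff.gradient(p -> loss(mod1.re, p, xtest, ytest), mod1.p)
```

Is this the right structure, and is nesting `ReverseDiff` over Zygote's `jacobian` like this even expected to work?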