Incorporating derivative term into loss for Flux.jl model

Hello!
I am trying to incorporate a derivative term of the outputs with respect to one of the inputs into the loss of a neural network in Flux.jl, and I was hoping to get some help…
I looked into a few posts online, but most PINN-related questions seem to involve models with a single input (e.g. time) and multiple outputs (e.g. states).
What I would like to do instead is train a model with multiple inputs & outputs, with a loss function that combines the MSE of the outputs with the sum of the derivatives (technically, a column of the Jacobian?) of the outputs with respect to one of the inputs.
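In symbols, for a model $f_\theta:\mathbb{R}^7\to\mathbb{R}^7$ and data pairs $(x_i, y_i)$, the loss I have in mind is roughly (up to the averaging convention in the MSE)

$$\mathcal{L}(\theta) = \mathrm{MSE}\big(f_\theta(x_i),\, y_i\big) \;+\; \sum_{i=1}^{N}\sum_{j=1}^{7} \frac{\partial\, [f_\theta(x_i)]_j}{\partial\, [x_i]_6},$$

i.e. the second term sums the 6th column of each per-sample Jacobian over all outputs and samples.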

Below is test code I've modified from this previous question, derived from DiffEqFlux.jl.
I am trying to train a model that takes 7 inputs and returns 7 outputs, where the loss is the MSE of the 7 outputs plus the sum of the derivatives of the 7 outputs with respect to the 6th input.
I am still having dimension issues, and I am still unable to incorporate the MSE into the loss…
I’d appreciate any help!

using Flux
using ReverseDiff

#Create modified Flux NN type that carries destructured parameters
struct ModNN{M, R, P}
    model::M
    re::R
    p::P

    function ModNN(model; p = nothing)
        _p, re = Flux.destructure(model)
        if p === nothing
            p = _p
        end
        return new{typeof(model), typeof(re), typeof(p)}(model, re, p)
    end
end

#Function to compute the derivatives of the NN outputs with respect to the 6th input
#(as written, this returns the full Jacobian for each sample; no slicing yet)
function diffdestruct(re, p, xs)
    H = [Flux.jacobian(w -> re(p)(w), [xs[:,i]])[1] for i = 1:size(xs,2)]
    return H
end

#Create an instance of modified type
mod1 = ModNN(
	Chain(Dense(7 => 32, relu), Dense(32 => 32, relu), Dense(32 => 7))
)

#Record parameter values to check that they're updating correctly
checkP = copy(mod1.p);

# sample dataset
N_data = 10
xtest = rand(7,N_data)
ytest = rand(7,N_data)
@show diffdestruct(mod1.re, mod1.p, xtest);

# Loss: sum of the derivative terms + MSE of the outputs
loss(re, p, xs, ys) = sum(diffdestruct(re, p, xs)) + Flux.Losses.mse(hcat([re(p)([xs[:,i]]) for i = 1:size(xs,2)]...), ys)

#Get the gradients using Reverse diff
gs = ReverseDiff.gradient(p -> loss(mod1.re, p, xtest, ytest), mod1.p)

opt = ADAM(0.01)

#Update the NN
Flux.Optimise.update!(opt,mod1.p,gs)  # optimizer, model parameters, gradients

@show maximum(mod1.p .- checkP);

yields the following error:

ERROR: LoadError: DimensionMismatch("matrix A has dimensions (32,7), vector B has length 1")

Actually, I’ve been able to work it out: the culprit was the extra brackets in [xs[:,i]], which wrap each column in a one-element vector of vectors before it reaches the Dense layer (hence the "vector B has length 1" in the error). The working version is below, but the way I got it to work still feels clumsy (especially all the hcat([something]...) splatting I'm having to do)… I’d appreciate any suggestions to improve this!

using Flux
using ReverseDiff

#Create modified Flux NN type that carries destructured parameters
struct ModNN{M, R, P}
    model::M
    re::R
    p::P

    function ModNN(model; p = nothing)
        _p, re = Flux.destructure(model)
        if p === nothing
            p = _p
        end
        return new{typeof(model), typeof(re), typeof(p)}(model, re, p)
    end
end

#Function to compute the derivatives of the NN outputs with respect to the 6th input
function diffdestruct(re, p, xs)
    H = hcat([
        Flux.jacobian(w -> re(p)(w), xs[:,i])[1][:,6] for i = 1:size(xs,2)
    ]...)
    return H
end

#Create an instance of modified type
mod1 = ModNN(
	Chain(Dense(7 => 32, relu), Dense(32 => 32, relu), Dense(32 => 7))
)

#Record parameter values to check that they're updating correctly
checkP = copy(mod1.p);

# sample dataset
N_data = 10
xtest = rand(7,N_data)
ytest = rand(7,N_data)
@show diffdestruct(mod1.re, mod1.p, xtest);

# Loss: sum of the derivative terms + MSE of the outputs
loss(re, p, xs, ys) = sum(diffdestruct(re, p, xs)) + Flux.Losses.mse(hcat([re(p)(xs[:,i]) for i = 1:size(xs,2)]...), ys)

#Get the gradients using Reverse diff
gs = ReverseDiff.gradient(p -> loss(mod1.re, p, xtest, ytest), mod1.p)

opt = ADAM(0.01)

#Update the NN
Flux.Optimise.update!(opt,mod1.p,gs)  # optimizer, model parameters, gradients

@show maximum(mod1.p .- checkP);
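
For what it's worth, one cleanup I've been considering (a minimal sketch, not fully tested): Flux's Dense layers accept a matrix with one sample per column, so the MSE term can be evaluated on the whole batch in a single call, and reduce(hcat, ...) avoids the hcat([...]...) splatting. Restructuring the model once per loss evaluation instead of once per sample should also save a bit of work:

#Alternative helpers; same math as above, just restructured
function diffdestruct2(re, p, xs)
    m = re(p)  #restructure once, reuse for every sample
    cols = [Flux.jacobian(m, xs[:, i])[1][:, 6] for i = 1:size(xs, 2)]
    return reduce(hcat, cols)  #7 x N_data matrix of derivatives w.r.t. the 6th input
end

#MSE term now runs on the whole batch at once
loss2(re, p, xs, ys) = sum(diffdestruct2(re, p, xs)) + Flux.Losses.mse(re(p)(xs), ys)

The ReverseDiff.gradient call stays the same, e.g. ReverseDiff.gradient(p -> loss2(mod1.re, p, xtest, ytest), mod1.p).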

Hi, I am working on a similar objective. I need to feed several input features, along with a time variable, into a PINN model and predict one or more output attributes. How should I get started on this scenario? A related question: I want to read this data from a CSV file, and the outputs need to obey physical laws (as in a PINN) while also fitting the dataset as X inputs and y outputs.
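For the CSV part, I imagine something like this for getting the data into shape (a rough sketch using CSV.jl and DataFrames.jl, with hypothetical file and column names), with the physics residual then added to the data loss as above:

using CSV, DataFrames

df = CSV.read("data.csv", DataFrame)                  #hypothetical file name
X = permutedims(Matrix(df[:, [:feat1, :feat2, :t]]))  #features + time variable, one column per sample
y = permutedims(Matrix(df[:, [:out1]]))               #output attribute(s)

but I'm not sure how best to weight the physics term against the data misfit.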
Any suggestions? Thanks in advance