Hybrid ODE with ContinuousCallback for models of changing sizes?

Sure thing! See below:

It just occurred to me, though, that maybe this has something to do with the input dimension expected by the neural network? It is defined to take an input of dimension equal to the initial number of particles – but that number decreases during the integration because of the callback.

using DifferentialEquations
using SciMLSensitivity
using Random
using Distributions
using Plots
using LinearSolve
rng = Random.default_rng()
Random.seed!(1010)

using OrdinaryDiffEq, ModelingToolkit, LinearAlgebra, ComponentArrays, Optimization, OptimizationOptimisers, OptimizationOptimJL, Lux,
DiffEqFlux, JLD2, FileIO, Statistics

# ODE Function
function particle_radius_change_dynamics(du, u, p, t)
    rate_constants = p
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i] - 1/mean_size) 
    end
    return nothing
end

# Callback condition for removing particles when radius is at/close to 0
function condition(u, t, integrator)
    # Trigger the event when the smallest radius drops below the threshold
    minimum(u) < 1e-4 ? 0 : 1
end
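# (Note: ContinuousCallback locates the event by root-finding this return value,
# so the 0/1 step form above only fires when the value actually hits zero; a more
# conventional continuous condition would be something like
#     condition(u, t, integrator) = minimum(u) - 1e-4
# which crosses zero as the smallest radius shrinks.)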

# Callback to modify the state vector if above condition is met
function affect!(integrator)
    println("You've entered the callback")
    original_size = length(integrator.u)
    idxs = findall(r -> r <= 1e-4, integrator.u)
    new_size = original_size - length(idxs)

    # Remove the identified radii that have effectively gone to zero
    deleteat!(integrator.u, idxs)
    deleteat_non_user_cache!(integrator, idxs)

    resize!(integrator, new_size)
    resize_non_user_cache!(integrator, new_size)
    nothing
end

# === PROBLEM SETUP === 
# Number of particles, and the mean size and standard deviation of particle sizes
num_particles = 11
mean_size = 1.2
std = 0.1*mean_size

# Generate random initial particle size distribution 
initial_radii = rand(Normal(mean_size, std), num_particles)

# Generate random rate constants for each particle (growing or shrinking -- usually would follow conservation laws but this is just an example)
random_rates = rand(Uniform(-1,1),num_particles)

# Make input vectors
u0 = initial_radii
p_true = random_rates
tspan = (0, 100)

# Make ODE problem
prob = ODEProblem(particle_radius_change_dynamics, u0, tspan, p_true)

# Callback to remove a particle from the simulation if it effectively "disappears" due to shrinking
disappearing_callback = ContinuousCallback(condition, affect!)

# === SOLVE ===
@time sol = solve(prob, Rosenbrock23(linsolve = LUFactorization()), 
maxiters=1e6, abstol=1e-5, reltol=1e-5, sensealg=SciMLSensitivity.TrackerAdjoint, isoutofdomain=(u,p,t)->any(x->x<0.0, u), callback=disappearing_callback)

println("You've solved the ODE")
# See number of particles decrease over time 
display(plot(sol.t, map((x) -> length(x), sol[:]), lw = 3,
    ylabel = "Number of Nanoparticles", xlabel = "Time"))

# ==== TEST THE UDE SET UP

new_u = Array(sol.u)

# == Need to fill in the missing data with NaN so that it's a matrix we can plot/generate data
function pad_vectors_with_NaN(vectors::Vector{Vector{Float64}}, M::Int)
    N = length(vectors)
    matrix = fill(NaN, N, M)
    
    for i in 1:N
        length_vec = min(length(vectors[i]), M)
        matrix[i, 1:length_vec] = vectors[i][1:length_vec]
    end
    
    return matrix
end

X = pad_vectors_with_NaN(new_u, length(sol.t))
t = sol.t

relative_factor = 1e-4
magnitude = relative_factor * abs.(X)
noise = rand(Normal(0,1), size(X))
scaled_noise = magnitude .* noise
Xₙ = X .+ scaled_noise

plt = plot(t, X[:,1], alpha = 0.75, color = :black, label = ["True Data" nothing], title="Single trajectory")
scatter!(plt, t, Xₙ[:,1], color = :red, label = ["Noisy Data" nothing], markersize=2)
#display(plt)

## Define the network
# Gaussian RBF as activation
rbf(x) = exp.(-(x.^2))

# Multilayer FeedForward
U = Lux.Chain(
    Lux.Dense(num_particles,5,rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,num_particles)
)
# Get the initial parameters and state variables of the model
p, st = Lux.setup(rng, U)

# Define the hybrid model
function ude_dynamics!(du,u, p, t, p_true)
    û = U(u, p, st)[1] # Network prediction
    rate_constants = p_true
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i]) + û[i]
    end
    return nothing
end

# Closure with the known parameter
nn_dynamics!(du,u,p,t) = ude_dynamics!(du,u,p,t,p_true)
# Define the problem
disappearing_callback = ContinuousCallback(condition, affect!)
prob_nn = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, Xₙ[:, 1], tspan, p)

## Function to train the network
# Define a predictor
function predict!(θ, X = Xₙ[:,1], T = t)
    #_prob = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, X, (T[1], T[end]), θ)
    println("You're trying to make a prediction for the UDE...")
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    Array(solve(_prob, Rosenbrock23(linsolve = LUFactorization()), saveat = T,
                abstol=1e-6, reltol=1e-6, sensealg=SciMLSensitivity.TrackerAdjoint, callback=disappearing_callback))
    println("You've solved the UDE")
end

# Simple L2 loss
function loss(θ)
    X̂ = predict!(θ)
    sum(abs2, Xₙ .- X̂)
end

# Container to track the losses
losses = Float64[]

callback = function (p, l)
  push!(losses, l)
  if length(losses)%50==0
      println("Current loss after $(length(losses)) iterations: $(losses[end])")
  end
  return false
end

## Training
# First train with ADAM for better convergence -> move the parameters into a
# favourable starting position for BFGS
adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x,p)->loss(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentVector{Float64}(p))
res1 = Optimization.solve(optprob, ADAM(0.1), callback=callback, maxiters = 5)
println("Training loss after $(length(losses)) iterations: $(losses[end])")

That example seems to have many issues with sizes. You have X = Xₙ[:,1] as your initial condition, and it has length 333 (the number of saved time points, because of the padding). This makes u0 a length-333 vector.

U = Lux.Chain(
    Lux.Dense(num_particles,5,rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,num_particles)
)

expects the u in U(u, p, st)[1] to have length num_particles, which you have set to 11. So it fails even before it gets to differentiation: the forward pass itself errors with a size mismatch.
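A minimal sketch of that forward-pass failure (the 333 just stands in for the too-long Xₙ[:,1]):

using Lux, Random
Udemo = Lux.Chain(Lux.Dense(11, 5), Lux.Dense(5, 11))
pdemo, stdemo = Lux.setup(Random.default_rng(), Udemo)
Udemo(rand(333), pdemo, stdemo)  # throws DimensionMismatch: the first Dense expects 11 inputs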

Ack, sorry, my bad! Yes, I can fix the size-mismatch issue. In padding the state-vector initial conditions with NaN (for disappearing elements) I built the matrix with the wrong N×M dimensions – but, with that fixed, I am still running into issues.

Fixed example, wherein X = Xₙ[:,1] now has length num_particles (before, it had length equal to length(sol.t)).

using DifferentialEquations
using SciMLSensitivity
using Random
using Distributions
using Plots
using LinearSolve
rng = Random.default_rng()
Random.seed!(1010)

using OrdinaryDiffEq, ModelingToolkit, LinearAlgebra, ComponentArrays, Optimization, OptimizationOptimisers, OptimizationOptimJL, Lux,
DiffEqFlux, JLD2, FileIO, Statistics

# ODE Function
function particle_radius_change_dynamics(du, u, p, t)
    rate_constants = p
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i] - 1/mean_size) 
    end
    return nothing
end

# Callback condition for removing particles when radius is at/close to 0
function condition(u, t, integrator)
    # Trigger the event when the smallest radius drops below the threshold
    minimum(u) < 1e-4 ? 0 : 1
end

# Callback to modify the state vector if above condition is met
function affect!(integrator)
    println("You've entered the callback")
    original_size = length(integrator.u)
    idxs = findall(r -> r <= 1e-4, integrator.u)
    new_size = original_size - length(idxs)

    # Remove the identified radii that have effectively gone to zero
    deleteat!(integrator.u, idxs)
    deleteat_non_user_cache!(integrator, idxs)

    resize!(integrator, new_size)
    resize_non_user_cache!(integrator, new_size)
    nothing
end

# === PROBLEM SETUP === 
# Number of particles, and the mean size and standard deviation of particle sizes
num_particles = 11
mean_size = 1.2
std = 0.1 * mean_size

# Generate random initial particle size distribution 
initial_radii = rand(Normal(mean_size, std), num_particles)

# Generate random rate constants for each particle (growing or shrinking -- usually would follow conservation laws but this is just an example)
random_rates = rand(Uniform(-1,1), num_particles)

# Make input vectors
u0 = initial_radii
p_true = random_rates
tspan = (0, 10)

# Make ODE problem
prob = ODEProblem(particle_radius_change_dynamics, u0, tspan, p_true)

# Callback to remove a particle from the simulation if it effectively "disappears" due to shrinking
disappearing_callback = ContinuousCallback(condition, affect!)

# === SOLVE ===
@time sol = solve(prob, Rosenbrock23(linsolve = LUFactorization()), 
maxiters=1e6, abstol=1e-5, reltol=1e-5, sensealg=SciMLSensitivity.TrackerAdjoint, isoutofdomain=(u,p,t)->any(x->x<0.0, u), callback=disappearing_callback)

println("You've solved the ODE")
# See number of particles decrease over time 
display(plot(sol.t, map((x) -> length(x), sol[:]), lw = 3,
    ylabel = "Number of Nanoparticles", xlabel = "Time"))

# ==== TEST THE UDE SET UP
new_u = Array(sol.u)

# == Need to fill in the missing data with NaN so that it's a matrix we can plot/generate data
function pad_vectors_with_NaN(vectors::Vector{Vector{Float64}}, M::Int)
    N = length(vectors[1])
    matrix = fill(NaN, N, M)
    
    for i in 1:N
        length_vec = min(length(vectors[i]), M)
        matrix[i, 1:length_vec] = vectors[i][1:length_vec]
    end
    
    return matrix
end

X = pad_vectors_with_NaN(new_u, length(sol.t))
t = sol.t

relative_factor = 1e-4
magnitude = relative_factor * abs.(X)
noise = rand(Normal(0,1), size(X))
scaled_noise = magnitude .* noise
Xₙ = X .+ scaled_noise

plt = plot(t, X[1,:], alpha = 0.75, color = :black, label = ["True Data" nothing], title="Single trajectory")
scatter!(plt, t, Xₙ[1,:], color = :red, label = ["Noisy Data" nothing], markersize=2)
display(plt)

## Define the network
# Gaussian RBF as activation
rbf(x) = exp.(-(x.^2))

# Multilayer FeedForward
U = Lux.Chain(
    Lux.Dense(num_particles,5,rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,num_particles)
)
# Get the initial parameters and state variables of the model
p, st = Lux.setup(rng, U)

# Define the hybrid model
function ude_dynamics!(du,u, p, t, p_true)
    û = U(u, p, st)[1] # Network prediction
    rate_constants = p_true
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i]) + û[i]
    end
    return nothing
end

# Closure with the known parameter
nn_dynamics!(du,u,p,t) = ude_dynamics!(du,u,p,t,p_true)
# Define the problem
disappearing_callback = ContinuousCallback(condition, affect!)
prob_nn = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, Xₙ[:, 1], tspan, p)

## Function to train the network
# Define a predictor
function predict!(θ, X = Xₙ[:,1], T = t)
    #_prob = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, X, (T[1], T[end]), θ)
    @show size(X) # This is now the correct size!
    println("You're trying to make a prediction for the UDE...")
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    Array(solve(_prob, Rosenbrock23(linsolve = LUFactorization()), saveat = T,
                abstol=1e-6, reltol=1e-6, sensealg=SciMLSensitivity.TrackerAdjoint, callback=disappearing_callback))
    println("You've solved the UDE")
end

# Simple L2 loss
function loss(θ)
    X̂ = predict!(θ)
    sum(abs2, Xₙ .- X̂)
end

# Container to track the losses
losses = Float64[]

callback = function (p, l)
  push!(losses, l)
  if length(losses)%50==0
      println("Current loss after $(length(losses)) iterations: $(losses[end])")
  end
  return false
end

## Training
# First train with ADAM for better convergence -> move the parameters into a
# favourable starting position for BFGS
adtype = Optimization.AutoZygote()
optf = Optimization.OptimizationFunction((x,p)->loss(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentVector{Float64}(p))
res1 = Optimization.solve(optprob, ADAM(0.1), callback=callback, maxiters = 5)
println("Training loss after $(length(losses)) iterations: $(losses[end])")

But when I run this I get the following error:

You've entered the callback
  0.496430 seconds (1.47 M allocations: 94.917 MiB, 3.60% gc time, 99.45% compilation time: 3% of which was recompilation)
You've solved the ODE
size(X) = (11,)
You're trying to make a prediction for the UDE...
ERROR: ArgumentError: new: too few arguments (expected 3)
Stacktrace:
  [1] __new__
    @ ~/.julia/packages/Zygote/nsBv0/src/tools/builtins.jl:9 [inlined]
  [2] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:296 [inlined]
  [3] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
  [4] BitArray
    @ ./bitarray.jl:39 [inlined]
  [5] _pullback(::Zygote.Context{false}, ::Type{BitMatrix}, ::UndefInitializer, ::Int64, ::Int64)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
  [6] generate_chunked_partials
    @ ~/.julia/packages/SparseDiffTools/CPCma/src/differentiation/compute_jacobian_ad.jl:84 [inlined]
  [7] _pullback(::Zygote.Context{…}, ::typeof(SparseDiffTools.generate_chunked_partials), ::Vector{…}, ::UnitRange{…}, ::Val{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
  [8] #ForwardColorJacCache#13
    @ ~/.julia/packages/SparseDiffTools/CPCma/src/differentiation/compute_jacobian_ad.jl:37 [inlined]
  [9] _pullback(::Zygote.Context{…}, ::SparseDiffTools.var"##ForwardColorJacCache#13", ::Nothing, ::Type{…}, ::UnitRange{…}, ::Nothing, ::Type{…}, ::SciMLBase.UJacobianWrapper{…}, ::Vector{…}, ::Val{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [10] ForwardColorJacCache
    @ ~/.julia/packages/SparseDiffTools/CPCma/src/differentiation/compute_jacobian_ad.jl:22 [inlined]
 [11] _pullback(::Zygote.Context{…}, ::typeof(Core.kwcall), ::@NamedTuple{…}, ::Type{…}, ::SciMLBase.UJacobianWrapper{…}, ::Vector{…}, ::Val{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [12] build_jac_config
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/derivative_wrappers.jl:292 [inlined]
 [13] _pullback(::Zygote.Context{…}, ::typeof(OrdinaryDiffEq.build_jac_config), ::Rosenbrock23{…}, ::ODEFunction{…}, ::SciMLBase.UJacobianWrapper{…}, ::Vector{…}, ::Vector{…}, ::Vector{…}, ::Vector{…}, ::Vector{…}, ::Val{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [14] alg_cache
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/caches/rosenbrock_caches.jl:108 [inlined]
 [15] _pullback(::Zygote.Context{…}, ::typeof(alg_cache), ::Rosenbrock23{…}, ::Vector{…}, ::Vector{…}, ::Type{…}, ::Type{…}, ::Type{…}, ::Vector{…}, ::Vector{…}, ::ODEFunction{…}, ::Float64, ::Float64, ::Float64, ::ComponentVector{…}, ::Bool, ::Val{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [16] #__init#806
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/solve.jl:344 [inlined]
 [17] _pullback(::Zygote.Context{…}, ::OrdinaryDiffEq.var"##__init#806", ::Vector{…}, ::Tuple{}, ::Tuple{}, ::Nothing, ::Bool, ::Bool, ::Bool, ::Nothing, ::ContinuousCallback{…}, ::Bool, ::Bool, ::Float64, ::Float64, ::Float64, ::Bool, ::Bool, ::Rational{…}, ::Float64, ::Float64, ::Rational{…}, ::Int64, ::Int64, ::Rational{…}, ::Nothing, ::Nothing, ::Rational{…}, ::Nothing, ::Bool, ::Int64, ::Int64, ::typeof(DiffEqBase.ODE_DEFAULT_NORM), ::typeof(opnorm), ::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), ::typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK), ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Bool, ::Int64, ::String, ::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), ::Symbol, ::Nothing, ::Bool, ::Bool, ::Bool, ::Bool, ::OrdinaryDiffEq.DefaultInit, ::@Kwargs{}, ::typeof(SciMLBase.__init), ::ODEProblem{…}, ::Rosenbrock23{…}, ::Tuple{}, ::Tuple{}, ::Tuple{}, ::Type{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [18] __init
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/solve.jl:11 [inlined]
 [19] _pullback(::Zygote.Context{…}, ::typeof(Core.kwcall), ::@NamedTuple{…}, ::typeof(SciMLBase.__init), ::ODEProblem{…}, ::Rosenbrock23{…}, ::Tuple{}, ::Tuple{}, ::Tuple{}, ::Type{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [20] __init (repeats 4 times)
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/solve.jl:11 [inlined]
 [21] _apply
    @ ./boot.jl:838 [inlined]
 [22] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [23] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [24] #__solve#805
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/solve.jl:6 [inlined]
 [25] _pullback(::Zygote.Context{…}, ::OrdinaryDiffEq.var"##__solve#805", ::@Kwargs{…}, ::typeof(SciMLBase.__solve), ::ODEProblem{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [26] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [27] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [28] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [29] __solve
    @ ~/.julia/packages/OrdinaryDiffEq/tAI61/src/solve.jl:1 [inlined]
 [30] _pullback(::Zygote.Context{…}, ::typeof(Core.kwcall), ::@NamedTuple{…}, ::typeof(SciMLBase.__solve), ::ODEProblem{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [31] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [32] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [33] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [34] #solve_call#44
    @ ~/.julia/packages/DiffEqBase/yM6LF/src/solve.jl:612 [inlined]
 [35] _pullback(::Zygote.Context{…}, ::DiffEqBase.var"##solve_call#44", ::Bool, ::Nothing, ::@Kwargs{…}, ::typeof(DiffEqBase.solve_call), ::ODEProblem{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [36] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [37] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [38] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [39] solve_call
    @ ~/.julia/packages/DiffEqBase/yM6LF/src/solve.jl:569 [inlined]
 [40] _pullback(::Zygote.Context{…}, ::typeof(Core.kwcall), ::@NamedTuple{…}, ::typeof(DiffEqBase.solve_call), ::ODEProblem{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [41] #solve_up#53
    @ ~/.julia/packages/DiffEqBase/yM6LF/src/solve.jl:1080 [inlined]
 [42] _pullback(::Zygote.Context{…}, ::DiffEqBase.var"##solve_up#53", ::@Kwargs{…}, ::typeof(DiffEqBase.solve_up), ::ODEProblem{…}, ::Type{…}, ::Vector{…}, ::ComponentVector{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [43] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [44] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [45] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [46] solve_up
    @ ~/.julia/packages/DiffEqBase/yM6LF/src/solve.jl:1066 [inlined]
 [47] _pullback(::Zygote.Context{…}, ::typeof(Core.kwcall), ::@NamedTuple{…}, ::typeof(DiffEqBase.solve_up), ::ODEProblem{…}, ::Type{…}, ::Vector{…}, ::ComponentVector{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [48] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [49] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [50] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [51] #solve#51
    @ ~/.julia/packages/DiffEqBase/yM6LF/src/solve.jl:1003 [inlined]
 [52] _pullback(::Zygote.Context{…}, ::DiffEqBase.var"##solve#51", ::Type{…}, ::Nothing, ::Nothing, ::Val{…}, ::@Kwargs{…}, ::typeof(solve), ::ODEProblem{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [53] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [54] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [55] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [56] solve
    @ ~/.julia/packages/DiffEqBase/yM6LF/src/solve.jl:993 [inlined]
 [57] _pullback(::Zygote.Context{…}, ::typeof(Core.kwcall), ::@NamedTuple{…}, ::typeof(solve), ::ODEProblem{…}, ::Rosenbrock23{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [58] predict!
    @ ~/Library/CloudStorage/OneDrive-PNNL/Documents/Projects/ACCELERATE/sintering_kinetic_model/minimal_example.jl:142 [inlined]
 [59] _pullback(::Zygote.Context{…}, ::typeof(predict!), ::ComponentVector{…}, ::Vector{…}, ::Vector{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [60] predict!
    @ ~/Library/CloudStorage/OneDrive-PNNL/Documents/Projects/ACCELERATE/sintering_kinetic_model/minimal_example.jl:138 [inlined]
 [61] _pullback(ctx::Zygote.Context{false}, f::typeof(predict!), args::ComponentVector{Float64, Vector{…}, Tuple{…}})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [62] loss
    @ ~/Library/CloudStorage/OneDrive-PNNL/Documents/Projects/ACCELERATE/sintering_kinetic_model/minimal_example.jl:149 [inlined]
 [63] _pullback(ctx::Zygote.Context{false}, f::typeof(loss), args::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [64] #175
    @ ~/Library/CloudStorage/OneDrive-PNNL/Documents/Projects/ACCELERATE/sintering_kinetic_model/minimal_example.jl:168 [inlined]
 [65] _pullback(::Zygote.Context{…}, ::var"#175#176", ::ComponentVector{…}, ::SciMLBase.NullParameters)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [66] _apply
    @ ./boot.jl:838 [inlined]
 [67] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [68] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [69] OptimizationFunction
    @ ~/.julia/packages/SciMLBase/JUp1I/src/scimlfunctions.jl:3762 [inlined]
 [70] _pullback(::Zygote.Context{…}, ::OptimizationFunction{…}, ::ComponentVector{…}, ::SciMLBase.NullParameters)
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [71] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [72] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [73] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [74] #37
    @ ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationZygoteExt.jl:90 [inlined]
 [75] _pullback(ctx::Zygote.Context{…}, f::OptimizationZygoteExt.var"#37#55"{…}, args::ComponentVector{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [76] _apply(::Function, ::Vararg{Any})
    @ Core ./boot.jl:838
 [77] adjoint
    @ ~/.julia/packages/Zygote/nsBv0/src/lib/lib.jl:203 [inlined]
 [78] _pullback
    @ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:67 [inlined]
 [79] #39
    @ ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationZygoteExt.jl:93 [inlined]
 [80] _pullback(ctx::Zygote.Context{…}, f::OptimizationZygoteExt.var"#39#57"{…}, args::ComponentVector{…})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface2.jl:0
 [81] pullback(f::Function, cx::Zygote.Context{false}, args::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface.jl:90
 [82] pullback
    @ ~/.julia/packages/Zygote/nsBv0/src/compiler/interface.jl:88 [inlined]
 [83] gradient(f::Function, args::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Zygote ~/.julia/packages/Zygote/nsBv0/src/compiler/interface.jl:147
 [84] (::OptimizationZygoteExt.var"#38#56"{…})(::ComponentVector{…}, ::ComponentVector{…})
    @ OptimizationZygoteExt ~/.julia/packages/OptimizationBase/rRpJs/ext/OptimizationZygoteExt.jl:93
 [85] macro expansion
    @ ~/.julia/packages/OptimizationOptimisers/AOkbT/src/OptimizationOptimisers.jl:68 [inlined]
 [86] macro expansion
    @ ~/.julia/packages/Optimization/5DEdF/src/utils.jl:32 [inlined]
 [87] __solve(cache::OptimizationCache{…})
    @ OptimizationOptimisers ~/.julia/packages/OptimizationOptimisers/AOkbT/src/OptimizationOptimisers.jl:66
 [88] solve!(cache::OptimizationCache{…})
    @ SciMLBase ~/.julia/packages/SciMLBase/JUp1I/src/solve.jl:188
 [89] solve(::OptimizationProblem{…}, ::Optimisers.Adam; kwargs::@Kwargs{…})
    @ SciMLBase ~/.julia/packages/SciMLBase/JUp1I/src/solve.jl:96
Some type information was truncated. Use `show(err)` to see complete types.

Did you try running your loss function first?

julia> loss(ComponentVector{Float64}(p))
size(X) = (11,)
You're trying to make a prediction for the UDE...
You've entered the callback
ERROR: cannot resize array with shared data
Stacktrace:
  [1] _deleteend!
    @ .\array.jl:1081 [inlined]
  [2] _deleteat!(a::Vector{Float64}, inds::Vector{Int64}, dltd::Base.Nowhere)
    @ Base .\array.jl:1686
  [3] _deleteat!
    @ .\array.jl:1657 [inlined]
  [4] deleteat!
    @ .\array.jl:1634 [inlined]
  [5] affect!(integrator::OrdinaryDiffEq.ODEIntegrator{…})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:41
  [6] apply_callback!(integrator::OrdinaryDiffEq.ODEIntegrator{…}, callback::ContinuousCallback{…}, cb_time::Float64, prev_sign::Int64, event_idx::Int64)
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\callbacks.jl:585
  [7] macro expansion
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:313 [inlined]
  [8] apply_ith_callback!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:299 [inlined]
  [9] handle_callbacks!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:338 [inlined]
 [10] _loopfooter!(integrator::OrdinaryDiffEq.ODEIntegrator{…})
    @ OrdinaryDiffEq C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:254
 [11] loopfooter!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:207 [inlined]
 [12] solve!(integrator::OrdinaryDiffEq.ODEIntegrator{…})
    @ OrdinaryDiffEq C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\solve.jl:558
 [13] #__solve#670
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\solve.jl:7 [inlined]
 [14] __solve
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\solve.jl:1 [inlined]
 [15] solve_call(_prob::ODEProblem{…}, args::Rosenbrock23{…}; merge_callbacks::Bool, kwargshandle::Nothing, kwargs::@Kwargs{…})
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:612
 [16] solve_up(prob::ODEProblem{…}, sensealg::Type, u0::Vector{…}, p::ComponentVector{…}, args::Rosenbrock23{…}; kwargs::@Kwargs{…})
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:1080
 [17] solve_up
    @ C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:1066 [inlined]
 [18] solve(prob::ODEProblem{…}, args::Rosenbrock23{…}; sensealg::Type, u0::Nothing, p::Nothing, wrap::Val{…}, kwargs::@Kwargs{…})
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:1003
 [19] predict!(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}}, X::Vector{Float64}, T::Vector{Float64})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:145
 [20] predict!(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:142
 [21] loss(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:152
 [22] top-level scope
    @ REPL[1]:1
Some type information was truncated. Use `show(err)` to see complete types.

That seems to be due to some caching; to work around it I just do û = U(copy(u), p, st)[1]. So that’s step one. But then step two:

function ude_dynamics!(du,u, p, t, p_true)
    @show size(u)
    û = U(copy(u), p, st)[1] # Network prediction
    rate_constants = p_true
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i]) + û[i]
    end
    return nothing
end
size(u) = (11,)
size(u) = (11,)
size(u) = (11,)
⋮ (the same line repeated, 43 times in total, as the integrator steps)
You've entered the callback
size(u) = (10,)
ERROR: DimensionMismatch: A has dimensions (5,11) but B has dimensions (10,1)
Stacktrace:
  [1] gemm_wrapper!(C::Matrix{…}, tA::Char, tB::Char, A::Base.ReshapedArray{…}, B::Matrix{…}, _add::LinearAlgebra.MulAddMul{…})
    @ LinearAlgebra C:\Users\accou\.julia\juliaup\julia-1.10.4+0.x64.w64.mingw32\share\julia\stdlib\v1.10\LinearAlgebra\src\matmul.jl:577
  [2] generic_matmatmul!
    @ C:\Users\accou\.julia\juliaup\julia-1.10.4+0.x64.w64.mingw32\share\julia\stdlib\v1.10\LinearAlgebra\src\matmul.jl:352 [inlined]
  [3] mul!
    @ C:\Users\accou\.julia\juliaup\julia-1.10.4+0.x64.w64.mingw32\share\julia\stdlib\v1.10\LinearAlgebra\src\matmul.jl:263 [inlined]
  [4] mul!
    @ C:\Users\accou\.julia\juliaup\julia-1.10.4+0.x64.w64.mingw32\share\julia\stdlib\v1.10\LinearAlgebra\src\matmul.jl:237 [inlined]
  [5] __matmul!
    @ C:\Users\accou\.julia\packages\LuxLib\VRICL\src\impl\fused_dense.jl:5 [inlined]
  [6] __fused_dense_bias_activation_impl
    @ C:\Users\accou\.julia\packages\LuxLib\VRICL\src\impl\fused_dense.jl:30 [inlined]
  [7] fused_dense_bias_activation
    @ C:\Users\accou\.julia\packages\LuxLib\VRICL\src\api\dense.jl:46 [inlined]
  [8] fused_dense_bias_activation
    @ C:\Users\accou\.julia\packages\LuxLib\VRICL\src\api\dense.jl:38 [inlined]
  [9] Dense
    @ C:\Users\accou\.julia\packages\Lux\7UzHr\src\layers\basic.jl:218 [inlined]
 [10] Dense
    @ C:\Users\accou\.julia\packages\Lux\7UzHr\src\layers\basic.jl:214 [inlined]
 [11] apply(model::Dense{…}, x::Vector{…}, ps::ComponentVector{…}, st::@NamedTuple{})
    @ LuxCore C:\Users\accou\.julia\packages\LuxCore\qiHPC\src\LuxCore.jl:179
 [12] macro expansion
    @ .\abstractarray.jl:0 [inlined]
 [13] applychain(layers::@NamedTuple{…}, x::Vector{…}, ps::ComponentVector{…}, st::@NamedTuple{…})
    @ Lux C:\Users\accou\.julia\packages\Lux\7UzHr\src\layers\containers.jl:478
 [14] (::Chain{…})(x::Vector{…}, ps::ComponentVector{…}, st::@NamedTuple{…})
    @ Lux C:\Users\accou\.julia\packages\Lux\7UzHr\src\layers\containers.jl:476
 [15] ude_dynamics!(du::Vector{…}, u::Vector{…}, p::ComponentVector{…}, t::Float64, p_true::Vector{…})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:124
 [16] nn_dynamics!(du::Vector{Float64}, u::Vector{Float64}, p::ComponentVector{Float64, Vector{…}, Tuple{…}}, t::Float64)
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:134
 [17] ODEFunction
    @ C:\Users\accou\.julia\dev\SciMLBase\src\scimlfunctions.jl:2296 [inlined]
 [18] _ode_addsteps!(k::Vector{…}, t::Float64, uprev::Vector{…}, u::Vector{…}, dt::Float64, f::ODEFunction{…}, p::ComponentVector{…}, cache::OrdinaryDiffEq.Rosenbrock23Cache{…}, always_calc_begin::Bool, allow_calc_end::Bool, force_calc_end::Bool)
    @ OrdinaryDiffEq C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\dense\stiff_addsteps.jl:66
 [19] ode_addsteps!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\dense\generic_dense.jl:124 [inlined]
 [20] ode_addsteps!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\dense\generic_dense.jl:61 [inlined]
 [21] reeval_internals_due_to_modification!(integrator::OrdinaryDiffEq.ODEIntegrator{…}, continuous_modification::Bool)
    @ OrdinaryDiffEq C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_interface.jl:51
 [22] reeval_internals_due_to_modification!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_interface.jl:39 [inlined]
 [23] apply_callback!(integrator::OrdinaryDiffEq.ODEIntegrator{…}, callback::ContinuousCallback{…}, cb_time::Float64, prev_sign::Int64, event_idx::Int64)
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\callbacks.jl:591
 [24] macro expansion
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:313 [inlined]
 [25] apply_ith_callback!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:299 [inlined]
 [26] handle_callbacks!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:338 [inlined]
 [27] _loopfooter!(integrator::OrdinaryDiffEq.ODEIntegrator{…})
    @ OrdinaryDiffEq C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:254
 [28] loopfooter!
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\integrators\integrator_utils.jl:207 [inlined]
 [29] solve!(integrator::OrdinaryDiffEq.ODEIntegrator{…})
    @ OrdinaryDiffEq C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\solve.jl:558
 [30] #__solve#670
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\solve.jl:7 [inlined]
 [31] __solve
    @ C:\Users\accou\.julia\packages\OrdinaryDiffEq\tO8iY\src\solve.jl:1 [inlined]
 [32] solve_call(_prob::ODEProblem{…}, args::Rosenbrock23{…}; merge_callbacks::Bool, kwargshandle::Nothing, kwargs::@Kwargs{…})
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:612
 [33] solve_up(prob::ODEProblem{…}, sensealg::Type, u0::Vector{…}, p::ComponentVector{…}, args::Rosenbrock23{…}; kwargs::@Kwargs{…})
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:1080
 [34] solve_up
    @ C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:1066 [inlined]
 [35] solve(prob::ODEProblem{…}, args::Rosenbrock23{…}; sensealg::Type, u0::Nothing, p::Nothing, wrap::Val{…}, kwargs::@Kwargs{…})
    @ DiffEqBase C:\Users\accou\.julia\dev\DiffEqBase\src\solve.jl:1003
 [36] predict!(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}}, X::Vector{Float64}, T::Vector{Float64})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:146
 [37] predict!(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:143
 [38] loss(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
    @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:155
 [39] top-level scope
    @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

When the shrink occurs, the neural network is not set up to handle the size change: the first Dense layer always expects num_particles inputs. One fix, used in the final code below, is to zero-pad the shrunken state back to a fixed length before calling the network, e.g.:
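_u = [i < length(u) ? u[i] : zero(eltype(u)) for i in 1:num_particles]  # zero-pad removed particles
û = U(_u, p, st)[1]   # the network always sees a length-num_particles input

With the input size handled, the next error appears: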

julia> loss(ComponentVector{Float64}(p))
size(X) = (11,)
You're trying to make a prediction for the UDE...
You've entered the callback
You've entered the callback
You've entered the callback
You've entered the callback
You've entered the callback
ERROR: DimensionMismatch: number of rows of each array must match (got [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6])
Stacktrace:
 [1] _typed_hcat(::Type{Float64}, A::Vector{Vector{Float64}})
   @ Base .\abstractarray.jl:1667
 [2] reduce(::typeof(hcat), A::Vector{Vector{Float64}})
   @ Base .\abstractarray.jl:1721
 [3] Array(VA::ODESolution{…})
   @ RecursiveArrayTools C:\Users\accou\.julia\packages\RecursiveArrayTools\VAOZn\src\vector_of_array.jl:81
 [4] predict!(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}}, X::Vector{Float64}, T::Vector{Float64})
   @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:146
 [5] predict!(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
   @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:143
 [6] loss(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
   @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:155
 [7] top-level scope
   @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

Your loss function is not well-defined.

function predict!(θ, X = Xₙ[:,1], T = t)
    #_prob = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, X, (T[1], T[end]), θ)
    @show size(X) # This is now the correct size!
    println("You're trying to make a prediction for the UDE...")
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    Array(solve(_prob, Rosenbrock23(linsolve = LUFactorization()), saveat = T,
                abstol=1e-6, reltol=1e-6, sensealg=SciMLSensitivity.TrackerAdjoint, callback=disappearing_callback))
    println("You've solved the UDE")
end

You can’t make a matrix if the outputs don’t all have the same size. You’ll have to leave it in the array-of-array form.
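You can see the same failure in isolation with made-up lengths:

julia> reduce(hcat, [rand(3), rand(2)])
ERROR: DimensionMismatch: number of rows of each array must match (got [3, 2])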

Then:

julia> loss(ComponentVector{Float64}(p))
size(X) = (11,)
You're trying to make a prediction for the UDE...
You've entered the callback
You've entered the callback
You've entered the callback
You've entered the callback
You've entered the callback
You've solved the UDE
ERROR: MethodError: no method matching -(::Float64, ::Nothing)

Closest candidates are:
  -(::AbstractFloat, ::ReverseDiff.TrackedReal)
   @ ReverseDiff C:\Users\accou\.julia\packages\ReverseDiff\p1MzG\src\derivatives\scalars.jl:18
  -(::AbstractFloat, ::ForwardDiff.Dual{Ty}) where Ty
   @ ForwardDiff C:\Users\accou\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:145
  -(::Number, ::Num)
   @ Symbolics C:\Users\accou\.julia\packages\SymbolicUtils\0opve\src\methods.jl:76
  ...

Stacktrace:
 [1] _broadcast_getindex_evalf
   @ .\broadcast.jl:709 [inlined]
 [2] _broadcast_getindex
   @ .\broadcast.jl:682 [inlined]
 [3] getindex
   @ .\broadcast.jl:636 [inlined]
 [4] copy
   @ .\broadcast.jl:942 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{…}, Nothing, typeof(-), Tuple{…}})
   @ Base.Broadcast .\broadcast.jl:903
 [6] loss(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
   @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:156
 [7] top-level scope
   @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

Now it solves, but predict! returns nothing. This is because you put the print statement after the solve, as the last expression of predict!, so the function returns the nothing from println. That tells me you never ran your loss function, because no version of this could work with that print statement there. When debugging an optimization, always make sure the loss runs first.
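In Julia a function returns its last expression, and println returns nothing, so a trailing print silently swallows the result:

julia> f() = (x = 42; println("done"))
f (generic function with 1 method)

julia> f() === nothing
done
true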

So then with:

function predict!(θ, X = Xₙ[:,1], T = t)
    #_prob = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, X, (T[1], T[end]), θ)
    @show size(X) # This is now the correct size!
    println("You're trying to make a prediction for the UDE...")
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    sol = solve(_prob, Rosenbrock23(linsolve = LUFactorization()), saveat = T,
                abstol=1e-6, reltol=1e-6, sensealg=SciMLSensitivity.TrackerAdjoint, callback=disappearing_callback)
    println("You've solved the UDE")
    sol
end

You’re now solving and predicting. But you get:

ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 94 and 104
Stacktrace:
 [1] _bcs1
   @ .\broadcast.jl:555 [inlined]
 [2] _bcs (repeats 2 times)
   @ .\broadcast.jl:549 [inlined]
 [3] broadcast_shape
   @ .\broadcast.jl:543 [inlined]
 [4] combine_axes
   @ .\broadcast.jl:524 [inlined]
 [5] instantiate
   @ .\broadcast.jl:306 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{…}, Nothing, typeof(-), Tuple{…}})
   @ Base.Broadcast .\broadcast.jl:903
 [7] loss(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
   @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:157
 [8] top-level scope
   @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

because

function loss(θ)
    X̂ = predict!(θ)
    sum(abs2, Xₙ .- X̂)
end

those aren’t the same size. Xₙ is a fixed-size matrix where every column has length 11, while X̂ is variable-sized. How do you want to interpret the loss on the pieces where a particle exists in one but not the other? You need to come up with some interpretation: that’s a modeling decision. I’m going to go with a naive approach that pads X̂ with zeros to build a matrix. So then:

function loss(θ)
    X̂ = predict!(θ)
    _X̂ = stack([[i < length(u) ? u[i] : 0.0 for i in 1:num_particles] for u in X̂.u])
    @show size(Xₙ), size(_X̂)
    sum(abs2, Xₙ .- _X̂)
end

loss(ComponentVector{Float64}(p))

gives:

ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 94 and 104
Stacktrace:
 [1] _bcs1
   @ .\broadcast.jl:555 [inlined]
 [2] _bcs (repeats 2 times)
   @ .\broadcast.jl:549 [inlined]
 [3] broadcast_shape
   @ .\broadcast.jl:543 [inlined]
 [4] combine_axes
   @ .\broadcast.jl:524 [inlined]
 [5] instantiate
   @ .\broadcast.jl:306 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{…}, Nothing, typeof(-), Tuple{…}})
   @ Base.Broadcast .\broadcast.jl:903
 [7] loss(θ::ComponentVector{Float64, Vector{Float64}, Tuple{Axis{…}}})
   @ Main c:\Users\accou\OneDrive\Computer\Desktop\test.jl:159
 [8] top-level scope
   @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

The sizes aren’t aligned because of the callback saves. See:

function loss(θ)
    X̂ = predict!(θ)
    @show X̂.t
    _X̂ = stack([[i < length(u) ? u[i] : 0.0 for i in 1:num_particles] for u in X̂.u])
    @show size(Xₙ), size(_X̂)
    sum(abs2, Xₙ .- _X̂)
end
X̂.t = [0.0, 0.002019500840601446, 0.022214509246615905, 0.06903461249326248, 0.12739386874944178, 0.22422679984025923, 0.33401405282213575, 0.4849950700004353,  0.6493816687688097, 0.8528799594353222, 1.0711573129157788, 1.2370661548433513, 1.2370661548433513, 1.32567723057406, 1.4187954752192102, 1.4187954752192102, 1.5972809157415833, 1.9024180610120203, 2.2254487173508872, 2.459991098571045, 2.459991098571045, 2.5751880144474786, 2.934493691011939, 3.3028868428095857, 3.6712799946072323, 4.039673146404879, 4.414476908178793, 4.540896371291674, 4.675434436334397, 4.80997250137712, 4.944510566419844, 5.045203109912585, 5.108425818252657, 5.160609665076813, 5.203314031134029, 5.236076626600296, 5.26201242155545, 5.282212927468215, 5.294681338281667, 5.294681338281667, 5.298013030898949, 5.310285297436004, 5.319786055109955, 5.327093291159464, 5.332677204481394, 5.336910125548846, 5.340090777617613, 5.342456825239084, 5.344197394951449, 5.345462023849283, 5.346368058079318, 5.347005702858154, 5.347447381268435, 5.347748496189048, 5.347950031757117, 5.348082238433514, 5.348167253004121, 5.348220860789177, 5.348253997385042, 5.348274085213068, 5.348286057354167, 5.3482930931475146, 5.348297182029458, 5.348299538484235, 5.34830089244451, 5.3483016737058255, 5.348302126297016, 5.348302387518702, 5.348302542571743, 5.348302642774769, 5.3483026923175805, 5.348302716879212, 5.348302773601575, 5.348302778834187, 5.348302787497093, 5.348302787497093, 5.348302796159999, 5.348302882789059, 5.34830374907966, 5.348312411985671, 5.3483990410457825, 5.349265331646897, 5.357928237658048, 5.401797097352301, 5.445665957046553, 5.535602901337889, 5.625539845629224, 5.764213683566983, 5.903367714832322, 6.088517600312102, 6.276967674080728, 6.5070690337645845, 6.744996066118993, 7.021410213526521, 7.279537127539122, 7.279537127539122, 7.311357889517913, 7.638969647057365, 7.986988909537768, 8.374695347554638, 8.791104133059292, 9.251879248066036, 9.749730287894117, 10.0]

You have extra saves at the callback times; those are there to generate the dense interpolation information around the discontinuity. As documented, you probably want save_positions = (false, false).

disappearing_callback = ContinuousCallback(condition, affect!, save_positions = (false,false))

That gives the code:

using DifferentialEquations
using SciMLSensitivity
using Random
using Distributions
using Plots
using LinearSolve
rng = Random.default_rng()
Random.seed!(1010)

using OrdinaryDiffEq, ModelingToolkit, LinearAlgebra, ComponentArrays, Optimization, OptimizationOptimisers, OptimizationOptimJL, Lux,
DiffEqFlux, JLD2, FileIO, Statistics

# ODE Function
function particle_radius_change_dynamics(du, u, p, t)
    rate_constants = p
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i] - 1/mean_size)
    end
    return nothing
end

# Callback condition for removing particles when radius is at/close to 0
function condition(u, t, integrator)
    # Trigger the event when the smallest radius drops below the threshold
    minimum(u) < 1e-4 ? 0 : 1
end

# Callback to modify the state vector if above condition is met
function affect!(integrator)
    println("You've entered the callback")
    original_size = length(integrator.u)
    idxs = findall(r -> r <= 1e-4, integrator.u)
    new_size = original_size - length(idxs)

    # Remove the identified radii that have effectively gone to zero
    deleteat!(integrator.u, idxs)
    deleteat_non_user_cache!(integrator, idxs)

    resize!(integrator, new_size)
    resize_non_user_cache!(integrator, new_size)
    nothing
end

# === PROBLEM SETUP ===
# Number of particles, and the mean size and standard deviation of particle sizes
num_particles = 11
mean_size = 1.2
std = 0.1 * mean_size

# Generate random initial particle size distribution
initial_radii = rand(Normal(mean_size, std), num_particles)

# Generate random rate constants for each particle (growing or shrinking -- usually would follow conservation laws but this is just an example)
random_rates = rand(Uniform(-1,1), num_particles)

# Make input vectors
u0 = initial_radii
p_true = random_rates
tspan = (0, 10)

# Make ODE problem
prob = ODEProblem(particle_radius_change_dynamics, u0, tspan, p_true)

# Callback to remove a particle from the simulation if it effectively "disappears" due to shrinking
disappearing_callback = ContinuousCallback(condition, affect!, save_positions = (false,false))

# === SOLVE ===
@time sol = solve(prob, Rosenbrock23(linsolve = LUFactorization()),
maxiters=1e6, abstol=1e-5, reltol=1e-5, sensealg=SciMLSensitivity.TrackerAdjoint, isoutofdomain=(u,p,t)->any(x->x<0.0, u), callback=disappearing_callback)

println("You've solved the ODE")
# See number of particles decrease over time
display(plot(sol.t, map((x) -> length(x), sol[:]), lw = 3,
    ylabel = "Number of Nanoparticles", xlabel = "Time"))

# ==== TEST THE UDE SET UP
new_u = Array(sol.u)

# == Need to fill in the missing data with NaN so that it's a matrix we can plot/generate data
function pad_vectors_with_NaN(vectors::Vector{Vector{Float64}}, M::Int)
    N = length(vectors[1])
    matrix = fill(NaN, N, M)

    for i in 1:N
        length_vec = min(length(vectors[i]), M)
        matrix[i, 1:length_vec] = vectors[i][1:length_vec]
    end

    return matrix
end

X = pad_vectors_with_NaN(new_u, length(sol.t))
t = sol.t

relative_factor = 1e-4
magnitude = relative_factor * abs.(X)
noise = rand(Normal(0,1), size(X))
scaled_noise = magnitude .* noise
Xₙ = X .+ scaled_noise

plt = plot(t, X[1,:], alpha = 0.75, color = :black, label = ["True Data" nothing], title="Single trajectory")
scatter!(plt, t, Xₙ[1,:], color = :red, label = ["Noisy Data" nothing], markersize=2)
display(plt)

## Define the network
# Gaussian RBF as activation
rbf(x) = exp.(-(x.^2))

# Multilayer FeedForward
U = Lux.Chain(
    Lux.Dense(num_particles,5,rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,num_particles)
)
# Get the initial parameters and state variables of the model
p, st = Lux.setup(rng, U)

# Define the hybrid model
function ude_dynamics!(du,u, p, t, p_true)
    _u = [i < length(u) ? u[i] : 0.0 for i in 1:num_particles]
    û = U(_u, p, st)[1] # Network prediction
    rate_constants = p_true
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i]) + û[i]
    end
    return nothing
end

# Closure with the known parameter
nn_dynamics!(du,u,p,t) = ude_dynamics!(du,u,p,t,p_true)
# Define the problem
disappearing_callback = ContinuousCallback(condition, affect!, save_positions = (false,false))
prob_nn = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, Xₙ[:, 1], tspan, p)

## Function to train the network
# Define a predictor
function predict!(θ, X = Xₙ[:,1], T = t)
    #_prob = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, X, (T[1], T[end]), θ)
    println("You're trying to make a prediction for the UDE...")
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    sol = solve(_prob, Rosenbrock23(linsolve = LUFactorization()), saveat = T,
                abstol=1e-6, reltol=1e-6, sensealg=SciMLSensitivity.TrackerAdjoint, callback=disappearing_callback)
    println("You've solved the UDE")
    sol
end

U(rand(11), ComponentVector{Float64}(p), st) # sanity check: the network runs on a length-11 input

# Simple L2 loss
function loss(θ)
    X̂ = predict!(θ)
    _X̂ = stack([[i < length(u) ? u[i] : 0.0 for i in 1:num_particles] for u in X̂.u])
    sum(abs2, Xₙ .- _X̂)
end

# Container to track the losses
losses = Float64[]

callback = function (p, l)
  push!(losses, l)
  if length(losses)%50==0
      println("Current loss after $(length(losses)) iterations: $(losses[end])")
  end
  return false
end
loss(ComponentVector{Float64}(p))
julia> loss(ComponentVector{Float64}(p))
size(X) = (11,)
You're trying to make a prediction for the UDE...
You've solved the UDE
NaN

So okay your loss is NaN but it works now.

julia> Xₙ
11×93 Matrix{Float64}:
 1.19798  1.09046  1.3215   1.14826  1.22962  1.27439  0.976571  1.2117   …  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 1.19837  1.09057  1.32141  1.14833  1.22956  1.27438  0.976407  1.21151     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 1.19835  1.08992  1.32098  1.14862  1.22911  1.27431  0.977636  1.21163     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 1.19821  1.08835  1.3197   1.14978  1.22868  1.274    0.980038  1.21188     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 ⋮                                            ⋮                           ⋱              ⋮                        ⋮                        ⋮
 1.19835  1.06615  1.30248  1.15901  1.2213   1.27141  1.00889   1.21355     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 1.19818  1.05714  1.29688  1.16196  1.21924  1.27041  1.01801   1.21413     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
 1.19828  1.04716  1.29134  1.16486  1.21739  1.26963  1.02717   1.2148   …  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

It’s NaN because your data has NaNs in it, and a single NaN makes the whole L2 sum NaN. I’d be happy to get on a call with you and learn more about the application to consider what the right modeling choice might be here, but this NaN will make optimization not work well. One NaN poisons the whole sum:
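julia> sum(abs2, [1.0, NaN])
NaN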

So I made a modeling decision of a zero fill. That may not be a good idea, because it means a solution that simply eliminates every particle trivially matches the padded data, but with that assumption I at least get something that runs:

julia> loss(ComponentVector{Float64}(p))
You're trying to make a prediction for the UDE...
You've entered the callback
You've entered the callback
You've entered the callback
You've solved the UDE
4926.887024858428

So there, we have a loss function that is called and gives a number.

So now we start differentiating. Okay, first things first, fix the sensealg typo: TrackerAdjoint is a type and needs to be instantiated as TrackerAdjoint():

    sol = solve(_prob, Rosenbrock23(linsolve = LUFactorization()), saveat = T,
                abstol=1e-6, reltol=1e-6, sensealg=SciMLSensitivity.TrackerAdjoint(), callback=disappearing_callback)

I’ll make that throw a better error message later today. Next I got an error about types, so I need to be more careful about my definition of zero:

_u = [i < length(u) ? u[i] : zero(eltype(u)) for i in 1:num_particles]
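Presumably this is because under AD the state entries are tracked/dual number types, and a comprehension mixing them with a literal Float64 widens to a non-concrete element type. A sketch of the distinction, using ForwardDiff duals:

using ForwardDiff
d = ForwardDiff.Dual(1.0, 1.0)
bad  = [isodd(i) ? d : 0.0 for i in 1:3]              # eltype is a Union, not a concrete Dual
good = [isodd(i) ? d : zero(typeof(d)) for i in 1:3]  # uniform, concrete Dual eltype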

Next I played with a bunch of things. Getting the pullback through the changing sizes is tricky, but timing it without the pullback suggests this is actually a better fit for forward mode anyway. Also, your equation is non-stiff. So I just simplify a bit:

function predict!(θ, X = Xₙ[:,1], T = t)
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    sol = solve(_prob, Tsit5(), saveat = T,
                abstol=1e-6, reltol=1e-6, callback=disappearing_callback)
    sol
end

adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x,p)->loss(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentVector{Float64}(p))
res1 = Optimization.solve(optprob, ADAM(0.1), callback=callback, maxiters = 5)

And I’m at the character limit.

Conclusion

This gives a final code of:

using DifferentialEquations
using SciMLSensitivity
using Random
using Distributions
using Plots
using LinearSolve
rng = Random.default_rng()
Random.seed!(1010)

using OrdinaryDiffEq, ModelingToolkit, LinearAlgebra, ComponentArrays, Optimization, OptimizationOptimisers, OptimizationOptimJL, Lux,
DiffEqFlux, JLD2, FileIO, Statistics

# ODE Function
function particle_radius_change_dynamics(du, u, p, t)
    rate_constants = p
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i] - 1/mean_size)
    end
    return nothing
end

# Callback condition for removing particles when radius is at/close to 0
function condition(u, t, integrator)
    # Trigger the event when the smallest radius drops below the threshold
    minimum(u) < 1e-4 ? 0 : 1
end

# Callback to modify the state vector if above condition is met
function affect!(integrator)
    original_size = length(integrator.u)
    idxs = findall(r -> r <= 1e-4, integrator.u)
    new_size = original_size - length(idxs)

    # Remove the identified radii that have effectively gone to zero
    deleteat!(integrator.u, idxs)
    deleteat_non_user_cache!(integrator, idxs)

    resize!(integrator, new_size)
    resize_non_user_cache!(integrator, new_size)
    nothing
end

# === PROBLEM SETUP ===
# Number of particles, and the mean size and standard deviation of particle sizes
num_particles = 11
mean_size = 1.2
std = 0.1 * mean_size

# Generate random initial particle size distribution
initial_radii = rand(Normal(mean_size, std), num_particles)

# Generate random rate constants for each particle (growing or shrinking -- usually would follow conservation laws but this is just an example)
random_rates = rand(Uniform(-1,1), num_particles)

# Make input vectors
u0 = initial_radii
p_true = random_rates
tspan = (0, 10)

# Make ODE problem
prob = ODEProblem(particle_radius_change_dynamics, u0, tspan, p_true)

# Callback to remove a particle from the simulation if it effectively "disappears" due to shrinking
disappearing_callback = ContinuousCallback(condition, affect!, save_positions = (false,false))

# === SOLVE ===
@time sol = solve(prob, Rosenbrock23(linsolve = LUFactorization()),
maxiters=1e6, abstol=1e-5, reltol=1e-5, sensealg=SciMLSensitivity.TrackerAdjoint, isoutofdomain=(u,p,t)->any(x->x<0.0, u), callback=disappearing_callback)

println("You've solved the ODE")
# See number of particles decrease over time
display(plot(sol.t, map((x) -> length(x), sol[:]), lw = 3,
    ylabel = "Number of Nanoparticles", xlabel = "Time"))

# ==== TEST THE UDE SET UP
new_u = Array(sol.u)

# == Fill in the missing data (zero fill, per the modeling decision above) so that it's a matrix we can plot/generate data
function pad_vectors_with_NaN(vectors::Vector{Vector{Float64}}, M::Int)
    N = length(vectors[1])
    matrix = fill(0.0, N, M)

    for i in 1:N
        length_vec = min(length(vectors[i]), M)
        matrix[i, 1:length_vec] = vectors[i][1:length_vec]
    end

    return matrix
end

X = pad_vectors_with_NaN(new_u, length(sol.t))
t = sol.t

relative_factor = 1e-4
magnitude = relative_factor * abs.(X)
noise = rand(Normal(0,1), size(X))
scaled_noise = magnitude .* noise
Xₙ = X .+ scaled_noise

plt = plot(t, X[1,:], alpha = 0.75, color = :black, label = ["True Data" nothing], title="Single trajectory")
scatter!(plt, t, Xₙ[1,:], color = :red, label = ["Noisy Data" nothing], markersize=2)
display(plt)

## Define the network
# Gaussian RBF as activation
rbf(x) = exp.(-(x.^2))

# Multilayer FeedForward
U = Lux.Chain(
    Lux.Dense(num_particles,5,rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,5, rbf), Lux.Dense(5,num_particles)
)
# Get the initial parameters and state variables of the model
p, st = Lux.setup(rng, U)

# Define the hybrid model
function ude_dynamics!(du,u, p, t, p_true)
    _u = [i < length(u) ? u[i] : zero(eltype(u)) for i in 1:num_particles]
    û = U(_u, p, st)[1] # Network prediction
    rate_constants = p_true
    R = u
    for i in eachindex(R)
        du[i] = rate_constants[i] * (1.0/R[i]) + û[i]
    end
    return nothing
end

# Closure with the known parameter
nn_dynamics!(du,u,p,t) = ude_dynamics!(du,u,p,t,p_true)
# Define the problem
disappearing_callback = ContinuousCallback(condition, affect!, save_positions = (false,false))
prob_nn = ODEProblem{true, SciMLBase.FullSpecialize}(nn_dynamics!, Xₙ[:, 1], tspan, p)

## Function to train the network
# Define a predictor
function predict!(θ, X = Xₙ[:,1], T = t)
    _prob = remake(prob_nn, u0 = X, tspan = (T[1], T[end]), p = θ)
    sol = solve(_prob, Tsit5(), saveat = T,
                abstol=1e-6, reltol=1e-6, callback=disappearing_callback)
    sol
end

# Simple L2 loss
function loss(θ)
    X̂ = predict!(θ)
    _X̂ = stack([[i < length(u) ? u[i] : 0.0 for i in 1:num_particles] for u in X̂.u])
    sum(abs2, Xₙ .- _X̂)
end

# Container to track the losses
losses = Float64[]

callback = function (p, l)
  push!(losses, l)
  if length(losses)%10==0
      println("Current loss after $(length(losses)) iterations: $(losses[end])")
  end
  return false
end
loss(ComponentVector{Float64}(p))

## Training
# First train with ADAM for better convergence -> move the parameters into a
# favourable starting position for BFGS
adtype = Optimization.AutoForwardDiff()
optf = Optimization.OptimizationFunction((x,p)->loss(x), adtype)
optprob = Optimization.OptimizationProblem(optf, ComponentVector{Float64}(p))
res1 = Optimization.solve(optprob, ADAM(0.1), callback=callback, maxiters = 100)
println("Training loss after $(length(losses)) iterations: $(losses[end])")

which starts at a loss of around 5000 and then gives:

Current loss after 10 iterations: 38.69872304851296
Current loss after 20 iterations: 34.19523351127281
Current loss after 30 iterations: 30.236240434083374
Current loss after 40 iterations: 27.84174399900122
Current loss after 50 iterations: 26.907731455937093
Current loss after 60 iterations: 26.8356412337848
Current loss after 70 iterations: 25.742155061706747
Current loss after 80 iterations: 25.404703741918357
Current loss after 90 iterations: 24.746787864533566
Current loss after 100 iterations: 23.941083574744237

It runs in about a minute or two; that can probably be improved, but it’s not the worst. I would probably add a BFGS step after that, something like the sketch below. But there you go, a first version.
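An untested sketch of that BFGS refinement (OptimizationOptimJL is already loaded above):

optprob2 = Optimization.OptimizationProblem(optf, res1.u)
res2 = Optimization.solve(optprob2, BFGS(), callback = callback, maxiters = 50)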

But again, we should talk about some of the modeling assumptions made in the loss functions and the padding choices.

Wow – thank you so much, I was not expecting such a thorough solution/resolution…! I really appreciate your time. I am new to this area, so it really helps to have an idea of how these objects can be debugged. And I’d be happy to chat specifics vis-à-vis the modeling assumptions. I agree that the zero fill may not be a good assumption, for the reason you stated, but of course just getting something that works is always a good starting point… my email is natalie.isenberg@pnnl.gov if you want to reach me there to discuss further. Thanks again for your time and wisdom.