Trouble writing OptimizationFunction for automatic forward differentiation during Parameter Estimation of an ODEProblem

I am trying to learn Julia for its potential use in parameter estimation. I am interested in estimating kinetic parameters of chemical reactions, which usually involves optimizing reaction parameters with multiple independent batches of experiments. I have successfully optimized a single batch, but need to expand the problem to use many different batches. In developing a sample problem, I am trying to optimize using two toy batches. I know there are probably smarter ways to do this (subject of a future question), but my current workflow involves calling an ODEProblem for each batch, calculating its loss against the data, and minimizing the sum of the residuals for the two batches. Unfortunately, I get an error when initiating the optimization with Optimization.jl. The current code and error are shown below:

using DifferentialEquations, Plots, DiffEqParamEstim
using Optimization, ForwardDiff, OptimizationOptimJL, OptimizationNLopt
using Ipopt, OptimizationGCMAES, Optimisers
using Random

#Experimental data, species B is NOT observed in the data
times  = [0.0, 0.071875, 0.143750, 0.215625, 0.287500, 0.359375, 0.431250,
                      0.503125, 0.575000, 0.646875, 0.718750, 0.790625, 0.862500,
                      0.934375, 1.006250, 1.078125, 1.150000]
A_obs = [1.0, 0.552208, 0.300598, 0.196879, 0.101175, 0.065684, 0.045096,
                      0.028880, 0.018433, 0.011509, 0.006215, 0.004278, 0.002698,
                      0.001944, 0.001116, 0.000732, 0.000426]
C_obs = [0.0, 0.187768, 0.262406, 0.350412, 0.325110, 0.367181, 0.348264,
                      0.325085, 0.355673, 0.361805, 0.363117, 0.327266, 0.330211,
                      0.385798, 0.358132, 0.380497, 0.383051]
P_obs = [0.0, 0.117684, 0.175074, 0.236679, 0.234442, 0.270303, 0.272637,
                      0.274075, 0.278981, 0.297151, 0.297797, 0.298722, 0.326645,
                      0.303198, 0.277822, 0.284194, 0.301471]

                      
#Create additional data sets for a multi data set optimization
#Simple noise added to data for testing
times_2 = times[2:end] .+ rand(range(-0.05,0.05,100))
P_obs_2 = P_obs[2:end] .+ rand(range(-0.05,0.05,100))
A_obs_2 = A_obs[2:end].+ rand(range(-0.05,0.05,100))
C_obs_2 = C_obs[2:end].+ rand(range(-0.05,0.05,100))


#ki = [2.78E+00, 1.00E-09, 1.97E-01, 3.04E+00, 2.15E+00, 5.27E-01] #Target optimized parameters
ki = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1] #Initial guess of parameters
IC = [1.0, 0.0, 0.0, 0.0] #Initial condition for each species
tspan1 = (minimum(times),maximum(times)) #tuple timespan of data set 1
tspan2 = (minimum(times_2),maximum(times_2)) #tuple timespan of data set 2

# data =  VectorOfArray([A_obs,C_obs,P_obs])'
data  =  vcat(A_obs',C_obs',P_obs') #Make multidimensional array containing all observed data for dataset1, transpose to match shape of ODEProblem output
data2 = vcat(A_obs_2',C_obs_2',P_obs_2') #Make multidimensional array containing all observed data for dataset2, transpose to match shape of ODEProblem output

#make dictionary containing data, time, and initial conditions
keys1 = ["A","B"]
keys2 = ["time","obs","IC"]
entryA =[times,data,IC]
entryB = [times_2, data2,IC]
nest=[Dict(zip(keys2,entryA)),Dict(zip(keys2,entryB))]
exp_dict = Dict(zip(keys1,nest)) #data dictionary

#rate equations in power law form r = k [A][B]
function rxn(x, k) 
    A = x[1]
    B = x[2]
    C = x[3]
    P = x[4]
    
    k1 = k[1]
    k2 = k[2]
    k3 = k[3]
    k4 = k[4]
    k5 = k[5]
    k6 = k[6]
    
    r1 = k1 * A
    r2 = k2 * A * B
    r3 = k3 * C * B
    r4 = k4 * A
    r5 = k5 * A
    r6 = k6 * A * B
    
    return [r1, r2, r3, r4, r5, r6] #returns reaction rate of each equation
end
#Mass balance differential equations
function mass_balances(di,x,args,t) 
        k = args
        r = rxn(x, k)
        di[1] = - r[1] - r[2] - r[4] - r[5] - r[6] #Species A
        di[2] = + r[1] - r[2] - r[3] - r[6]        #Species B
        di[3] = + r[2] - r[3] + r[4]               #Species C
        di[4] = + r[3] + r[5] + r[6]               #Species P

end

function ODESols(time,uo,parms)
    time_init = (minimum(time),maximum(time))
    prob = ODEProblem(mass_balances,uo,time_init,parms)
    sol = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8,save_idxs = [1,3,4],saveat=time) #Integrate prob
    return sol
end

function cost_function(data_dict,parms)
    res_dict = Dict(zip(keys(data_dict),[0.0,0.0]))
    for key in keys(data_dict)
        pred = ODESols(data_dict[key]["time"],data_dict[key]["IC"],parms)
        loss = L2Loss(data_dict[key]["time"],data_dict[key]["obs"])
        err = loss(pred)
        res_dict[key] = err
    end 
    residual = sum(res_dict[key] for key in keys(res_dict))
    @show typeof(residual)
    return residual
end

lb = [0.0,0.0,0.0,0.0,0.0,0.0]         #parameter lower bounds
ub = [10.0,10.0,10.0,10.0,10.0,10.0]   #parameter upper bounds

optfun = Optimization.OptimizationFunction(cost_function,Optimization.AutoForwardDiff())
optprob = Optimization.OptimizationProblem(optfun,exp_dict, ki,lb=lb,ub=ub,reltol=1E-8) #Set up optimization problem
optsol=solve(optprob, BFGS(),maxiters=10000) #Solve optimization problem

println(optsol.u) #print solution

when I call optsol I get the error:

ERROR: MethodError: no method matching ForwardDiff.GradientConfig(::Optimization.var"#89#106"{OptimizationFunction{true, Optimization.AutoForwardDiff{nothing}, typeof(cost_function), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Vector{Float64}}, ::Dict{String, Dict{String, Array{Float64}}}, ::ForwardDiff.Chunk{2})

Searching online suggests that the issue may be that my cost_function function is not generic enough for ForwardDiff to handle, however I am not sure how to identify where the issue is in this function, or whether it is related to the functions (mass_balances and rxn) that are called within cost_function. Another potential issue is that I am not calling the functions appropriately when building the OptimizationFunction or the OptimizationProblem, but I cannot identify the issue here either.

Thank you for any suggestions and your help in troubleshooting this application!

The initial problem here is that the arguments to cost_function are reversed from the way Optimization expects. The parameters to be varied should be first, then any hyperparameters or data. The same error is present in the call to the creation of the OptimizationProblem.

There is a misspelling on several lines parms instead of params that gives a few undefined variable errors.

The final problem is a classic ForwardDiff difficulty, the storage allocated in the first line of cost_function must be created with the same type as params to allow storage of ForwardDiff.Dual values.

Here is a script that runs, though I cannot vouch for its correctness.

using DifferentialEquations, Plots, DiffEqParamEstim
using Optimization, ForwardDiff, OptimizationOptimJL, OptimizationNLopt
using Ipopt, OptimizationGCMAES, Optimisers
using Random

#Experimental data, species B is NOT observed in the data
times  = [0.0, 0.071875, 0.143750, 0.215625, 0.287500, 0.359375, 0.431250,
                      0.503125, 0.575000, 0.646875, 0.718750, 0.790625, 0.862500,
                      0.934375, 1.006250, 1.078125, 1.150000]
A_obs = [1.0, 0.552208, 0.300598, 0.196879, 0.101175, 0.065684, 0.045096,
                      0.028880, 0.018433, 0.011509, 0.006215, 0.004278, 0.002698,
                      0.001944, 0.001116, 0.000732, 0.000426]
C_obs = [0.0, 0.187768, 0.262406, 0.350412, 0.325110, 0.367181, 0.348264,
                      0.325085, 0.355673, 0.361805, 0.363117, 0.327266, 0.330211,
                      0.385798, 0.358132, 0.380497, 0.383051]
P_obs = [0.0, 0.117684, 0.175074, 0.236679, 0.234442, 0.270303, 0.272637,
                      0.274075, 0.278981, 0.297151, 0.297797, 0.298722, 0.326645,
                      0.303198, 0.277822, 0.284194, 0.301471]


#Create additional data sets for a multi data set optimization
#Simple noise added to data for testing
times_2 = times[2:end] .+ rand(range(-0.05,0.05,100))
P_obs_2 = P_obs[2:end] .+ rand(range(-0.05,0.05,100))
A_obs_2 = A_obs[2:end].+ rand(range(-0.05,0.05,100))
C_obs_2 = C_obs[2:end].+ rand(range(-0.05,0.05,100))


#ki = [2.78E+00, 1.00E-09, 1.97E-01, 3.04E+00, 2.15E+00, 5.27E-01] #Target optimized parameters
ki = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1] #Initial guess of parameters
IC = [1.0, 0.0, 0.0, 0.0] #Initial condition for each species
# tspan1 = (minimum(times),maximum(times)) #tuple timespan of data set 1
# tspan2 = (minimum(times_2),maximum(times_2)) #tuple timespan of data set 2

# data =  VectorOfArray([A_obs,C_obs,P_obs])'
data  =  vcat(A_obs',C_obs',P_obs') #Make multidimensional array containing all observed data for dataset1, transpose to match shape of ODEProblem output
data2 = vcat(A_obs_2',C_obs_2',P_obs_2') #Make multidimensional array containing all observed data for dataset2, transpose to match shape of ODEProblem output

#make dictionary containing data, time, and initial conditions
keys1 = ["A","B"]
keys2 = ["time","obs","IC"]
entryA =[times,data,IC]
entryB = [times_2, data2,IC]
nest=[Dict(zip(keys2,entryA)),Dict(zip(keys2,entryB))]
exp_dict = Dict(zip(keys1,nest)) #data dictionary

#rate equations in power law form r = k [A][B]
function rxn(x, k)
    A = x[1]
    B = x[2]
    C = x[3]
    P = x[4]

    k1 = k[1]
    k2 = k[2]
    k3 = k[3]
    k4 = k[4]
    k5 = k[5]
    k6 = k[6]

    r1 = k1 * A
    r2 = k2 * A * B
    r3 = k3 * C * B
    r4 = k4 * A
    r5 = k5 * A
    r6 = k6 * A * B

    return [r1, r2, r3, r4, r5, r6] #returns reaction rate of each equation
end
#Mass balance differential equations
function mass_balances(di,x,args,t)
        k = args
        r = rxn(x, k)
        di[1] = - r[1] - r[2] - r[4] - r[5] - r[6] #Species A
        di[2] = + r[1] - r[2] - r[3] - r[6]        #Species B
        di[3] = + r[2] - r[3] + r[4]               #Species C
        di[4] = + r[3] + r[5] + r[6]               #Species P

end

# Correct spelling here
function ODESols(time,uo,params)
    time_init = (minimum(time),maximum(time))
    prob = ODEProblem(mass_balances,uo,time_init,params)
    sol = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8,save_idxs = [1,3,4],saveat=time) #Integrate prob
    return sol
end

# swap arguments
function cost_function(params, data_dict)
    # Correct storage allocation here
    res_dict = Dict(zip(keys(data_dict),similar(params, 2)))
    for key in keys(data_dict)
        pred = ODESols(data_dict[key]["time"],data_dict[key]["IC"],params)
        loss = L2Loss(data_dict[key]["time"],data_dict[key]["obs"])
        err = loss(pred)
        res_dict[key] = err
    end
    residual = sum(res_dict[key] for key in keys(res_dict))
    @show typeof(residual)
    return residual
end

function main(exp_dict, ki)
    lb = [0.0,0.0,0.0,0.0,0.0,0.0]         #parameter lower bounds
    ub = [10.0,10.0,10.0,10.0,10.0,10.0]   #parameter upper bounds

    optfun = Optimization.OptimizationFunction(cost_function,Optimization.AutoForwardDiff())
    # Swap argument order here too
    optprob = Optimization.OptimizationProblem(optfun, ki, exp_dict; lb=lb,ub=ub,reltol=1E-8) #Set up optimization problem
    optsol=solve(optprob, BFGS(),maxiters=10000) #Solve optimization problem

end

Thank you very much for your feedback! The problem runs with your suggestions. I was confused regarding the order of the function arguments, as the documentation for OptimizationProblems shows:

OptimizationProblem{iip}(f, x, p = SciMLBase.NullParameters(),;
                        lb = nothing,
                        ub = nothing,
                        lcons = nothing,
                        ucons = nothing,
                        sense = nothing,
                        kwargs...)

which I interpreted as requiring parameters as the last argument before keyword arguments.