DiffEqFlux / diffeq_adjoint: no method matching back!(::Float64)

Hi all, I’m new to Julia and ML, so maybe it’s something obvious, but I’m a bit lost now.

I would like to optimize a large set of parameters (N ODEs with N > 2k) to fit to data. From reading the DiffEqFlux documentation, adjoint sensitivity seems to be the most appropriate approach. However, the back-propagation doesn’t seem to work (“no method matching back!(::Float64)”). Maybe the ‘tracked’ got lost somewhere? The problem is similar when using diffeq_rd(), where I get “Not implemented: convert tracked Tracker.TrackedReal{Float64} to tracked Float64” and can’t resolve that with collect() in the ODEs (as suggested in other threads).

Here’s an example for diffeq_adjoint, with the error on the last line.

using Random, Distributions, DifferentialEquations, Flux, DiffEqFlux

function duL(du,u,p,t)
    du[1:N] = p[1:N] .* (u[N+1] .- u[1:N])
    du[N+1] = p[N+1] .* (1-u[N+1]) .+ p[N+2] .* sum(p[1:N].*d[1:N].*(u[1:N].-u[N+1]))
end

N = 10  # goal: N = 3000-7000
Random.seed!(123)

p           = zeros(N+2);                   # init parameter array
p[1:N]      = 1 ./ (rand(Gamma(3,3),N))     # distribution of individual rates
p[N+1:end]  = [.5 ; 8.0]                    # 2 global parameters
u0          = zeros(N+1,1);                 # initial state
const d     = ones(N,1) ./ N                # weights, fixed (experimental data)
tspan       = (0.0,21.0)                    # total time range
te          = [5.0,14.0,21.0]               # times for experimental data

prob = ODEProblem(duL,u0,tspan,p)
sol = solve(prob,Tsit5(),saveat=te)

u_data = deepcopy(sol.u)                    # "experimental data" (with some noise)
for n=1:length(te)
    u_data[n] .+= 0.005 .* randn(size(u_data[n]))
end

# starting parameters near solution
p0 = p.* (1.0 .+ 0.1 .* randn(size(p)))
p0 = param(p0)

predict_adj() = collect(diffeq_adjoint(p0,prob,Tsit5(),saveat=te))
function loss_adj()
    loss_tot = 0.0
    for n=1:length(te)
        S = predict_adj()
        loss_tot += sum(abs2,u_data[n].-S[n])
    end
    loss_tot
end

params = Flux.Params([p0])
data = Iterators.repeated((), 100)
opt = ADAM(0.1)
cb() = display(loss_adj())

Flux.train!(loss_adj, params, data, opt, cb = cb)

DiffEqFlux, gradient descent, etc. really isn’t the best way to do parameter estimation/optimization. It’s just a good way to handle neural networks, which is itself a parameter estimation problem but with different minima characteristics. I would highly recommend taking a look at the parameter estimation page

http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html

as these methods, especially with global optimizers, will tend to perform much better on these kinds of problems (global optimizers can take a while to run, but will give much better optima on these problems!).
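For reference, here is a rough sketch of what that looks like for a problem like yours, assuming the DiffEqParamEstim, Optim, and BlackBoxOptim packages; the search range, iteration counts, and the `data_mat`/`p_init` names are placeholders you would adapt:

using DiffEqParamEstim, Optim, BlackBoxOptim

# L2Loss expects the data as a (state x time) matrix at the save times `te`
data_mat = hcat(vec.(u_data)...)

# Build a classical loss: solve the ODE with candidate parameters and compare
# against the data at the experimental times
obj = build_loss_objective(prob, Tsit5(), L2Loss(te, data_mat),
                           maxiters=10_000, saveat=te)

# Local optimization from a plain Float64 starting guess (not a tracked param)
p_init = p .* (1.0 .+ 0.1 .* randn(length(p)))
res_local = Optim.optimize(obj, p_init, BFGS())

# Global optimization over a box; pick bounds that make sense for your rates
bounds = [(0.0, 10.0) for _ in 1:length(p)]
res_global = bboptimize(obj; SearchRange = bounds, MaxSteps = 50_000)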

Anyways, here’s a working example of parameter estimation of your model with Flux:

using Random, Distributions, DifferentialEquations, Flux, DiffEqFlux, DiffEqSensitivity

function duL(du,u,p,t)
    du[1:N] = p[1:N] .* (u[N+1] .- u[1:N])
    du[N+1] = p[N+1] .* (1-u[N+1]) .+ p[N+2] .* sum(p[1:N].*d[1:N].*(u[1:N].-u[N+1]))
end

N = 10  # goal: N = 3000-7000
Random.seed!(123)

p           = zeros(N+2);                   # init parameter array
p[1:N]      = 1 ./ (rand(Gamma(3,3),N))     # distribution of individual rates
p[N+1:end]  = [.5 ; 8.0]                    # 2 global parameters
u0          = zeros(N+1,1);                 # initial state
const d     = ones(N,1) ./ N                # weights, fixed (experimental data)
tspan       = (0.0,21.0)                    # total time range
te          = [5.0,14.0,21.0]               # times for experimental data

prob = ODEProblem(duL,u0,tspan,p)
sol = solve(prob,Tsit5(),saveat=te)

u_data = VectorOfArray(deepcopy(sol.u))                 # "experimental data" (with some noise)
for n=1:length(te)
    u_data[n] .+= 0.005 .* randn(size(u_data[n]))
end

# starting parameters near solution
p0 = p.* (1.0 .+ 0.1 .* randn(size(p)))
p0 = param(p0)
params = Flux.Params([p0])

# The key change: adjoint sensitivity options that append the quadrature to the
# backwards ODE (quad=false) and avoid reverse-mode AD for the Jacobian-vector
# products (autojacvec=false)
predict_adj() = diffeq_adjoint(p0,prob,Tsit5(),saveat=te,
                               sensealg=DiffEqSensitivity.SensitivityAlg(
                               quad=false,autojacvec=false))


function loss_adj()
    # skip the first saved slice so the prediction lines up with the times in `te`
    sum(abs2,vec(u_data).-vec(predict_adj()[:,:,2:end]))
end

data = Iterators.repeated((), 1000)
opt = ADAM(0.1)
cb() = display(loss_adj())

loss_adj()
Flux.back!(loss_adj())
Flux.train!(loss_adj, params, data, opt, cb = cb)

Many thanks for the fast solution, ChrisRackauckas! It works and arrives more or less at the solution after a couple thousand iterations (for N = 100). I had to reduce the ADAM learning rate, however, as it tended to change the parameters too wildly and exit already-good parameter regions.
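In case it helps anyone reproducing this, the only change on my side was a smaller optimizer step size, along these lines (the exact value is illustrative):

opt = ADAM(0.01)   # smaller learning rate than the 0.1 used above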

And thanks for the suggestion to look at the parameter estimation page; actually, I started there (using build_loss_objective(), optimize(), and BFGS()), but got the impression that it was rather slow (~1 h for N = 1000 and still no good convergence). I thought that handling many parameters was perhaps more common for neural networks and that more efficient algorithms were used there.

Any ideas which (global) optimizers are well suited to my case, especially when fitting noisy data?

There was a related discussion a while ago:
