DifferentialEquations.jl + Flux.jl or Knet.jl


#1

I am trying to estimate various dynamical systems using neural network models. I am very interested in using the integrators provided by DifferentialEquations.jl to do this, and the docs for that package even have a nice example on parameter estimation. If possible, I would like to use DifferentialEquations.jl with Flux.jl, but so far I cannot seem to get any example working. Here is the code I am trying:

using DifferentialEquations
using Flux
using Flux.Tracker


b = param(-1.0)    # tracked parameter we want gradients for
f(t, x) = b.*x

u0 = param(1.0)    # tracked initial condition
prob = ODEProblem(f, u0, (0, 1.0))
sol = solve(prob)
# output:
# ERROR: MethodError: Cannot `convert` an object of type Array{Float64,0} to an object of type TrackedArray{…,Array{Float64,0}}
# This may have arisen from a call to the constructor TrackedArray{…,Array{Float64,0}}(...),
# since type constructors fall back to convert methods.

Ultimately, I want to compute l = loss(sol.u, u_truth) and call back!(l) to compute the gradients with respect to the parameters using Flux. Is this possible in principle?


#2

In principle, yes. In practice, it depends on how much AutoGrad.jl (for Knet.jl) or Flux.jl support in their backpropagation algorithms. I think it would be easier to just use ForwardDiff.jl for the gradients (which we already know works) for now, and use that with SGD or ADAM.
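For reference, here is a minimal sketch of that ForwardDiff.jl route. It keeps the old f(t, u) signature used in the snippet above (newer DifferentialEquations.jl releases use f(u, p, t)), and the synthetic u_truth, the saveat grid, and the plain gradient step are all made up for illustration:

using DifferentialEquations, ForwardDiff

u_truth = exp.(-1.0 .* (0.0:0.25:1.0))   # made-up "data" generated with b = -1

function loss(p)
    b = p[1]
    f(t, u) = b * u
    # one(b) makes the state a dual number so the derivative flows through the solve
    prob = ODEProblem(f, one(b), (0.0, 1.0))
    sol = solve(prob, Tsit5(), saveat = 0.25)
    sum(abs2, sol.u .- u_truth)
end

p = [-0.5]
g = ForwardDiff.gradient(loss, p)        # d(loss)/db
p .-= 0.1 .* g                           # one plain gradient-descent step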

But if you want to poke around on this, I suggest getting onto the Slack so we can work through this. It’ll likely take some new method definitions for the backprop stuff, but shouldn’t be too bad?


#3

Thanks for replying, Chris.

FWIW, I just tried the same thing in Knet and I get a similar error:

ERROR: MethodError: Cannot `convert` an object of type AutoGrad.Rec{Float64} to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
Stacktrace:

I think it would be easier to just use ForwardDiff.jl for the gradients

Unfortunately, I think reverse-mode (backward) derivatives are needed for computational reasons: forward-mode cost scales with the number of inputs, and neural networks have so many free parameters that differentiating through them forwards is impractical.
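A toy illustration of that scaling difference (the sizes and names here are arbitrary): forward mode effectively carries one directional derivative per input, while reverse mode recovers the whole gradient from a single backward pass.

using ForwardDiff, Flux, Flux.Tracker

p = randn(1_000)
loss(p) = sum(abs2, p)

# Forward mode: cost grows with the number of parameters
# (ForwardDiff sweeps through chunks of input directions)
g_fwd = ForwardDiff.gradient(loss, p)

# Reverse mode: one backward pass, regardless of the parameter count
tp = param(p)
Tracker.back!(sum(abs2.(tp)))
g_rev = Tracker.grad(tp)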

I would be interested in trying to get this to work, but I am pretty new to Julia.


#4

You’re free to mix and match forward and backward, and different AD techniques generally, as you please. Assuming you have a neural network generating b which you then feed into a solver, you can do something like the following (a rough sketch comes after the list):

  • Run the forward pass of the network to generate b as a tracked vector
  • Unwrap b and create a dual number b + ϵ
  • Run the solver and calculate the loss to get loss + dloss/db*ϵ
  • Call back!(b, dloss/db) to have Flux calculate dloss/dparam for each net parameter.
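A rough sketch of that plumbing, assuming Flux's Tracker API (param, Tracker.data, back!, Tracker.grad) and with a trivial closed-form simple_solve standing in for the actual DiffEq call; all of the names below are purely illustrative:

using Flux, Flux.Tracker, ForwardDiff

W = param(randn(1, 1))                     # toy "network": a single tracked weight
net(x) = W * x

# 1. Forward pass of the network: b comes out tracked
b_tracked = net([1.0])[1]

# 2. Unwrap b and seed a dual number b + ϵ
b_dual = ForwardDiff.Dual(Tracker.data(b_tracked), 1.0)

# 3. Run the "solver" and the loss on the dual to get loss + (dloss/db)ϵ
simple_solve(b) = exp(b)                   # stand-in for the ODE solve
loss_dual = abs2(simple_solve(b_dual) - 2.0)
dloss_db  = ForwardDiff.partials(loss_dual)[1]

# 4. Push dloss/db back through the network
Tracker.back!(b_tracked, dloss_db)
Tracker.grad(W)                            # dloss/dparam for the net weight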

There’s going to be a bit of plumbing, but we can help with that. And if it’s working well we can find ways to make Flux/DiffEq integration smoother.


#5

What is your use-case?


#6

I am asking because I saw that you are also working with dynamic noise/stochastic differential equations, e.g. Ornstein-Uhlenbeck processes, which is what I am working on.


#7

Hi Chris,

Thanks for your great work on DifferentialEquations.jl! The possibility of getting a gradient of the solution of an ODE is really mind-blowing :)

You mention here that it takes 5 lines of code to use DifferentialEquations.jl with Flux.jl. Would you mind sharing an example?


#8

I’m trying to build an adjoint model, which is really just reverse-mode AD. Specifically, I want to apply it to a simple handwritten PDE solver, and compare the gradients at each step to a hand-derived adjoint. But even for a simple Euler ODE type of model, I run into problems similar to those originally discussed in this thread:

using Flux.Tracker
x = zeros(10); dt = 0.01; c = param(0.5)
function integrate(x, c, dt)
    for i in 1:length(x)-1
        # x is a plain Vector{Float64}, but c*x[i]*dt is a TrackedReal
        x[i+1] = x[i] - c*x[i]*dt
    end
end

Just running this function gives “ERROR: MethodError: no method matching Float64(::Flux.Tracker.TrackedReal{Float64})”.
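For what it's worth, one way to sidestep that particular error under Tracker is to avoid mutating a Float64 array and instead rebuild the state functionally, collecting the tracked iterates; the initial condition of 1.0 and the sum-of-squares loss below are made up purely for illustration (and this still doesn't expose the tape step by step, which is what I'm really after):

using Flux, Flux.Tracker

dt = 0.01
c = param(0.5)

function integrate(x0, c, dt, n)
    x = x0
    xs = typeof(c * x0)[]       # collect TrackedReals instead of writing into zeros(n)
    for _ in 1:n
        x = x - c * x * dt      # explicit Euler step, no in-place mutation
        push!(xs, x)
    end
    return xs
end

xs = integrate(1.0, c, dt, 9)
l = sum(abs2, xs)               # made-up scalar loss
Tracker.back!(l)
Tracker.grad(c)                 # dl/dc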

Is there a way to do this in Julia using pure reverse AD (and even evaluate the tape instruction by instruction) so I can compare to a hand-derived adjoint code? (Or is any such approach on the horizon?) Failing that, is there a more natural/transparent way of getting the gradients (without pure reverse mode) using current versions of AD than the fairly manual "plumbing" approach suggested by @MikeInnes back in January?