DifferentialEquations.jl + Flux.jl or Knet.jl

diffeq

#1

I am trying to estimate various dynamical systems using neural network models. I am very interested in using the integrators provided by DifferentialEquations.jl to do this, and the docs for that package even have nice example on parameter estimation . If possible, I would like to use DifferentialEquations with Flux.jl, but so far, I cannot seem to get any example working. Here is the code I am trying

using DifferentialEquations
using Flux
using Flux.Tracker


b = param(-1.0)
f(t, x) = b.*x

u0 = param(1.0)
prob = ODEProblem(f, u0, (0, 1.0))
sol = solve(prob)
# output:
# ERROR: MethodError: Cannot `convert` an object of type Array{Float64,0} to an object of type TrackedArray{…,Array{Float64,0}}
# This may have arisen from a call to the constructor TrackedArray{…,Array{Float64,0}}(...),
# since type constructors fall back to convert methods.

Ultimately, I want to compare to compute l = loss(sol.u, u_truth) and call back!(l) to compute the gradients with respect to the parameters using Flux. Is this possible in principal?


#2

In principle, yes. In practice, it depends on how much Autograd.jl (for KNet.jl) or Flux.jl support in their backpropagation algorithms. I think it would be easier to just use ForwardDiff.jl for the gradients (which we already know works) for now, and use that with SGD or ADAM.

But if you want to poke around on this, I suggest getting onto the Slack so we can work through this. It’ll likely take some new method definitions for the backprop stuff, but shouldn’t be too bad?


#3

Thanks for replying Chris.

FWIW I just tried the same thing in KNet and I get a similar error:

ERROR: MethodError: Cannot `convert` an object of type AutoGrad.Rec{Float64} to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
Stacktrace:

I think it would be easier to just use ForwardDiff.jl for the gradients

Unfortunately, I think backward derivatives are needed for computational reasons because neural networks have so many free parameters.

I would be interested in trying to get this to work, but I am pretty new to julia.


#4

You’re free to mix and match forward and backward, and different AD techniques generally, as you please. Assuming you have a neural network generating b which you then feed into a solver, you can do something like:

  • Run the forward pass of the network to generate b as a tracked vector
  • Unwrap b and create a dual number b + ϵ
  • Run the solver and calculate the loss to get loss + dloss/db*ϵ
  • Call back!(b, dloss/db) to have flux calculate dloss/dparam for each net parameter.

There’s going to be a bit of plumbing, but we can help with that. And if it’s working well we can find ways to make Flux/DiffEq integration smoother.


#5

What is your use-case?


#6

I am asking because I saw that you are also working with dynamic noise/stochastic differential equations, e.g. Ornstein-Uhlenbeck processes, which I am working on.


#7

Hi Chris,

Thanks for your great work on DifferentialEquations.jl! The possibility to get a gradient of the solution of an ODE is really mind-blowing :slight_smile:

You mention here that it takes 5 line of code to use DifferentialEquations.jl with Flux.jl. Would you mind sharing an example?


#8

I’m trying to build an adjoint model, which is really just reverse-mode AD. Specifically, I want to apply it to a simple handwritten PDE solver, and compare the gradients at each step to a hand-derived adjoint. But even for a simple Euler ODE type of model, I run into similar problems as originally discussed in this thread:

using Flux.Tracker
x = zeros(10); dt = 0.01; c = param(0.5)
function integrate(x, c, dt)
    for i in 1:length(x)-1
        x[i+1] = x[i] - c*x[i]*dt
    end
end

Just running this function gives “ERROR: MethodError: no method matching Float64(::Flux.Tracker.TrackedReal{Float64})”.

Is there a way to do this in Julia using pure reverse AD (and even evaluate the tape instruction by instruction) so I can compare to a hand-derived adjoint code? (Or is any such solution approach on the horizon?) Failing that, is there a more natural/transparent way of getting the gradients (without pure reverse mode) using current versions of AD, than the fairly manual “plumbing” approach suggested by @MikeInnes back in January?


#9

In case you didn’t figure it out, the error here is because the AD is using a custom number type (not Float64) that carries gradient information. zeros(10) gives you a container that can only hold Float64 numbers. If you can write this in comprehension or functional style that’ll make it easier, otherwise you just have to be careful to make your code generic over number type (which is good style anyway).

I put up a new model in the model zoo that shows how one can use mixed-mode AD to backpropagate through a DifferentialEquations.jl simulation; hopefully that can serve as a starting point for others’ models.


#10

Is there are reason why Flux’s tracked values don’t just work now? Some of the recent changes made AD a lot simpler, I think we may not need to resort to running a forward mode there anymore if the norm is defined appropriately. I know ReverseDiff works just fine and it’s similar?

Here’s our recent advance: https://github.com/JuliaDiffEq/DiffEqBase.jl/pull/156


#11

Most likely they would. For practical purposes I’d expect forward mode to be quite a lot faster though, unless you really have a huge number of parameters.


#12

Indeed that’s true. Forthcoming paper should be on Arxiv next month… :slight_smile:


#13

Thanks. I think I’m too much of an AD novice to understand how a mixed-mode implementation works at this point, or what its benefits are. I’ve moved on to trying to get my original ODE problem to work more or less as originally implemented, except without hardcoding a Float64 array. I have run into a different problem, with a slightly different simplified example:

function f(c)
    β = 0.8 # 0.7 works
    F = 1.0 .- β*[0,1,2]

    x = zeros(eltype(param(0.0)), 3)
    for i in 1:2
        x[i+1] = x[i] - c*x[i]      
    end

    return sum(x.^2)
end

derivative(f, param(0.25))

gives:

DomainError with -0.050000000000000044:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).

Stacktrace:
 [1] throw_complex_domainerror(::Symbol, ::Float64) at ./math.jl:31
 [2] log(::Float64) at ./special/log.jl:285
 [3] _forward at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/scalar.jl:55 [inlined]
 [4] #track#1 at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/Tracker.jl:50 [inlined]
 [5] track at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/Tracker.jl:50 [inlined]
 [6] log at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/scalar.jl:57 [inlined]
 [7] #218 at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/scalar.jl:66 [inlined]
 [8] back_(::Flux.Tracker.Grads, ::Flux.Tracker.Call{getfield(Flux.Tracker, Symbol("##218#219")){Flux.Tracker.TrackedReal{Float64},Int64},Tuple{Flux.Tracker.Tracked{Float64},Nothing}}, ::Int64) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:103
 [9] back(::Flux.Tracker.Grads, ::Flux.Tracker.Tracked{Float64}, ::Int64) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:118
 [10] #4 at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:106 [inlined]
 [11] foreach at ./abstractarray.jl:1836 [inlined]
 [12] back_(::Flux.Tracker.Grads, ::Flux.Tracker.Call{getfield(Flux.Tracker, Symbol("##202#203")),Tuple{Flux.Tracker.Tracked{Float64},Flux.Tracker.Tracked{Float64}}}, ::Int64) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:106
 [13] back(::Flux.Tracker.Grads, ::Flux.Tracker.Tracked{Float64}, ::Int64) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:118
 [14] #6 at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:131 [inlined]
 [15] (::getfield(Flux.Tracker, Symbol("##9#11")){getfield(Flux.Tracker, Symbol("##6#7")){Params,Flux.Tracker.TrackedReal{Float64}}})(::Int64) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:140
 [16] gradient(::Function, ::Flux.Tracker.TrackedReal{Float64}) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:152
 [17] derivative(::Function, ::Flux.Tracker.TrackedReal{Float64}) at /Users/nurban/.julia/packages/Flux/UHjNa/src/tracker/back.jl:155
 [18] top-level scope at In[448]:12

The failure depends on the value of the forcing coefficient β; changing it to 0.7 works. After some diagnosis, it appears that the problem occurs at β>0.75, when the gradient evaluated at c=0.25 goes from negative to positive.

This is a case where the gradient exists and a finite difference approximation does fine as far as I can tell, but AD fails, apparently due to a logarithm lurking somewhere inside the chain rule. Any suggestions? Maybe this can be worked around by hand-coding part of the gradient if I write it out by hand, but I’d rather not try to do that in my more complicated real problem.


#14

I don’t know what counts as huge, but for around 100–500 parameters (depending on sparsity) Flux.jl becomes faster than ForwardDiff.jl for me for \mathbb{R}^n\to\mathbb{R} functions (log densities). So LogDensityProblems.jl, the helper framework for the latest DynamicHMC.jl, allows you to choose just by changing a single line.


#15

This sounds like this issue so it might be fixed on master. If not please do report a new one and I’ll look into it.


#16

Mixed mode shouldn’t be needed anymore. With DiffEqBase v4.28.1 the Flux Tracker types should just work in DiffEq. Let me know if you run into any issues.

(Note: It is probably still better to run mixed mode unless you have a lot of parameters, but hey the convenience is nice)