Use ForwardDiff instead of Zygote with Flux?

I’m embedding a Flux model into a physical simulation but Zygote doesn’t seem to be able to handle taking the gradient. I got ForwardDiff.gradient working with the code yesterday. Can I use ForwardDiff as the backend instead?

Yes. If you use GalacticOptim.jl, you can build the OptimizationFunction with AutoForwardDiff(), then solve it with Flux optimizers and it'll use ForwardDiff under the hood.

1 Like

Any examples of this?

Replacing f(x) with Zygote.forwarddiff(f, x) will also let you use ForwardDiff for one function call, while everything outside that uses Zygote. I’m not sure from your description whether this would solve your problem.
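For example, a minimal sketch of that wrapping (here `physics_step` is a made-up stand-in for a mutating simulation step, not anything from your code):

```julia
using Zygote

# A function that mutates an array internally — differentiating it with
# plain Zygote would throw "Mutating arrays is not supported".
function physics_step(x)
    buf = similar(x)
    for i in eachindex(x)
        buf[i] = x[i]^2   # in-place write
    end
    return sum(buf)
end

# Wrapping just this call in Zygote.forwarddiff makes ForwardDiff handle it,
# while everything outside the wrapper still differentiates with Zygote.
g = gradient(x -> Zygote.forwarddiff(physics_step, x), [1.0, 2.0, 3.0])
# g[1] is the gradient of sum(x.^2), i.e. 2x
```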

1 Like

You may already be aware of this, but there is a reason forward-mode AD is not used with NNs.
Forward-mode AD scales with the number of inputs (parameters), whereas reverse-mode AD scales with the number of outputs.

NN training has 1 output – the loss.
And generally hundreds to thousands of parameters – the weights and biases.

Forward mode does have a much lower overhead than reverse,
but the crossover point is usually somewhere between 5 and 200 parameters.
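To see both modes computing the same gradient of a single scalar output (a sketch; the loss and sizes are invented for illustration):

```julia
using ForwardDiff, Zygote

n = 200                        # number of "parameters"
W = randn(5, n)
loss(p) = sum(abs2, W * p)     # one scalar output

p0 = randn(n)

# Forward mode: cost grows with the n inputs (ForwardDiff sweeps them in chunks)
gf = ForwardDiff.gradient(loss, p0)

# Reverse mode: one backward sweep regardless of n, since there is one output
gr = Zygote.gradient(loss, p0)[1]

# Both agree on the answer; the difference is how the cost scales with n
```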

If you have thousands of parameters in your NN then using ForwardDiff is going to be incredibly slow.
You might want to look into another AD, like Enzyme.
(Or perhaps Yota with Avalon rather than Flux; or Tracker with TrackerFlux.jl)
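Enzyme in particular can differentiate mutating code directly, which may be relevant here. A rough sketch (check the Enzyme.jl docs for the current `autodiff` calling convention; `physics_loss` is invented for illustration):

```julia
using Enzyme

# A mutating function of the kind Zygote rejects
function physics_loss(x, buf)
    for i in eachindex(x)
        buf[i] = x[i]^2        # in-place write
    end
    return sum(buf)
end

x    = [1.0, 2.0, 3.0]
dx   = zero(x)                 # gradient w.r.t. x accumulates here
buf  = zeros(3)
dbuf = zero(buf)

# Reverse-mode AD straight through the mutation; Active marks the scalar return
Enzyme.autodiff(Reverse, physics_loss, Active,
                Duplicated(x, dx), Duplicated(buf, dbuf))
# dx now holds d(sum(x.^2))/dx = 2x
```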

3 Likes

Very helpful suggestions!

My research tends to lean more heavily on the physics-understanding side, and only uses ML for small closures. For example, this introductory toe-in-the-water example should only need a few dozen parameters. So forward mode is fine, but it still may not be the best plan moving forward.

Plus Flux has all those nice built-in models and optimizers - and I really need to stop reinventing the wheel on things like that. Unfortunately, mutating arrays is how all the physics updates are done in my simulation, which is why Zygote isn’t happy with it. I saw some work-around suggestions, but they didn’t seem worth it.

I will look into other packages you mentioned. Thanks again.

2 Likes

I hate to press, but I’m having trouble figuring out how to use your Zygote.forwarddiff suggestion with the implicit parameters used in Flux models. For example:

```julia
using Flux, Zygote

model = Dense(2, 2)
loss(x) = sum(abs2, model(x))
data = rand(2, 10)
loss(data)
gs = gradient(() -> loss(data), params(model))
for p in params(model)
    Flux.update!(Descent(), p, gs[p])
end
loss(data)
```

This works (since I’m not mutating arrays in loss(data)) but how can I switch to forwarddiff without losing the Flux model and optimizer?

Did you look at the tutorials?

```julia
using GalacticOptim, Flux

rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
x0 = zeros(2)
_p = [1.0, 100.0]

f = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff())
l1 = rosenbrock(x0, _p)
prob = OptimizationProblem(f, x0, _p)
sol = solve(prob, ADAM(), maxiters = 100)
@show sol.u
@show sol.minimum
```

There you go, ForwardDiff with Flux optimizers. And then you can switch to any other AD, and any other optimizers.

1 Like

I did see that, but the tutorial defines an objective function with explicit parameters, while I am trying to use a Flux model, which keeps its parameters inside the type. I can see why implicit parameters are handy for multi-layer models in Flux, but I can't see how to pull them out to hand over to ForwardDiff.

For example:

```julia
model = Dense(2, 2)
loss(x) = sum(abs2, model(x))
data = rand(2, 10)

f = OptimizationFunction(loss, GalacticOptim.AutoForwardDiff())
prob = OptimizationProblem(f, params(model), data)
sol = solve(prob, ADAM(), maxiters = 100)
```

Throws the error `ERROR: MethodError: no method matching typemax(::Type{Any})`.

```julia
using GalacticOptim, Flux

model = Dense(2, 2)
data = rand(2, 10)

# Flatten the implicit parameters into a vector; `re` rebuilds the model
p, re = Flux.destructure(model)
loss(p, _) = sum(abs2, re(p)(data))

f = OptimizationFunction(loss, GalacticOptim.AutoForwardDiff())
prob = OptimizationProblem(f, p, data)
sol = solve(prob, ADAM(), maxiters = 100)
```

1 Like

Awesome! Flux.destructure was the missing link.
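For completeness, the same destructure trick also works without GalacticOptim — a sketch that calls ForwardDiff.gradient directly on the flat parameter vector and feeds the result to a Flux optimizer (the step size and iteration count are arbitrary choices for illustration):

```julia
using Flux, ForwardDiff

model = Dense(2, 2)
data = rand(2, 10)

# Flatten the implicit parameters into a vector; `re` rebuilds the model
p, re = Flux.destructure(model)
loss(p) = sum(abs2, re(p)(data))

l0 = loss(p)
opt = Descent(0.01)
for _ in 1:100
    g = ForwardDiff.gradient(loss, p)   # forward-mode gradient of the flat vector
    Flux.update!(opt, p, g)             # in-place Flux optimizer step
end
loss(p)   # should be smaller than l0
```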