How to achieve good performance with Zygote.pushforward on a neural network

Hello,

I am trying to implement a neural network with a loss function that contains the derivative of the network with respect to its input.

To use Flux to update the weights of the network, the loss function needs to be differentiable by Zygote. This means that the derivative of the network with respect to its input, computed inside the loss function, must itself be differentiable by Zygote.

The easiest way to achieve this is to use a finite difference approximation of the network's derivative in the loss function. However, Zygote recently gained a method called pushforward, which takes the derivative of Julia code using forward-mode automatic differentiation. This method can be used here because the code it produces is again differentiable by Zygote.
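
To illustrate the idea on something smaller than a network, here is a minimal sketch (assuming a Zygote version that provides pushforward): the forward-mode derivative of a scalar function is obtained by seeding pushforward with an input tangent of 1, and that derivative can in turn be differentiated by Zygote in reverse mode.

```julia
using Zygote

f(x) = x^3

# Forward-mode derivative of f: pushforward returns a function of the input
# tangent; seeding it with 1 gives f′(x) = 3x².
f′(x) = Zygote.pushforward(f, x)(1)

f′(2.0)                    # 3 * 2² = 12

# The forward-mode derivative is itself differentiable in reverse mode,
# which is what lets it appear inside a trainable loss: f″(x) = 6x.
Zygote.gradient(f′, 2.0)   # (6 * 2,) = (12.0,)
```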

My problem is the following: I tried to implement a physics-informed neural network using Zygote.pushforward, and I haven’t been able to get good performance with it.

Minimal working example

A physics-informed neural network tries to approximate the sine function. The loss function is the mean squared error with an additional term that vanishes when the derivative of the network equals the cosine (the derivative of the sine).

The network using the finite difference method takes about 2.8 seconds to train:

using Flux
using Statistics

const X = reshape(0:1f-1:10, 1, :)
const Y = sin.(X)

m = Chain(
    Dense(1, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 1),
)

# Forward-difference approximation of the derivative of m with respect to its input.
function m′(X::AbstractArray{T}) where T
    Δ = √eps(T)
    (m(X .+ Δ) - m(X)) / Δ
end

loss(X, Y) = Flux.mse(m(X), Y) + mean(abs2.(cos.(X) - m′(X)))

opt = ADAM()
cb() = @show loss(X, Y)
@time Flux.@epochs 1000 Flux.train!(loss, params(m), [(X, Y)], opt; cb)

The network using Zygote.pushforward takes about 2 minutes to train:

using Flux
using Statistics
using Zygote

const X = reshape(0:1f-1:10, 1, :)
const Y = sin.(X)

m = Chain(
    Dense(1, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 1),
)

# Forward-mode derivative: differentiate the network one scalar input at a time,
# seeding pushforward with an input tangent of 1.
scalar_m(x) = first(m([x]))
scalar_m′(x) = Zygote.pushforward(scalar_m, x)(1)
m′(X) = scalar_m′.(X)

loss(X, Y) = Flux.mse(m(X), Y) + mean(abs2.(cos.(X) - m′(X)))

opt = ADAM()
cb() = @show loss(X, Y)
@time Flux.@epochs 1000 Flux.train!(loss, params(m), [(X, Y)], opt; cb)

Is there any way I could improve my code to get at least similar performance between using the finite difference method and using Zygote.pushforward?

If this is not currently possible, that is not too bad, as there seems to be a new automatic differentiation system in the works, as mentioned by Chris Rackauckas in a similar issue in the NeuralPDE.jl repository. However, I would love to know whether this is just a problem in my code.

Thanks.


Hello,

In fact, Zygote.pushforward is not limited to functions with scalar inputs; it can be applied directly to the neural network!

Here is the same example with Zygote.pushforward applied to the full network; it trains in about the same time as the finite difference version:

using Flux
using Statistics
using Zygote

const X = reshape(0:1f-1:10, 1, :)
const Y = sin.(X)

m = Chain(
    Dense(1, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 10, tanh),
    Dense(10, 1),
)

# Forward-mode derivative of the whole network in one call,
# seeding pushforward with an input tangent of 1 for every sample.
m′(X::AbstractArray) = Zygote.pushforward(m, X)(1)

loss(X, Y) = Flux.mse(m(X), Y) + mean(abs2.(cos.(X) - m′(X)))

opt = ADAM()
cb() = @show loss(X, Y)
@time Flux.@epochs 1000 Flux.train!(loss, params(m), [(X, Y)], opt; cb)