DiffEqFlux Autodifferentiating inside loss function

Axlrsn · September 24, 2020, 11:52am

Hello,

I am new to Julia and especially the way Zygote works. For a project in physics informed machine learning I have a neural network NN(x) approximating a function f(x). I want to take the gradient of the neural network and compare it to the derivatives of the function to estimate the loss as the difference between NN’(x) and f’(x) for all x (I know that for a good solution I would compare the values as well, but this serves more as an illustration). I made a small example to show the approach:


using DiffEqFlux, Optim, Flux, Statistics, Zygote

# We want to train a neural network to approximate sin(x)
# by minimizing the loss of the derivatives of the neural network
# to the derivatives of sin(x) AKA cos(x).

xsteps = range(0, 2 * pi, length=50) # Interval
du_true = broadcast(cos, xsteps) # Derivative of sin(x)

# Initialize NN
NN = FastChain(FastDense(1, 12, tanh), FastDense(12, 12, tanh), FastDense(12, 1))
pinit = initial_params(NN)

# Test outside of loop works
du_NN = [gradient(x -> NN(x, pinit)[1], x)[1][1] for x in collect(xsteps)]

# Create loss function
function loss(p)
    du_NN = [gradient(x -> NN(x, p)[1], x)[1][1] for x in collect(xsteps)]
    mean(broadcast(-, du_true, du_NN))
end

# Train the NN
res = DiffEqFlux.sciml_train(loss,pinit,ADAM(0.01),
                            maxiters=200)

When I do this I get the error “Mutating arrays are not supported”. I currently lack the intuition on how to solve this. Also, feel free to point out performance errors etc. as I want to learn.

ChrisRackauckas · September 28, 2020, 4:26pm

Sorry for the late response: busy week. Doing a physics-informed neural network is more efficient with forward mode on the NN with reverse mode on the loss function. So you could in theory get this working, but I still wouldn’t recommend it done this way. In this form, using a finite difference in the loss function will be a bit more efficient and have the added bonus of regularizing over spatial oscillations. Or you can mix ForwardDiff into there for forward mode. Let me know if that helps.

(P.S. there are some massive changes happening to the autodiff to make a new mixed mode which will do the most optimal thing more easily, so rather than sharing some obscure workarounds I’d say, just do the regularized one and we’ll let you know when the newer form is out)

Axlrsn · September 29, 2020, 2:11pm

Thank you Chris, and no problem for the late answer. I realize that I forgot to write in the description that I also need second order derivatives for my actual project. As I’ve learned that this is not supported by Zygote, I cannibalized some NeuralPDE.jl code for numerical derivatives:

function get_derivative_simplified(order,phi,x,θ)

    _derivative = (x,θ,order) ->
    begin
        if order == 1
            return (phi(x+cbrt(eps(Float32)),θ)[1] - phi(x-cbrt(eps(Float32)),θ)[1])/(2*cbrt(eps(Float32)))

        else
            return (_derivative(x+cbrt(eps(Float32)),θ,order-1)
                  - _derivative(x-cbrt(eps(Float32)),θ,order-1))/(2*cbrt(eps(Float32)))
        end

    end

    return _derivative(x,θ,order)
end

And then in the loss function:

function loss(p)    
    df(x)=get_derivative_simplified(1,NN,x,p)
    du_NN = [df(x) for x in collect(xsteps)]
    mean(broadcast(-, du_true, du_NN))
end

This approach gave a good result, but made the whole training take approximately 30% longer than just explicitly writing the finite differences inside the loss function:

    function loss(p)
    f(x) = NN(x, p)[1]    
    df(x) = (f(x + cbrt(eps(Float32))) - f(x - cbrt(eps(Float32)))) / (2*cbrt(eps(Float32)))
    du_NN = [df(x) for x in collect(xsteps)]
    mean(broadcast(-, du_true, du_NN))
end

For my actual project, the difference was rather 50%. My guess is that this is due to the recursive and conditional stuff going around in the get_derivative_simplified function. Or is there some other performance oversight that I am missing here?

Interesting for the P.S., I’m looking forward to hear about these new changes.

ChrisRackauckas · September 29, 2020, 2:33pm

It’ll depend on the neural network size, with larger NNs seeing less overhead. This overhead can be completely removed though by using an iterator form for the loss function instead of building a bunch of arrays, and making sure everything is concretely typed. I’ll talk with @kirillzubov about it.

Axlrsn · September 29, 2020, 2:51pm

Thank you for the tips!

ChrisRackauckas · September 29, 2020, 2:58pm

What’s your Github ID? I’ll ping you in an issue, or if you want to open one on NeuralPDE on the performance please do.

Axlrsn · September 29, 2020, 3:03pm

It’s @AxlaDesign but I could do it.

Topic		Replies	Views
Autograd through loss function with derivatives in Machine Learning	1	879	March 25, 2021
DiffEqFlux: an error when using NN's that take both time and position as input Machine Learning diffeq , neural-network	6	970	May 11, 2021
Calculate gradient of UDE model (without DiffEqFlux.sciml_train) Machine Learning question , diffeq , flux	3	706	November 6, 2020
ReverseDiff for loss function with Zygote derivatives Machine Learning	1	404	February 10, 2023
Flux.gradient throws MethodError even though loss is evaluated just fine New to Julia question , diffeq	3	550	April 16, 2022

DiffEqFlux Autodifferentiating inside loss function

Related topics