DiffEqFlux: Autodifferentiating inside the loss function

Hello,

I am new to Julia and especially to the way Zygote works. For a project in physics-informed machine learning I have a neural network NN(x) approximating a function f(x). I want to take the gradient of the neural network with respect to its input and compare it to the derivative of the function, so that the loss is the difference between NN’(x) and f’(x) over all x (I know that for a good solution I would compare the values as well, but this serves more as an illustration). I made a small example to show the approach:


using DiffEqFlux, Optim, Flux, Statistics, Zygote

# We want to train a neural network to approximate sin(x)
# by minimizing the loss of the derivatives of the neural network
# to the derivatives of sin(x) AKA cos(x).

xsteps = range(0, 2 * pi, length=50) # Interval
du_true = broadcast(cos, xsteps) # Derivative of sin(x)

# Initialize NN
NN = FastChain(FastDense(1, 12, tanh), FastDense(12, 12, tanh), FastDense(12, 1))
pinit = initial_params(NN)

# Check that taking the gradient of the NN works outside the loss function
du_NN = [gradient(x -> NN(x, pinit)[1], x)[1][1] for x in collect(xsteps)]

# Create loss function
function loss(p)
    du_NN = [gradient(x -> NN(x, p)[1], x)[1][1] for x in collect(xsteps)]
    mean(broadcast(-, du_true, du_NN))
end

# Train the NN
res = DiffEqFlux.sciml_train(loss, pinit, ADAM(0.01),
                             maxiters = 200)

When I do this I get the error “Mutating arrays are not supported”. I currently lack the intuition for how to solve this. Also, feel free to point out performance mistakes etc., as I want to learn.

Sorry for the late response: busy week. A physics-informed neural network is more efficient with forward mode on the NN and reverse mode on the loss function. So you could in theory get this nested-Zygote version working, but I still wouldn’t recommend doing it that way. In this form, using a finite difference in the loss function will be a bit more efficient and has the added bonus of regularizing over spatial oscillations. Or you can mix ForwardDiff in there for forward mode. Let me know if that helps.
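
Roughly, the finite-difference version of the loss would look something like this (an untested sketch, reusing your NN, xsteps, and du_true from above; the cbrt(eps(Float32)) step size is just a common choice for central differences):

function loss(p)
    h = cbrt(eps(Float32))                              # central-difference step size
    df(x) = (NN(x + h, p)[1] - NN(x - h, p)[1]) / (2h)  # finite-difference derivative of the NN
    du_NN = [df(x) for x in xsteps]
    mean(broadcast(-, du_true, du_NN))
end

Since this only evaluates the network (no nested gradient call), reverse mode over the loss goes through without the mutation error.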

(P.S. There are some massive changes happening to the autodiff to make a new mixed mode which will do the most optimal thing more easily, so rather than sharing some obscure workarounds I’d say just do the regularized one, and we’ll let you know when the newer form is out.)

Thank you Chris, and no problem about the late answer. I realize that I forgot to mention in the description that I also need second-order derivatives for my actual project. Since I’ve learned that these are not supported by Zygote, I cannibalized some NeuralPDE.jl code for numerical derivatives:

# Central-difference approximation of the order-th derivative of phi(x, θ)[1],
# adapted from NeuralPDE.jl. Higher orders are built recursively from lower ones.
function get_derivative_simplified(order, phi, x, θ)
    h = cbrt(eps(Float32))  # finite-difference step size
    _derivative = (x, θ, order) ->
        begin
            if order == 1
                return (phi(x + h, θ)[1] - phi(x - h, θ)[1]) / (2h)
            else
                # Difference of two derivatives of order - 1
                return (_derivative(x + h, θ, order - 1)
                      - _derivative(x - h, θ, order - 1)) / (2h)
            end
        end
    return _derivative(x, θ, order)
end

And then in the loss function:

function loss(p)
    df(x) = get_derivative_simplified(1, NN, x, p)
    du_NN = [df(x) for x in collect(xsteps)]
    mean(broadcast(-, du_true, du_NN))
end
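
For the second-order derivative I mentioned, I just call the same helper with order = 2, e.g.:

d2f(x) = get_derivative_simplified(2, NN, x, p)  # order = 2 hits the recursive branch
d2u_NN = [d2f(x) for x in xsteps]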

This approach gave a good result, but made the whole training take approximately 30% longer than just explicitly writing the finite differences inside the loss function:

function loss(p)
    f(x) = NN(x, p)[1]
    df(x) = (f(x + cbrt(eps(Float32))) - f(x - cbrt(eps(Float32)))) / (2 * cbrt(eps(Float32)))
    du_NN = [df(x) for x in collect(xsteps)]
    mean(broadcast(-, du_true, du_NN))
end
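
For the second derivative I write it out explicitly in the same style, roughly like this (d2u_true here is just a stand-in for the target second derivatives; in the toy sin(x) example it would be -sin.(xsteps)):

function loss_2nd(p)
    h = cbrt(eps(Float32))
    f(x) = NN(x, p)[1]
    d2f(x) = (f(x + h) - 2 * f(x) + f(x - h)) / h^2  # central second difference
    d2u_NN = [d2f(x) for x in xsteps]
    mean(broadcast(-, d2u_true, d2u_NN))
end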

For my actual project, the difference was more like 50%. My guess is that this comes from the recursion and the conditionals in the get_derivative_simplified function. Or is there some other performance oversight that I am missing here?

Interesting, regarding the P.S.: I’m looking forward to hearing about these new changes.

It’ll depend on the neural network size, with larger NNs seeing less overhead. This overhead can be completely removed, though, by using an iterator form for the loss function instead of building a bunch of arrays, and by making sure everything is concretely typed. I’ll talk with @kirillzubov about it.
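
As a rough sketch of the shape I mean (details may need tuning for your actual loss):

function loss(p)
    h = cbrt(eps(Float32))
    f(x) = NN(x, p)[1]
    df(x) = (f(x + h) - f(x - h)) / (2h)
    # Accumulate over a generator instead of materializing du_NN as an array;
    # also make sure du_true and xsteps are concretely typed.
    sum(du - df(x) for (du, x) in zip(du_true, xsteps)) / length(xsteps)
end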

Thank you for the tips!

What’s your GitHub ID? I’ll ping you in an issue, or if you want to open one on NeuralPDE about the performance, please do.

It’s @AxlaDesign, but I could also open the issue myself.