I want to write a neural network using ForwardDiff.jl package but I can’t find any example.
If the number of parameters is much greater than the number of outputs, then you should use reverse-mode AD.
A neural network normally has hundreds or thousands of parameters (the weights and biases) and one output (the loss).
That said, you can still use forward mode; it will even be faster than reverse mode if you only have 5–10 parameters.
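For the case where forward mode is fine (a handful of parameters, one scalar output), a minimal sketch with ForwardDiff.jl looks like this; the quadratic `loss` here is a made-up toy, not anything from the thread:

```julia
using ForwardDiff  # forward-mode AD via dual numbers

# Toy scalar "loss" over a small parameter vector θ.
loss(θ) = sum(abs2, θ .- 1.0)

θ = [0.5, 2.0, -1.0]

# Gradient of Σᵢ (θᵢ - 1)² is 2(θᵢ - 1).
g = ForwardDiff.gradient(loss, θ)  # → [-1.0, 2.0, -4.0]
```

`ForwardDiff.gradient` pushes dual numbers through `loss`, so the cost grows with the length of `θ` (roughly one pass per chunk of parameters), which is exactly why it stops being attractive once the network has thousands of weights.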
I first tried to use Flux.jl to reproduce the results of a paper about normalizing flows. My PyTorch code works, but the Flux code doesn't, and it seems that Flux suffers from numerical problems. I don't care about training time because I don't think there would be a huge difference.
It is strange that Flux.jl, the main DL package in Julia, suffers from numerical problems!
If you mean floating-point truncation errors, then that is unlikely: AD doesn't incur truncation errors.
Flux may have bugs, and if so you should open issues, but it really shouldn't have round-off errors.
Feel encouraged to start another thread about that; someone might help you debug it.
Or, if you can put together a vaguely minimal example of an error, open an issue on GitHub.
I don’t care about training time because I don’t think there would be a huge difference.
It will differ by orders of magnitude.
Computing the gradient via forward mode basically involves running the code once per parameter, whereas reverse mode runs once per output (which is to say, once).
Reverse mode does have a higher overhead, but unless the net is a tiny toy from the '90s, that overhead won't dominate.
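The cost difference can be seen directly by computing the same gradient both ways; this sketch uses ReverseDiff.jl (mentioned below in the thread) and a made-up scalar loss, so the specific function is an illustration, not anyone's actual model:

```julia
using ForwardDiff, ReverseDiff

# One scalar output, n parameters: the regime where reverse mode wins.
n = 1_000
w = randn(n)
loss(θ) = sum(abs2, w .* θ)  # toy loss, stands in for a network's loss

θ = randn(n)

g_fwd = ForwardDiff.gradient(loss, θ)  # cost scales with n (≈ n/chunk passes)
g_rev = ReverseDiff.gradient(loss, θ)  # one forward pass + one backward pass

g_fwd ≈ g_rev  # both modes agree on the gradient; they differ only in cost
```

Timing the two calls (e.g. with `@time` or BenchmarkTools.jl) on a model with thousands of parameters is the quickest way to see the "orders of magnitude" claim for yourself.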
So is it better to use ReverseDiff.jl? My whole model has fewer than two thousand parameters and I don't need to train it for long; the loss almost converges after 100 epochs.
I do not think that errors will be a problem. There is a somewhat outdated package implementing Masked Autoregressive Flows here
Also, we have written a package
which uses "Dense" flows, where the dense matrix is optimized in its SVD form, allowing efficient calculation of the Jacobian and inverse. We never had a problem with numerical stability.
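A quick sketch of why the SVD parameterisation helps (this is our reading of the idea, not the package's actual code): if the dense layer's matrix is kept as `W = U * Diagonal(s) * V'` with `U`, `V` orthogonal, then the log-Jacobian-determinant a flow needs is just a sum over the singular values, and the inverse is equally cheap:

```julia
using LinearAlgebra

# Hypothetical SVD-parameterised dense matrix: orthogonal U, V and
# positive singular values s (in a real flow these would be the
# trainable parameters, with U and V constrained to stay orthogonal).
U, _ = qr(randn(4, 4)); U = Matrix(U)
V, _ = qr(randn(4, 4)); V = Matrix(V)
s = abs.(randn(4)) .+ 0.1
W = U * Diagonal(s) * V'

# Change-of-variables term: log|det W| = Σᵢ log sᵢ  (no LU needed).
logabsdet(W)[1] ≈ sum(log, s)

# The inverse is just as cheap: W⁻¹ = V * S⁻¹ * Uᵀ.
inv(W) ≈ V * Diagonal(1 ./ s) * U'
```

Because `s` is kept strictly positive, the determinant can never silently collapse to zero, which is one plausible reason this parameterisation avoids the numerical trouble described above.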
Try implementing spline flows. I tried for months, and it seems there is an unsolved bug in Flux.
If bugs are not reported then they are not fixed.