Need Diffractor.jl for State-of-the-Art Deep Learning Model

I agree that for vanilla higher-order AD there are better ways than nesting reverse mode.
E.g. forward over forward, forward over reverse (especially for Hessians), and Taylor mode.
I complained to Keno about this several times, but I have now been convinced.

The case Diffractor is useful for is not directly nested AD, but when there is a function in between.
i.e. AD → some function of gradients → AD

For example, imagine that I have some black box bb(x) and I am trying to train a neural net nn(x; θ) to imitate it (where θ is my neural network's parameters).
And I not only want the outputs to match, I also want the derivatives to match.
That makes each training example much more informative, which is good if bb is expensive to run (and it quite probably is, since that is one reason to train a neural net to imitate it in the first place).

So I write my loss function as the sum of the squared differences of both bb and nn at x, and of their derivatives bb' and nn' at x:

loss(x, θ) = (bb(x) - nn(x; θ))^2 + (bb'(x) - nn'(x; θ))^2
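
As a concrete sketch, here is roughly what that looks like with a toy `bb` and a toy parametric `nn` standing in for the real black box and network, and Zygote as the reverse-mode AD (all the names here are just placeholders for illustration):

```julia
using Zygote

# Toy stand-ins: in reality bb is an expensive black box and nn a neural network.
bb(x) = sin(x)
nn(x, θ) = θ[1] * x + θ[2] * x^2

# Derivatives with respect to x, each computed by an (inner) reverse-mode call.
bb′(x) = Zygote.gradient(bb, x)[1]
nn′(x, θ) = Zygote.gradient(x -> nn(x, θ), x)[1]

# Match both the values and the derivatives at x.
loss(x, θ) = (bb(x) - nn(x, θ))^2 + (bb′(x) - nn′(x, θ))^2
```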

Now I want to compute the derivative of loss with respect to θ so I can train my neural net.
Done naively, the result is that one would call reverse-mode AD on code generated by reverse-mode AD.
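
Continuing the sketch above, the training step would look something like the following. Whether a given Zygote version actually handles this particular nesting cleanly is beside the point; the point is the shape of the call: an outer reverse-mode pass over a loss whose body already contains an inner reverse-mode pass.

```julia
using Zygote

θ = [0.5, 0.1]
x = 1.0

# Outer reverse-mode call over `loss` from the sketch above, whose body already
# contains an inner Zygote.gradient call: reverse on code generated by reverse.
∇θ = Zygote.gradient(θ -> loss(x, θ), θ)[1]
```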

Maybe there is a smart way to reformulate it to avoid calling reverse on code generated by reverse.
If so, I would like to hear about it (especially if it generalizes to other problems that are not quite in this form).
