ReverseDiff - differentiating with respect to some parameters but not others

Nathaniel · July 17, 2022, 3:14am

I’m planning on training some neural networks using ReverseDiff.jl (which I think makes sense because I’ll be training lots of small networks on the CPU), but I have a basic question about how ReverseDiff is intended to be used.

In the examples we’re given a function with some parameters and shown how to differentiate with respect to all parameters. But the outputs of a neural network are a function of (1) the weights and biases, and (2) the inputs to the network. I only need to differentiate with respect to the weights and biases, not the inputs. So is there a way with ReverseDiff to differentiate with respect to some of the parameters to a function, but not others?

baggepinnen · July 17, 2022, 7:07am

Make use of a closure (https://docs.julialang.org/en/v1/devdocs/functions/#Closures) to create a function that takes as input only the parameters you want to differentiate with respect to.
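A minimal sketch of the closure approach, assuming a hypothetical one-layer network (the names `loss`, `W`, `b`, `x`, `y` are illustrative, not from the thread). The closure captures the data `x` and `y`, so ReverseDiff only sees `W` and `b` as inputs and returns one gradient for each:

```julia
using ReverseDiff

# Hypothetical one-layer network with a squared-error loss.
function loss(W, b, x, y)
    d = tanh.(W * x .+ b) .- y
    return sum(d .* d)
end

W, b = randn(3, 4), randn(3)   # parameters to train
x, y = randn(4), randn(3)      # data, held fixed

# The closure captures x and y; only W and b are passed as inputs,
# so gradient returns a tuple of two arrays (dW, db).
gW, gb = ReverseDiff.gradient((W, b) -> loss(W, b, x, y), (W, b))
```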

Nathaniel · July 17, 2022, 1:02pm

That does seem to be the intended solution, but judging from the discussion in the Stack Overflow question "Julia ReverseDiff: how to take a gradient w.r.t. only a subset of inputs?", pre-compiled tapes are not supported if you do this, which is really unfortunate.

dfdx · July 17, 2022, 1:08pm

Note that to compute the gradient w.r.t. any of the inputs, you usually need to traverse the whole computational graph anyway. Ignoring the other inputs will therefore have little effect on the total run time in most cases.

Re precompiled tapes, is there a reason you are interested specifically in ReverseDiff?

Nathaniel · July 17, 2022, 1:13pm

It looks like ignoring the gradients with respect to the inputs might be the way to go then.
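A sketch of that approach with a compiled tape, assuming the same hypothetical loss as above: every array is recorded as a tracked input, so new data can be fed to the compiled tape on each call, and the gradients w.r.t. `x` and `y` are simply discarded. `GradientTape`, `compile` and `gradient!` with a tuple of inputs follow the pattern in the ReverseDiff README:

```julia
using ReverseDiff

# Hypothetical one-layer network with a squared-error loss.
function loss(W, b, x, y)
    d = tanh.(W * x .+ b) .- y
    return sum(d .* d)
end

W, b = randn(3, 4), randn(3)
x, y = randn(4), randn(3)

# Record once with all four arrays as inputs, then compile the tape.
loss_tape = ReverseDiff.compile(ReverseDiff.GradientTape(loss, (W, b, x, y)))

# Preallocated output buffers, one per input.
results = (similar(W), similar(b), similar(x), similar(y))

# Replay the compiled tape on a new data point (same shapes as when recorded).
x2, y2 = randn(4), randn(3)
ReverseDiff.gradient!(results, loss_tape, (W, b, x2, y2))
gW, gb = results[1], results[2]   # results[3], results[4] (w.r.t. x2, y2) are ignored
```

A compiled tape replays the operations recorded with the original inputs, so the arrays must keep the same shapes, and the function should not branch on input values.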

I have no huge reason for preferring ReverseDiff, it just seemed like an established package that would be suitable for training lots of small networks on the CPU. Is there another option that would make my life easier?

dfdx · July 17, 2022, 1:32pm

If you already have experience with ReverseDiff and there are no more blockers, then there’s no real reason to switch. If you encounter more issues, though, you may explore other AD packages such as Zygote (perhaps the most established package at the moment) or Yota (which shares the idea of a tape with ReverseDiff).

Nathaniel · July 18, 2022, 3:42am

Thank you - Zygote seems amazingly more convenient than ReverseDiff, and I was able to get up and running with it very quickly. I don’t know how it will compare speed-wise, but worrying about pre-compiled tapes and such was probably premature optimisation anyway - I’ll stick with Zygote for now and see how it goes.
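For comparison, a minimal Zygote sketch using the same hypothetical loss: `Zygote.gradient(f, args...)` returns a tuple with one gradient per argument of `f`, so a closure over the data again restricts differentiation to `W` and `b`, with no tape to record or compile:

```julia
using Zygote

# Hypothetical one-layer network with a squared-error loss.
function loss(W, b, x, y)
    d = tanh.(W * x .+ b) .- y
    return sum(d .* d)
end

W, b = randn(3, 4), randn(3)
x, y = randn(4), randn(3)

# One gradient per closure argument: (dW, db).
gW, gb = Zygote.gradient((W, b) -> loss(W, b, x, y), W, b)
```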
