Does 'gradient' allocate?

e3c6 · January 2, 2020, 5:06pm

In a call such as g = gradient(loss, ps), where ps is a Params object (see for instance this line: Flux.jl/train.jl at e92da0cf850a982c425b83c92d6274174e52b02c · FluxML/Flux.jl · GitHub), am I allocating a new vector for the gradient? Or does the vector already exist somewhere (perhaps within the Params structure), with gradient just returning a reference to it?

If gradient is allocating, would it be worthwhile to think of rewriting the Flux.train! loop to use a pre-allocated gradient vector?

tanhevg · January 2, 2020, 6:10pm

The answer to the first question is yes, it will allocate. Params just wraps the model parameters in a couple of data structures (an ordered vector and an id set, to be precise; see the code for more details). It holds no storage for gradients.
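A quick way to check this yourself is to call gradient twice and compare the returned arrays. This is only a minimal sketch: it assumes Zygote's implicit-parameter API (Zygote.Params, the type Flux.params returned in Flux versions of that era), and W, x and loss are made-up names for illustration, not anything from train.jl.

```julia
using Zygote  # assumption: a Zygote version that still supports implicit Params

# Hypothetical toy "model": one weight matrix and a fixed input.
W = rand(Float32, 2, 3)
x = rand(Float32, 3)

ps = Zygote.Params([W])   # collect the parameter, as Flux.params would
loss() = sum(W * x)       # zero-argument closure over W, as the implicit API expects

g1 = Zygote.gradient(loss, ps)
g2 = Zygote.gradient(loss, ps)

# Same values, since nothing changed between the two calls...
@show g1[W] == g2[W]
# ...but if each call allocates its own array, these are distinct objects.
@show g1[W] !== g2[W]
```

If the second check comes out true, each call produced a fresh gradient array rather than handing back a reference to a buffer stored in ps.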

Re pre-allocating gradients: it would be interesting to hear what the maintainers think. I would argue against it, because the pre-allocated buffer would tie up precious memory that could otherwise be used for temporary variables during the model/gradient computation. In practice, the forward and backward passes take so long that pre-allocating a single array would not save much.
