Does 'gradient' allocate?

In a call such as g = gradient(loss, ps), where ps is a Params object (see for instance this line: Flux.jl/train.jl at e92da0cf850a982c425b83c92d6274174e52b02c · FluxML/Flux.jl · GitHub), am I allocating a new vector for the gradient? Or does the vector already exist somewhere (perhaps within the Params structure), so that gradient just returns a reference to it?

If gradient does allocate, would it be worthwhile to rewrite the Flux.train! loop to use a pre-allocated gradient vector?

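The answer to the first question is yes, it will allocate. Params only wraps the model parameters in a couple of data structures (an ordered vector of the parameter arrays and an id set, to be precise; see the code for more details). It does not hold any storage for gradients, so each call to gradient produces new arrays.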
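A quick way to check this yourself is the sketch below. It is only an illustration: the parameter W and the loss are made-up toy examples (not taken from Flux.train!), and it uses Zygote's implicit-parameter API directly, assuming a Zygote version that still supports Params.

```julia
using Zygote

# Hypothetical toy parameter and loss, chosen only for illustration.
W = rand(2, 3)
ps = Zygote.Params([W])
loss() = sum(W .* W)

# Each call returns a Grads object; indexing it with W gives the gradient array.
g1 = gradient(loss, ps)[W]
g2 = gradient(loss, ps)[W]

# The values match the analytic gradient 2W ...
@assert g1 ≈ 2 .* W
# ... but the two calls hand back distinct arrays, and neither aliases W.
@assert g1 !== g2
@assert g1 !== W
```

If gradient were returning a reference to storage kept inside ps, g1 and g2 would be the same object, which is not what this check shows.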

Re pre-allocating gradients: it would be interesting to hear what the maintainers think. I would argue against it, because a permanently held gradient buffer would consume precious memory that could otherwise be used for temporary variables during the model/gradient computation. In practice, the forward and backward passes take so long that pre-allocating a single array would not save much.
