Collect model outputs into multi-dimensional array

Consider the following model which creates an output after every layer

function track(model, input)
track::Vector{Vector{Float64}} =

p = input
for layer in model
    p = layer(p)
    track = vcat(track, [p,])
return track


which is unfortunately quite inefficient. Is there a way to store the outputs more efficiently into preallocated memory? Ideally into a mutli-dimensional array?

Sure, if you know the size(s) beforehand (and things are rectangular, i.e. size of the output of each layer is the same) E.g. for 5 layers, each with an output of length 64, you can initialize a multi(i.e. 2)-dimensional array like this:

track = Matrix{Float64}(undef, 64, 5)

And then loop over the layers like this:

for (i, layer) in enumerate(model)
    p = layer(p)
    track[:, i] = p

Not that this still allocates a vector p and then copies it into your matrix. To avoid allocations completely, you would want to rewrite your layer function to operate in-place on a pre-allocated vector (and then pass in e.g. @view track[:, i] to write the results directly into the matrix column).

1 Like

Unfortunately this approach is not supported by Flux/Zygote

Mutating arrays is not supported – called setindex!(::Matrix{Float64}, _…)

But I guess there must be some workaround for this since the array is not mutated in a mathematical sense. In contrast to in-place operations which override data necessary for calculation of gradients, this just stores results into memory.

I don’t think it was clear up front that this was in a gradient context, because that changes the solution space quite dramatically. There are two ways you can go about this in Zygote:

  1. Use Zygote.Buffer instead of a plain array.
  2. Put your layers in a Chain, call Flux.activations to get a set of outputs and then reduce(hcat, outputs) to allocate the array once.

#1 is your best shot (short of writing AD rules) for the in-place option @stevengj described. #2 may be faster if you can live with the p = layer(p) allocation for each layer, but I would try both just to be sure.


Thanks a lot!

In case I stick with ‘p = layer(p)’ Flux.activations outperforms Zygote.Buffer significantly:

60.174614 seconds (79.67 M allocations: 16.958 GiB, 3.24% gc time)


210.037932 seconds (691.79 M allocations: 49.948 GiB, 4.98% gc time)