Consider the following function, which records the model's output after every layer:
```julia
function track(model, input)
    track = Vector{Vector{Float64}}()  # one entry per layer output
    p = input
    for layer in model
        p = layer(p)
        track = vcat(track, [p])  # allocates a new outer vector every iteration
    end
    return track
end
```
which is unfortunately quite inefficient. Is there a way to store the outputs more efficiently, in preallocated memory? Ideally in a multi-dimensional array?
Sure, if you know the size(s) beforehand (and things are rectangular, i.e. the output of each layer has the same size). E.g. for 5 layers, each with an output of length 64, you can initialize a multi- (i.e. 2-)dimensional array like this:

```julia
track = Matrix{Float64}(undef, 64, 5)
```
And then loop over the layers like this:

```julia
p = input
for (i, layer) in enumerate(model)
    p = layer(p)
    track[:, i] = p  # copy the layer output into column i
end
```
Note that this still allocates a vector `p` and then copies it into your matrix. To avoid allocations completely, you would want to rewrite your layer function to operate in-place on a pre-allocated vector (and then pass in e.g. `@view track[:, i]` to write the results directly into the matrix column).
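For instance, a minimal sketch of such an in-place layer (the `layer!` and `track_inplace!` names, `tanh` activation, and shared `W`, `b` parameters are made-up placeholders here, not Flux API):

```julia
using LinearAlgebra: mul!

# Computes out .= tanh.(W * p .+ b) without allocating a fresh output vector.
function layer!(out, W, b, p)
    mul!(out, W, p)         # out = W * p, written in place
    out .= tanh.(out .+ b)  # fused in-place broadcast for bias + activation
    return out
end

# Write each layer's result directly into a column of the preallocated matrix.
function track_inplace!(track, W, b, input)
    p = input
    for i in 1:size(track, 2)
        layer!(view(track, :, i), W, b, p)
        p = view(track, :, i)  # the next layer reads from the stored column
    end
    return track
end

track = track_inplace!(Matrix{Float64}(undef, 64, 5), randn(64, 64), randn(64), randn(64))
```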
Unfortunately this approach is not supported by Flux/Zygote:

```
Mutating arrays is not supported -- called setindex!(::Matrix{Float64}, _...)
```

But I guess there must be some workaround for this, since the array is not mutated in a mathematical sense. In contrast to in-place operations that overwrite data needed for the gradient calculation, this just stores results into memory.
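For reference, the error can be reproduced with a standalone snippet like this (a minimal sketch, not the actual model):

```julia
using Zygote

Zygote.gradient([1.0, 2.0]) do x
    m = Matrix{Float64}(undef, 2, 1)
    m[:, 1] = x  # setindex! on a plain array inside differentiated code
    sum(m)
end
# ERROR: Mutating arrays is not supported -- called setindex!(::Matrix{Float64}, _...)
```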
I don’t think it was clear up front that this was in a gradient context, because that changes the solution space quite dramatically. There are two ways you can go about this in Zygote:

1. Use `Zygote.Buffer` instead of a plain array (see the sketch right after this list).
2. Put your layers in a `Chain`, call `Flux.activations` to get a set of outputs, and then `reduce(hcat, outputs)` to allocate the array once (sketched further below).
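A minimal sketch of option 1, assuming (as above) equal-length layer outputs; `Zygote.Buffer` allows `setindex!` inside differentiated code, and `copy` turns it back into an ordinary array:

```julia
using Zygote

function track_buffer(model, input)
    # Buffer is mutable inside Zygote-differentiated code, unlike a plain Matrix
    buf = Zygote.Buffer(input, length(input), length(model))
    p = input
    for (i, layer) in enumerate(model)
        p = layer(p)
        buf[:, i] = p
    end
    return copy(buf)  # copy back to a plain (differentiable) array
end
```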
#1 is your best shot (short of writing AD rules) for the in-place option @stevengj described. #2 may be faster if you can live with the `p = layer(p)` allocation for each layer, but I would try both just to be sure.
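For completeness, a sketch of option 2 (the `Chain` layout and sizes here are made-up placeholders):

```julia
using Flux

model = Chain(Dense(64 => 64, relu), Dense(64 => 64, relu), Dense(64 => 64))
input = randn(Float32, 64)

outputs = Flux.activations(model, input)  # tuple containing each layer's output
track = reduce(hcat, outputs)             # concatenate into one matrix
```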
Thanks a lot!
In case I stick with `p = layer(p)`, `Flux.activations` outperforms `Zygote.Buffer` significantly:

```
Flux.activations:  60.174614 seconds (79.67 M allocations: 16.958 GiB, 3.24% gc time)
Zygote.Buffer:    210.037932 seconds (691.79 M allocations: 49.948 GiB, 4.98% gc time)
```