I don’t think it was clear up front that this was in a gradient context, and that changes the solution space quite dramatically. There are two ways you can go about this in Zygote:
1. Use `Zygote.Buffer` instead of a plain array.
2. Put your layers in a `Chain`, call `Flux.activations` to get a set of outputs, and then `reduce(hcat, outputs)` to allocate the array once.
#1 is your best shot (short of writing AD rules) for the in-place option @stevengj described. #2 may be faster if you can live with the `p = layer(p)` allocation for each layer, but I would try both just to be sure.
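For concreteness, here is a rough sketch of both options. The toy model, the sizes, and the names `collect_buffer` and `collect_activations` are made up for illustration, and it assumes a recent Flux (for the `Dense(4 => 4)` syntax) and that every layer output has the same length so each one fits in a column. Treat it as a starting point to benchmark, not a drop-in solution.

```julia
using Flux, Zygote

# Hypothetical toy model: every layer maps 4 features to 4 features,
# so each layer output can be stored as one column of a 4×3 matrix.
model = Chain(Dense(4 => 4, tanh), Dense(4 => 4, tanh), Dense(4 => 4))
x = rand(Float32, 4)

# Option 1: write each layer output into a Zygote.Buffer.
function collect_buffer(model, x)
    layers = model.layers
    buf = Zygote.Buffer(x, length(x), length(layers))  # like similar(x, 4, 3)
    p = x
    for i in 1:length(layers)
        p = layers[i](p)   # each layer call still allocates its own output
        buf[:, i] = p      # mutation is allowed on a Buffer
    end
    return copy(buf)       # turn the Buffer back into an ordinary array
end

# Option 2: let Flux.activations gather the outputs, then concatenate them.
function collect_activations(model, x)
    outputs = Flux.activations(model, x)
    # collect gives a Vector of arrays, which reduce(hcat, ...) can
    # concatenate in one go.
    return reduce(hcat, collect(outputs))
end

# Both versions should be differentiable with Zygote.
g_buffer      = gradient(x -> sum(abs2, collect_buffer(model, x)), x)[1]
g_activations = gradient(x -> sum(abs2, collect_activations(model, x)), x)[1]
```

Timing both with something like BenchmarkTools on your real layer sizes is probably the quickest way to see which one wins.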