I want to reuse the exact same layer in two places in a network, but I can’t tell whether my naive approach actually does that. My toy architecture is:
using Flux
D1 = Dense(2,2)
D2 = Dense(2,2)
NaiveReuse = Chain(D1, Parallel(vcat,Chain(D2,Parallel(vcat, D1, identity)), identity))
The output of params(NaiveReuse) is
Params([Float32[-1.1226765 0.9502689; 0.6875402 0.4517343], Float32[0.0, 0.0], Float32[-0.18272986 -0.16167739; 0.46781456 1.2025808], Float32[0.0, 0.0]])
but I’m having trouble interpreting that. It looks like only two weight matrices (plus their two bias vectors) are being stored, even though D1 appears twice in the model. Am I correct to assume that the parameters of D1 are shared between both positions and will be updated properly by Zygote during training? If not, how should I share a layer instead?
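One way I could imagine checking this directly is sketched below. It is only a rough sketch and makes a few assumptions: it assumes a Flux version in which `Flux.params` is still available (as in the output above) and in which `Parallel` keeps its branches in a `layers` field. The names `inner_D1`, `same_layer`, `n_arrays`, and `grad_sizes` are just illustrative.

```julia
using Flux

D1 = Dense(2, 2)
D2 = Dense(2, 2)
NaiveReuse = Chain(D1, Parallel(vcat, Chain(D2, Parallel(vcat, D1, identity)), identity))

# Walk down to the inner occurrence of D1:
# outer Chain -> Parallel -> inner Chain -> inner Parallel -> first branch.
# (Assumes Parallel stores its branches in the field `layers`.)
inner_D1 = NaiveReuse[2].layers[1][2].layers[1]

# `===` tests identity: true means both positions hold the same layer,
# backed by the same weight and bias arrays.
same_layer = NaiveReuse[1] === inner_D1

# Count the arrays that params collects for the whole model.
ps = Flux.params(NaiveReuse)
n_arrays = length(ps)

# Take a gradient with the implicit-parameter API and look up one
# gradient entry per collected array.
x = rand(Float32, 2)
gs = gradient(() -> sum(NaiveReuse(x)), ps)
grad_sizes = [size(gs[p]) for p in ps]
```

If `same_layer` is true and `n_arrays` matches the four arrays printed above, that would suggest both occurrences of D1 refer to one set of parameters, with a single gradient entry per array.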