Hi all,
I’m somewhat new to Julia and Flux, and I’m trying to train a model similar to a standard dense multi-layer neural network, but with some trainable parameters shared between layers.
To give a concrete example (not exactly what I want, but it’s close enough to illustrate the problem I’m facing in Flux):
The model has as parameters a sequence of matrices $A_l$ and $\Lambda_l$, with the latter diagonal and positive-definite. For layers $l = 1, \dots, L-1$:

$$x_{l+1} = \sigma\!\left(\Lambda_{l+1}^{-1} A_l \Lambda_l x_l\right)$$

and a final output layer

$$y = A_L \Lambda_L x_L$$
The main difficulty is that each matrix $\Lambda_l$ for $l = 2, \dots, L$ appears in both layer $l$ and layer $l-1$. As far as I can tell, this means I cannot just use `Chain` directly; the sketch below shows the kind of sharing I have in mind.
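To make the sharing concrete, here is a rough sketch (the `TiedLayer` type is hypothetical, not something from Flux): each layer would hold a reference to its own log-diagonal `d_in` and to the next layer’s `d_out`, so adjacent layers share the same underlying array. Since $\Lambda_l$ is diagonal, multiplying by it (or its inverse) is just elementwise scaling:

```julia
using Flux

# Hypothetical layer computing σ.(Λ_{l+1}^{-1} A_l Λ_l x), with
# Λ_l = diagm(exp.(d_in)) and Λ_{l+1} = diagm(exp.(d_out)).
struct TiedLayer
    A::Matrix{Float64}
    d_in::Vector{Float64}   # log-diagonal of Λ_l
    d_out::Vector{Float64}  # log-diagonal of Λ_{l+1}, shared with the next layer
end

(t::TiedLayer)(x) = σ.(exp.(-t.d_out) .* (t.A * (exp.(t.d_in) .* x)))

d1, d2 = randn(3), randn(4)
layer1 = TiedLayer(randn(4, 3), d1, d2)
# the next layer would have to reuse the SAME array d2 as its d_in,
# and I don't know whether Flux handles tied parameters like that.
```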
What is the best way of coding this in Flux?
I have tried the code below. It works until the last line, which gives the error

```
ERROR: Only reference types can be differentiated with Params.
```
I have searched for this error, but none of the solutions I found address exactly this problem. I understand that the problem is related to having a vector of arrays in the struct defining the model. But is there a better way of representing such a structure with a flexible number of layers? Or is there a way of getting Flux to differentiate with respect to the arrays `As` and `ds`?
Grateful for any assistance!
```julia
using Flux
using LinearAlgebra  # for diagm

mutable struct Multi
    As::Vector{Matrix{Float64}}  # weight matrices A_1, ..., A_L
    ds::Vector{Vector{Float64}}  # log-diagonals of Λ_1, ..., Λ_L
end

function (m::Multi)(x)
    L = length(m.As)
    for l in 1:(L - 1)
        Λ = diagm(exp.(m.ds[l]))     # Λ_l, positive-definite by construction
        V = diagm(exp.(-m.ds[l+1]))  # Λ_{l+1}^{-1}, shared with the next layer
        A = m.As[l]
        x = σ.(V * A * Λ * x)
    end
    Λ = diagm(exp.(m.ds[L]))
    A = m.As[L]
    return A * Λ * x  # output layer: y = A_L Λ_L x_L
end

Flux.@functor Multi

ni, nh, no = 3, 4, 2  # input, hidden, and output dimensions
m = Multi([randn(nh, ni), randn(no, nh)], [randn(ni), randn(nh)])

x = randn(ni)
y = randn(no)
m(x)  # check that the model evaluates

function loss(x, y)
    ŷ = m(x)
    sum((y .- ŷ).^2)
end

grads = gradient(() -> loss(x, y), params(m))
grads[1]  # <- this is the line that throws the error
```
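Would taking the gradient with respect to the model struct itself (the “explicit” style, rather than going through `params`) be the right approach here? A sketch of what I mean, using a variant of the loss that takes the model as an argument — I don’t know whether this is recommended for a struct like this:

```julia
# Explicit-gradient variant: differentiate with respect to the struct itself.
loss2(model, x, y) = sum((y .- model(x)).^2)

grads2 = gradient(model -> loss2(model, x, y), m)[1]
grads2.As[1]  # would be the gradient with respect to the first weight matrix
```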