Hi all,
I’m somewhat new to Julia and Flux, and I’m trying to train a model similar to a standard dense multi-layer neural network, but with some trainable parameters shared between layers.
To give a concrete example (not exactly what I want, but it’s close enough to illustrate the problem I’m facing in Flux):
The model has as parameters a sequence of matrices $A_l$ and $\Lambda_l$, with the latter diagonal and positive-definite. For layers $l = 1, \dots, L-1$:

$$x_{l+1} = \sigma\!\left(\Lambda_{l+1}^{-1} A_l \Lambda_l x_l\right)$$

and a final output layer

$$y = A_L \Lambda_L x_L$$
The main difficulty is that each matrix $\Lambda_l$ for $l = 2, \dots, L$ appears in both layer $l$ and layer $l-1$. As far as I can tell, this means I cannot just use `Chain` directly; the sketch below shows the kind of sharing I have in mind.
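To make the sharing concrete, here is a rough sketch (the `TiedLayer` type is hypothetical, not something from Flux): each layer would hold a reference to its own log-diagonal `d_in` and to the next layer’s `d_out`, so adjacent layers share the same underlying array. Since $\Lambda_l$ is diagonal, multiplying by it (or its inverse) is just elementwise scaling:

```julia
using Flux

# Hypothetical layer computing σ.(Λ_{l+1}^{-1} A_l Λ_l x), with
# Λ_l = diagm(exp.(d_in)) and Λ_{l+1} = diagm(exp.(d_out)).
struct TiedLayer
    A::Matrix{Float64}
    d_in::Vector{Float64}   # log-diagonal of Λ_l
    d_out::Vector{Float64}  # log-diagonal of Λ_{l+1}, shared with the next layer
end

(t::TiedLayer)(x) = σ.(exp.(-t.d_out) .* (t.A * (exp.(t.d_in) .* x)))

d1, d2 = randn(3), randn(4)
layer1 = TiedLayer(randn(4, 3), d1, d2)
# the next layer would have to reuse the SAME array d2 as its d_in,
# and I don't know whether Flux handles tied parameters like that.
```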
What is the best way of coding this in Flux?
I have tried the code below. It works until the last line, which gives the error

```
ERROR: Only reference types can be differentiated with Params.
```
I have searched for this error, but none of the solutions I found address exactly this problem. I understand that the problem is related to having a vector of arrays in the struct defining the model. But is there a better way of representing such a structure with a flexible number of layers? Or is there a way of getting Flux to differentiate with respect to the arrays `As` and `ds`?
Grateful for any assistance!
```julia
using Flux
using LinearAlgebra  # for diagm

mutable struct Multi
    As::Vector{Matrix{Float64}}  # weight matrices A_1, ..., A_L
    ds::Vector{Vector{Float64}}  # log-diagonals of Λ_1, ..., Λ_L
end

function (m::Multi)(x)
    L = length(m.As)
    for l in 1:(L - 1)
        Λ = diagm(exp.(m.ds[l]))     # Λ_l, positive-definite by construction
        V = diagm(exp.(-m.ds[l+1]))  # Λ_{l+1}^{-1}, shared with the next layer
        A = m.As[l]
        x = σ.(V * A * Λ * x)
    end
    Λ = diagm(exp.(m.ds[L]))
    A = m.As[L]
    return A * Λ * x  # output layer: y = A_L Λ_L x_L
end

Flux.@functor Multi

ni, nh, no = 3, 4, 2  # input, hidden, and output dimensions
m = Multi([randn(nh, ni), randn(no, nh)], [randn(ni), randn(nh)])

x = randn(ni)
y = randn(no)
m(x)  # check that the model evaluates

function loss(x, y)
    ŷ = m(x)
    sum((y .- ŷ).^2)
end

grads = gradient(() -> loss(x, y), params(m))
grads[1]  # <- this is the line that throws the error
```
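Would taking the gradient with respect to the model struct itself (the “explicit” style, rather than going through `params`) be the right approach here? A sketch of what I mean, using a variant of the loss that takes the model as an argument — I don’t know whether this is recommended for a struct like this:

```julia
# Explicit-gradient variant: differentiate with respect to the struct itself.
loss2(model, x, y) = sum((y .- model(x)).^2)

grads2 = gradient(model -> loss2(model, x, y), m)[1]
grads2.As[1]  # would be the gradient with respect to the first weight matrix
```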