Custom layer with new parameters in Flux.jl

Hello everyone,

I am new to Flux, but so far I really like it.
I was trying to train a neural network where I wanted to include the gain of the sigmoidal activation functions in the cost function, to penalise small gains.

To this end I tried to create a struct, almost identical to the Dense layer, that also holds the gain parameter so that Flux can differentiate with respect to it (the code is adapted from the source of the Dense layer):

using Flux
using Flux: @treelike, glorot_uniform

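struct GainLayer{F <: Function, S <: AbstractArray, T <: AbstractVector, U <: Number}
    W::S  # weight matrix
    b::T  # bias vector
    k::U  # gain of the activation function
    σ::F  # activation function
end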

GainLayer(W, b, k) = GainLayer(W, b, k, identity)

@treelike GainLayer

function GainLayer(in::Integer, out::Integer, σ = identity;
                   initW = glorot_uniform, initb = zeros, initk = 1)
    return GainLayer(initW(out, in), initb(out), initk, σ)
end

function (a::GainLayer)(x::AbstractArray)
    W, b, k, σ = a.W, a.b, a.k, a.σ
    σ.(k .* (W*x .+ b))
end

function activation(x)
    0.5 * (tanh(x) + 1)
end

Din = 10
Dhidden = 30
Dout = 5

test_model = Chain(
    GainLayer(Din, Dhidden, activation),
    GainLayer(Dhidden, Dout, activation)
)

However, when I do Flux.params(test_model), it finds no parameters of this net.

Could someone help me figure out what I’m doing wrong? Many thanks!

I don’t have the skill set to answer, but I’m very interested in the solution.

For the sake of completeness, I’m assuming that you’re (a) running the posted code, and then (b) simply executing Flux.params(test_model) to test the implementation. In other words, this is everything needed to replicate your problem?

Also, are you using Julia 1.3, Flux 0.10?

I am using Julia 1.3 and Flux 0.9 (for some reason my Flux won’t update to 0.10).
Your guess was correct; this network is useless since Flux can’t find any parameters and therefore cannot adjust them with backpropagation. I’m guessing I made some silly mistake owing to my inexperience with Flux!

In Flux 0.10 @treelike is replaced by @functor, but as far as I could see this should not change functionality.
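
For anyone on Flux 0.10, here is a rough sketch of how the same layer might be written with @functor. It is not from this thread and has not been checked against every 0.10 release. It assumes that params in 0.10 collects plain numeric arrays (no tracking wrapper needed), so the gain k is stored as a one-element vector rather than a bare number; that storage choice is an assumption of the sketch.

```julia
using Flux
using Flux: @functor, glorot_uniform

struct GainLayer{F, S <: AbstractArray, T <: AbstractVector, U <: AbstractVector}
    W::S
    b::T
    k::U   # gain kept as a one-element vector (assumption of this sketch)
    σ::F
end

@functor GainLayer

GainLayer(in::Integer, out::Integer, σ = identity;
          initW = glorot_uniform, initb = zeros, initk = 1.0) =
    GainLayer(initW(out, in), initb(out), [float(initk)], σ)

# k has length 1, so it broadcasts over every element of W*x .+ b
(a::GainLayer)(x::AbstractArray) = a.σ.(a.k .* (a.W * x .+ a.b))

activation(x) = 0.5 * (tanh(x) + 1)

test_model = Chain(
    GainLayer(10, 30, activation),
    GainLayer(30, 5, activation)
)

ps = Flux.params(test_model)  # intended to collect W, b and k of both layers
```

If the assumption holds, ps should contain the weight, bias and gain arrays of each layer, and the gains could then enter the cost function through test_model[1].k and test_model[2].k.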


IIRC, in Flux < 0.10 your parameters must be tracked (e.g. of type TrackedArray) for them to be picked up by params. This is easily achieved with the function param:

function GainLayer(in::Integer, out::Integer, σ = identity;
                   initW = glorot_uniform, initb = zeros, initk = 1)
    return GainLayer(param(initW(out, in)), param(initb(out)), param(initk), σ)
end

I also think that only arrays can be parameters (as Tracker uses mutation), meaning that k must be an array.
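
Putting the two suggestions above together, a minimal sketch for Flux 0.9 might look like the following. It is untested against the poster's exact setup. The only changes from the code in the question are that every parameter is wrapped in param, and that k is stored as a one-element vector, which requires relaxing the U <: Number constraint to U <: AbstractVector. The vector storage follows the guess that only arrays can be parameters; it is an assumption, not a confirmed requirement.

```julia
using Flux
using Flux: @treelike, glorot_uniform, param

# Same layout as the layer in the question, except that the gain k is
# stored as a one-element vector instead of a plain number (assumption).
struct GainLayer{F <: Function, S <: AbstractArray, T <: AbstractVector, U <: AbstractVector}
    W::S
    b::T
    k::U
    σ::F
end

@treelike GainLayer

function GainLayer(in::Integer, out::Integer, σ = identity;
                   initW = glorot_uniform, initb = zeros, initk = 1)
    # param(...) wraps each array so that Tracker can record gradients for it
    return GainLayer(param(initW(out, in)), param(initb(out)), param([initk]), σ)
end

function (a::GainLayer)(x::AbstractArray)
    W, b, k, σ = a.W, a.b, a.k, a.σ
    # k has length 1, so it broadcasts over every element of W*x .+ b
    σ.(k .* (W*x .+ b))
end

activation(x) = 0.5 * (tanh(x) + 1)

test_model = Chain(
    GainLayer(10, 30, activation),
    GainLayer(30, 5, activation)
)

ps = Flux.params(test_model)  # intended to collect W, b and k of both layers
```

With this version, Flux.params(test_model) should no longer come back empty, since W, b and k of each layer are now tracked arrays.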