Creating Parametric ReLU in Flux

I would like to create Parametric ReLU (PReLU), the activation function described in [1502.01852] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

I know I should use
using Flux.Tracker

but I am a bit lost. My main challenge is that, for each layer of the network, the trainable parameter ‘a’ of PReLU needs to be shared across all activations in that layer. So if the network has, say, 10 layers, then only 10 scalar trainable parameters should be added in total (one per layer).
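For reference, the elementwise definition from the paper, which learns one slope a_i per channel and also has a channel-shared variant with a single a per layer (the latter is what I’m after here), is

$$
f(y_i) = \begin{cases} y_i, & y_i > 0 \\ a_i \, y_i, & y_i \le 0 \end{cases}
$$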

Any ideas?

I’m not super familiar with Flux, but this seems to work; I just modified Flux’s Dense layer a bit:

using Flux.Tracker, NNlib, Flux

struct DensePRELU{S,T,K}
    W::S   # weight matrix
    b::T   # bias vector
    a::K   # PReLU slope, shared across the whole layer
end

prelu(x, a) = x > 0 ? x : a*x

function DensePRELU(in::Integer, out::Integer;
    initW = Flux.glorot_uniform, initb = zeros)
    return DensePRELU(param(initW(out, in)), param(initb(out)), param(0.0))
end

Flux.treelike(DensePRELU)

function (d::DensePRELU)(x)
    W, b, a = d.W, d.b, d.a
    NNlib.@fix prelu.(W*x .+ b, a)   # elementwise PReLU with the layer's shared slope
end

m = Chain(
    DensePRELU(10, 2),
)

M = rand(2,10)
fake_data() = begin x=rand(10); y = M*x; (x,y) end
train = [fake_data() for i=1:100]

loss(x, y) = sum(abs2.(m(x) .- y))

opt = ADAM(params(m))
evalcb = () -> println( mean( loss(d...) for d in train) )

for i=1:10 Flux.train!(loss, train, opt, cb=evalcb) end
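
As a quick sanity check (untested, and using the same old Tracker-era API as above), each DensePRELU layer should contribute exactly three tracked parameters, with the slope shared by all of its units:

ps = params(m)
length(ps)       # expected: 3 (W, b, and the single shared slope a)
m.layers[1].a    # the one scalar a shared by every unit of this layer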

Maybe there’s a way to do it directly with the default Dense layer.
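One possibility (untested sketch) is to keep the stock Dense layer and apply the activation through a closure that captures a shared tracked slope; the catch is that the closure is not treelike, so its parameter has to be handed to the optimiser by hand:

a = param(zeros(1))                                     # shared slope for this layer
m2 = Chain(Dense(10, 2), x -> NNlib.@fix prelu.(x, a))
opt2 = ADAM([params(m2)..., a])                         # add the closure's parameter manually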


@jonathanBieler, did you try running this? It produces the following error with the current stable version of Flux (v0.5.1):

> opt = ADAM(params(m))

MethodError: Cannot `convert` an object of type Flux.Tracker.TrackedReal{Float64} to an object of type Flux.Optimise.Param
This may have arisen from a call to the constructor Flux.Optimise.Param(...),
since type constructors fall back to convert methods.
in ADAM at Flux/src/optimise/interface.jl:56
in optimiser at Flux/src/optimise/interface.jl:6
in collect at base/array.jl:476
in collect_to! at base/array.jl:518
in collect_to! at base/array.jl:508

Yes, it works on v0.4.1. I’m not sure what’s going on with the new release, but you can fix the error with this:

function DensePRELU(in::Integer, out::Integer;
    initW = Flux.glorot_uniform, initb = zeros)
    return DensePRELU(param(initW(out, in)), param(initb(out)), param(zeros(1)))
end

It defines a as a 1-element vector instead of a scalar float.

@MikeInnes, this behaviour in version 0.5.1 seems like a bug to me. Or is it intended?

Thanks @jonathanBieler!
I’ve modified your solution below to abstract it away from any particular layer, so that it can now be used after other layers, BatchNorm for example (see the small example after the code).

using Flux
using Flux: Tracker, treelike
using NNlib

struct PReLU{T}
    a::T
end

# store the slope as a 1-element tracked vector (see the v0.5.1 issue above)
PReLU(init::Real = 0.0) = PReLU(param([float(init)]))

treelike(PReLU)

prelu(x, a) = x > 0 ? x : a*x

function (f::PReLU)(x)
    NNlib.@fix prelu.(x, f.a)
end


m = Chain(Dense(10, 2), PReLU())

M = rand(2,10)
fake_data() = begin x=rand(10); y = M*x; (x,y) end
train = [fake_data() for i=1:100]

loss(x, y) = sum(abs2.(m(x) .- y))

opt = ADAM(params(m))
evalcb = () -> println( mean( loss(d...) for d in train) )

@time for i=1:10 Flux.train!(loss, train, opt, cb=evalcb) end
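
For example (untested), the standalone PReLU now composes with any layer, e.g. after BatchNorm, and each PReLU() adds exactly one trainable scalar:

m2 = Chain(Dense(10, 8), BatchNorm(8), PReLU(), Dense(8, 2), PReLU())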

That’s better, yeah. If you find that this parametric ReLU is helping, maybe do a PR to Flux to add it.


Cheers, I hope it’s fine to revive this old topic.

In current Flux (v0.14) there is the Scale layer. I wonder if prelu could be implemented like this?

prelu = Scale(1, relu, bias=false)
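
A rough way to check the idea numerically might be (untested sketch; it assumes the Scale layer keeps its weight in a field called scale):

using Flux

prelu_ref(x, a) = ifelse(x > 0, x, a * x)   # reference elementwise definition

s = Scale(1, relu, bias=false)              # the proposed construction
x = randn(Float32, 1, 5)
s(x) ≈ prelu_ref.(x, s.scale)               # whether this holds is exactly the question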

Thanks.