I would like to implement the Parametric ReLU (PReLU) activation function described in https://arxiv.org/abs/1502.01852.
I know I should use `using Flux.Tracker`, but I am a bit lost. My main challenge is that, for each layer of the network, the trainable parameter `a` of PReLU needs to be shared across all the activations in that layer. So if the network has, say, 10 layers, only 10 scalar trainable parameters should be added in total (one per layer).
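For concreteness, here is a rough sketch of what I had in mind: a custom layer type holding a single tracked slope. The name `PReLU`, the 0.25 initialisation (taken from the paper), and the use of `param`/`@treelike` are just my guesses at how this might be done, not something I have verified:

```julia
using Flux
using Flux.Tracker

struct PReLU
    a   # one-element tracked vector: the slope for negative inputs
end

# One trainable slope per layer, initialised to 0.25 as in the paper.
PReLU(init::Real = 0.25) = PReLU(param([init]))

# PReLU(x) = max(0, x) + a * min(0, x), with `a` broadcast over all activations.
(p::PReLU)(x) = max.(0, x) .+ p.a .* min.(0, x)

# Register the layer so that params(model) collects `a`
# (I believe newer Flux versions would use Flux.@functor instead).
Flux.@treelike PReLU
```

Then I would hope that something like `Chain(Dense(10, 5), PReLU(), Dense(5, 2), PReLU())` adds exactly one scalar per PReLU to `Flux.params(model)`, but I am not sure this is the right way to share a parameter within a layer.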
Any ideas?