I’m trying to implement a PReLU layer in Flux based on the following post, but the post targets an old version of Flux and much of its code is no longer available in v0.11.1.
The following code was created based on the official Flux documentation and forum posts.
struct PReLU_Dense
    W
    b
    α
end

Flux.trainable(a::PReLU_Dense) = (a.W, a.b, a.α)

PReLU_Dense(in::Integer, out::Integer, α) = PReLU_Dense(randn(out, in), randn(out), α)

# parametric ReLU: identity for positive x, scaled by α otherwise
prelu(x, α) = x > 0 ? x : α * x

function (m::PReLU_Dense)(x)
    prelu.(m.W * x .+ m.b, m.α)
end

Flux.@functor PReLU_Dense

m = Chain(PReLU_Dense(2, 4, 0.1),
          Dense(4, 1))
This code runs, but the model does not learn well, so I know it is incomplete.
My skill with the Julia language is still in its infancy and I don’t know what to do about it.
Let me know if you have any ideas.
It should be sufficient to create a regular Dense layer that has your prelu as its activation. Creating a new layer type on your own isn’t really necessary.
PReLU_Dense(n, m, α) = Dense(randn(m, n), randn(m), x->prelu(x, α))
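For example, with the sizes from your original Chain (just a sketch; note that α is fixed at construction time here rather than learned):

m = Chain(PReLU_Dense(2, 4, 0.1),   # α = 0.1 is captured in the activation closure
          Dense(4, 1))
Flux.params(m)   # collects the W and b of both layers; the fixed α is not a parameter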
Thanks.
I have confirmed that there is no need to create a new layer.
I’ll paste the updated code below.
Please let me know if you have any concerns.
struct PReLU_Dense
    W
    b
    α
end

Flux.trainable(a::PReLU_Dense) = (a.α,)   # trainable must return a tuple, hence the trailing comma

prelu(x, α) = x > 0 ? x : α * x

PReLU_Dense(n, m, α) = Dense(randn(m, n), randn(m), x -> prelu(x, α))

Flux.@functor PReLU_Dense
Is there any way to check the value of alpha as a model parameter?
I know that I can save the weights and other information with the following code, but I don’t know how to save the alpha value.
julia> using Flux
julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
julia> weights = params(model);
julia> using BSON: @save
julia> @save "mymodel.bson" weights
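As a rough guess (untested, just following the same BSON pattern as above), maybe saving the entire model object would keep custom fields like α as well:

julia> @save "mymodel.bson" model   # saves the whole model struct, not just its weights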
Sorry, @RiN, I’m afraid I completely misunderstood initially and led you the wrong way. If you want α to be a learnable parameter, you can’t just use Dense like I suggested. Your initial approach is correct.
The learning problem seems to be that Flux.params only returns mutable fields. See:
julia> struct PReLU_Dense
           W; b; α
       end
julia> Flux.@functor PReLU_Dense
julia> Flux.trainable(PReLU_Dense(1,2,3))
(W = 1, b = 2, α = 3)
julia> Flux.params(PReLU_Dense(1,2,3))
Params([])
Therefore I think the quickest thing to do in your case is to simply make α a length-1 vector so its value is mutable (rather than making the entire layer a mutable struct to accommodate it). You will also need to modify the layer’s forward function accordingly, e.g.:
function (m::PReLU_Dense)(x)
    prelu.(m.W * x .+ m.b, m.α[1])
end