How to apply an activation function to a subset of output units?

johnbb · January 10, 2020, 3:34pm

Using Flux, I would like to apply an activation function to all but the first output unit. More precisely, I would like to constrain every output unit except the first to be non-negative. How can I achieve that? Thanks.

holylorenzo · January 11, 2020, 4:30pm

Since in Flux all layers and activation functions are just callable functions, this is straightforward to implement.

Let’s say we’re working with a Dense layer with a relu activation. Usually you would construct the layer as Dense(nin, nout, relu), which applies relu to every output of the Dense layer.

We can write a custom activation layer that applies a regular activation function to all but the first output as follows:

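struct PartialActivation
    activationfn  # elementwise activation to apply, e.g. relu
end
# Apply activationfn to every element of the output vector xs except the first,
# which is passed through unchanged.
(pa::PartialActivation)(xs) = map((i, x) -> i == 1 ? x : pa.activationfn(x), eachindex(xs), xs)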

This will apply the activationfn to all but the first element.
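A quick self-contained check of the layer on a single output vector. To keep it runnable without Flux installed, relu is defined locally as max(zero(x), x); this stand-in is an assumption of the example, and in a real model you would use Flux's relu.

```julia
# Local stand-in for Flux's relu (assumption: keeps the example dependency-free).
relu(x) = max(zero(x), x)

struct PartialActivation
    activationfn  # elementwise activation to apply
end

# Apply activationfn to every element except the first, which passes through.
(pa::PartialActivation)(xs) = map((i, x) -> i == 1 ? x : pa.activationfn(x), eachindex(xs), xs)

layer = PartialActivation(relu)
out = layer([-1.0, -2.0, 3.0])  # first element kept as is, the rest go through relu
println(out)                    # [-1.0, 0.0, 3.0]
```

The negative first element survives, while the negative second element is clamped to zero.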

To use it in a model, you will have to switch from something that probably looks like

Chain(..., Dense(10, 10, relu), ...)

to

Chain(..., Dense(10, 10), PartialActivation(relu), ...)
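Because layers are just functions, the same pipeline can be mimicked in plain Julia with function composition (∘). This is only an illustration: dense is a hypothetical stand-in for Dense(3, 3) without an activation, with made-up fixed weights, and relu is again defined locally instead of taken from Flux.

```julia
relu(x) = max(zero(x), x)  # local stand-in for Flux's relu

struct PartialActivation
    activationfn
end
(pa::PartialActivation)(xs) = map((i, x) -> i == 1 ? x : pa.activationfn(x), eachindex(xs), xs)

# Hypothetical stand-in for Dense(3, 3) with no activation: x -> W*x .+ b.
W = [1.0 0.0 0.0; 0.0 -1.0 0.0; 0.0 0.0 2.0]
b = [0.0, 0.0, -1.0]
dense(x) = W * x .+ b

# (f ∘ g)(x) == f(g(x)), so dense runs first, like Chain(dense, PartialActivation(relu)).
model = PartialActivation(relu) ∘ dense

y = model([-2.0, 1.0, 1.0])
```

Note the order: ∘ lists functions outermost first, whereas Chain lists layers in the order they are applied.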

Hope this helps and feel free to ask questions!

johnbb · January 13, 2020, 9:38am

Thanks, but it does not seem to work for arrays with more than one dimension. In my case Flux works with 2-dimensional arrays (number of outputs × batch size), so the function should be applied to every row except the first.
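One way to handle an (outputs × batch) matrix, sketched here as a swapped-in technique rather than anything from the thread, is to broadcast a row mask with ifelse instead of tracking indices, so every column of the batch is treated the same way. RowwisePartialActivation is a hypothetical name, relu is a local stand-in, and the mask is built as an ordinary CPU array; GPU and Zygote behaviour were not checked.

```julia
relu(x) = max(zero(x), x)  # local stand-in for Flux's relu

# Hypothetical row-wise variant: activate every row except the first.
struct RowwisePartialActivation{F}
    activationfn::F
end

function (pa::RowwisePartialActivation)(x::AbstractMatrix)
    # Length-(number of rows) mask: false for row 1, true for all other rows.
    mask = (1:size(x, 1)) .!= 1
    # The mask broadcasts across the batch (column) dimension, selecting the
    # activated value everywhere except in row 1.
    return ifelse.(mask, pa.activationfn.(x), x)
end

x = [-1.0 2.0; -3.0 4.0; 5.0 -6.0]  # 3 outputs × 2 samples
y = RowwisePartialActivation(relu)(x)
```

The activation is also evaluated on row 1 and then discarded, which costs a little extra work but avoids slicing and concatenation.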

I am also considering circumventing the problem by adding a sufficiently large number to the targets/responses, so that the transformation can be applied to all output variables.
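The shifting workaround can be sketched as follows. The helper names are hypothetical, and the constant c is assumed to be large enough that every shifted target is non-negative; a model trained on the shifted targets would have its predictions shifted back afterwards.

```julia
# Hypothetical helper: shift targets so they are all non-negative.
# Throws if c is too small for the data.
function shift_targets(y, c)
    all(y .>= -c) || throw(ArgumentError("c is too small: some targets stay negative"))
    return y .+ c
end

# Hypothetical helper: undo the shift on the model's predictions.
unshift_predictions(ŷ, c) = ŷ .- c

y = [-0.5, 1.0, 3.0]
c = 1.0                        # assumed bound: all targets are >= -c
y_train = shift_targets(y, c)  # train against these with a non-negative activation on all outputs
```

The catch is that c has to be known in advance, which is why keeping the targets unchanged and constraining only some outputs is attractive.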

johnbb · February 20, 2023, 11:19am

Since my previous post three years ago, I have avoided the problem by adding a sufficiently large number to the targets whenever needed, but I would now like to keep the targets as they are.

Is there a better way than the following?

# Keep row 1 unchanged; apply softplus elementwise (note the broadcast dot) to the remaining rows.
s(x::AbstractArray{Float32,2}) = vcat(x[[1], :], softplus.(x[2:end, :]))
Chain(..., Dense(10, 10), s)
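For a quick check outside a model, here is the same function on a small 3 × 2 Float32 batch. To keep it runnable without Flux, softplus is defined locally with the numerically stable form max(x, 0) + log1p(exp(-|x|)); that stand-in is an assumption of the example, and in a real model you would use Flux's softplus. The dot in softplus. broadcasts the scalar function over the slice.

```julia
# Local stand-in for softplus(x) = log(1 + exp(x)), written to avoid overflow for large x.
softplus(x) = max(x, zero(x)) + log1p(exp(-abs(x)))

# Row 1 passes through; rows 2:end go through softplus and become positive.
s(x::AbstractArray{Float32,2}) = vcat(x[[1], :], softplus.(x[2:end, :]))

x = Float32[-1 2; -3 4; 5 -6]  # 3 outputs × 2 samples
y = s(x)
```

The first row keeps its negative value, while every other entry is strictly positive.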

ToucheSir · February 21, 2023, 12:34am

Looks pretty clean to me. You could save an allocation by writing x[1:1, :] instead of x[[1], :], and save a bit more by turning the slices into views (though I’m not sure about GPU compatibility there), but that’s very minor.
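The two suggestions can be compared directly. s_copy follows the slicing from the question, while s_view uses the range 1:1 and Base's @views macro so that both slices become views instead of copies; the output of softplus is still allocated. As before, softplus is a local stand-in for Flux's, and GPU behaviour was not tested.

```julia
softplus(x) = max(x, zero(x)) + log1p(exp(-abs(x)))  # stand-in for Flux's softplus

# Slicing version: x[[1], :] and x[2:end, :] each allocate a copy.
s_copy(x::AbstractMatrix) = vcat(x[[1], :], softplus.(x[2:end, :]))

# View version: @views turns both slices into views; vcat still builds a new matrix.
s_view(x::AbstractMatrix) = @views vcat(x[1:1, :], softplus.(x[2:end, :]))

x = Float32[-1 2; -3 4; 5 -6]
a = s_copy(x)
b = s_view(x)
```

Both versions produce the same matrix; the difference is only in the temporary copies.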

Related topics

- Flux: different activation functions for different outputs (General Usage; tag: flux) · 0 replies · 826 views · June 23, 2020
- Individual ctivation function for each network output (Machine Learning) · 1 reply · 183 views · August 11, 2023
- FlexLayer: A Custom Layer with Different Activation Fcns, Non-negativity, and more (New to Julia; tag: flux) · 0 replies · 558 views · July 3, 2020
- Creating Parametric ReLU in Flux (Machine Learning) · 8 replies · 2019 views · February 1, 2025
- Flux output layer with custom activation function (Machine Learning; tags: question, gpu, flux, type-stability) · 2 replies · 348 views · May 14, 2023