Using Flux I would like to apply an activation function to all but the first output unit. More precisely, I would like to constrain all but the first unit to be non-negative. How can I achieve that? Thanks.

Since in Flux, all layers and activation functions are just functions, this is straightforward to implement.

Let’s say we’re working with a `Dense` layer with a `relu` activation. Usually you would construct your layer as `Dense(nin, nout, relu)`, which applies `relu` to every output of the `Dense` layer.

We can write a custom activation layer that applies a regular activation function to all but the first output as follows:

```
struct PartialActivation
    activationfn
end

(pa::PartialActivation)(xs) = map((i, x) -> i == 1 ? x : pa.activationfn(x), eachindex(xs), xs)
```

This will apply the `activationfn` to all but the first element.
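As a quick sanity check on a vector, here is a self-contained sketch (with a stand-in `relu` so it runs without loading Flux; in a real model you would use Flux’s `relu`):

```
relu(x) = max(zero(x), x)  # stand-in for Flux's relu, just for this sketch

struct PartialActivation
    activationfn
end

(pa::PartialActivation)(xs) = map((i, x) -> i == 1 ? x : pa.activationfn(x), eachindex(xs), xs)

pa = PartialActivation(relu)
pa([-1.0, -2.0, 3.0])  # -> [-1.0, 0.0, 3.0]: first element untouched, the rest clamped at zero
```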

To use it in a model, you will have to switch from something that probably looks like

```
Chain(..., Dense(10, 10, relu), ...)
```

to

```
Chain(..., Dense(10, 10), PartialActivation(relu), ...)
```

Hope this helps and feel free to ask questions!

Thanks, but it does not seem to work for arrays with more than one dimension. In my case Flux expects 2-dimensional arrays (number of outputs × batch size), so the function should be applied to all but the first row.

I am also considering circumventing the problem by adding a sufficiently large number to the targets/responses so that the transformation can be applied to all output variables.

Since my previous posting three years ago, I have avoided the problem by adding a sufficiently large number to the targets whenever needed, but I would now like to keep the targets as they are.

Is there a better way than the following?

```
s(x::AbstractArray{Float32,2}) = vcat(x[[1], :], softplus.(x[2:end, :]))
Chain(...., Dense(10, 10), s)
```

Looks pretty clean to me. You could save an allocation with `x[1:1, :]` instead of `x[[1], :]`, and by converting the indexing to views (though I’m not sure about GPU compatibility there), but that’s very minor.
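For reference, a sketch of that view-based variant (with a stand-in scalar `softplus` so it runs without Flux; in practice you would broadcast NNlib’s `softplus`):

```
softplus(x) = log1p(exp(x))  # stand-in for NNlib.softplus, just for this sketch

# First row passes through unchanged; softplus is broadcast over the remaining rows.
# The views avoid copying the two slices; `vcat` still materialises the final array.
s(x::AbstractMatrix{Float32}) = vcat(view(x, 1:1, :), softplus.(view(x, 2:end, :)))

s(Float32[1 2; 0 0])  # first row stays [1 2]; second row becomes softplus(0) ≈ 0.693 in both columns
```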