Since in Flux all layers and activation functions are just plain functions, this is straightforward to implement. Let's say we're working with a `Dense` layer with a `relu` activation. Normally you would construct the layer as `Dense(nin, nout, relu)` so that `relu` is applied to every output of the `Dense` layer.
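For reference, here's a minimal sketch of that standard construction (the layer sizes and random input are just made up for illustration):

```julia
using Flux

layer = Dense(10, 10, relu)   # relu is applied to every one of the 10 outputs
x = randn(Float32, 10)
y = layer(x)
all(y .>= 0)                  # true, since relu clamps all negative outputs to zero
```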
We can write a custom activation layer that applies a regular activation function to all but the first output as follows:
```julia
struct PartialActivation
    activationfn
end

# Map over (index, element) pairs: leave the first element untouched,
# apply the wrapped activation function to every other element.
(pa::PartialActivation)(xs) = map((i, x) -> i == 1 ? x : pa.activationfn(x), eachindex(xs), xs)
```
This applies the `activationfn` to all but the first element.
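As a quick sanity check (a sketch with a made-up input vector, assuming the definition above and Flux's `relu` are in scope):

```julia
pa = PartialActivation(relu)
pa([-1.0, -2.0, 3.0])   # returns [-1.0, 0.0, 3.0]: only the first element skips relu
```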
To use it in a model, you will have to switch from something that probably looks like `Chain(..., Dense(10, 10, relu), ...)` to `Chain(..., Dense(10, 10), PartialActivation(relu), ...)`.
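Putting it together, a small end-to-end sketch (the surrounding layer sizes and the random input are just for illustration, assuming the `PartialActivation` definition above):

```julia
using Flux

model = Chain(
    Dense(4, 10, relu),
    Dense(10, 10),             # no activation on this layer...
    PartialActivation(relu),   # ...relu is applied to all outputs except the first
    Dense(10, 1),
)

x = randn(Float32, 4)
model(x)   # 1-element output vector
```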
Hope this helps and feel free to ask questions!