I would like to implement the Parametric ReLU (PReLU) activation function described in https://arxiv.org/abs/1502.01852.
I know I should use `using Flux.Tracker`, but I am a bit lost. My main challenge is that, for each layer of the network, the trainable parameter `a` of PReLU needs to be shared across all the activations in that layer. So if the network has, say, 10 layers, only 10 scalar trainable parameters should be added in total (one per layer).
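For concreteness, here is a rough sketch of what I had in mind: a custom layer type holding a single tracked slope. The name `PReLU`, the 0.25 initialisation (taken from the paper), and the use of `param`/`@treelike` are just my guesses at how this might be done, not something I have verified:

```julia
using Flux
using Flux.Tracker

struct PReLU
    a   # one-element tracked vector: the slope for negative inputs
end

# One trainable slope per layer, initialised to 0.25 as in the paper.
PReLU(init::Real = 0.25) = PReLU(param([init]))

# PReLU(x) = max(0, x) + a * min(0, x), with `a` broadcast over all activations.
(p::PReLU)(x) = max.(0, x) .+ p.a .* min.(0, x)

# Register the layer so that params(model) collects `a`
# (I believe newer Flux versions would use Flux.@functor instead).
Flux.@treelike PReLU
```

Then I would hope that something like `Chain(Dense(10, 5), PReLU(), Dense(5, 2), PReLU())` adds exactly one scalar per PReLU to `Flux.params(model)`, but I am not sure this is the right way to share a parameter within a layer.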
Any ideas?