Speed comparison: Python vs. Julia for custom layers

Despite what some libraries say, softmax is not an activation function in the same way that e.g. relu is: it operates on an entire array at once (each output depends on the sum over all the inputs), rather than elementwise. Hence it's a separate operation/function which is not incorporated into the previous layer's forward pass. NNlib's softmax is already "fast" in that sense, because it already applies some degree of numerical approximation.
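To make the distinction concrete, here's a rough sketch (assuming NNlib is loaded; the `naive_softmax` helper is purely illustrative and not NNlib's actual implementation):

```julia
using NNlib  # provides relu and softmax

x = randn(Float32, 5)

# relu is elementwise: each output depends only on its own input,
# so it can simply be broadcast inside a layer's forward pass.
y_relu = relu.(x)

# softmax needs the whole array: every output depends on the sum over
# all inputs, so it runs as a separate operation on the full array.
y_soft = softmax(x)

# a numerically naive "by hand" version, just to show why it can't be
# applied one element at a time (hypothetical helper, not NNlib's code):
naive_softmax(v) = exp.(v) ./ sum(exp.(v))
```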

All ML libraries do this, because each layer depends on the output of the previous one. I can't think of a model where that isn't the case (note: even skip connections and other branching still run sequentially, see the sketch below), which is another reason for you to share more about the one you're trying to write :wink:
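For example, a skip connection can only add the branch back in after the wrapped layer has finished. A minimal Flux sketch (layer sizes are arbitrary, just for illustration):

```julia
using Flux

# The inner Dense must produce its output before the `+` with the
# untouched input can happen, so even this "branch" runs sequentially.
block = SkipConnection(Dense(4 => 4, relu), +)
model = Chain(Dense(8 => 4, relu), block, Dense(4 => 2))

x = randn(Float32, 8)
y = model(x)  # each stage waits on the previous stage's output
```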
