I need to apply a softmax layer to values that are potentially large, and I am getting `NaN` values. Playing around a bit more, I found the following:

```
julia> using CUDA, Flux
julia> softmax(gpu(Float32[1, 2, 3, Inf]))
4-element CuArray{Float32,1}:
0.0
0.0
0.0
NaN
julia> softmax(Float32[1, 2, 3, Inf])
4-element Array{Float32,1}:
0.0
0.0
0.0
1.0
```

For comparison, both TensorFlow and PyTorch return `[nan, nan, nan, nan]` for this, on both CPU and GPU. Is this a bug? What is the "correct" implementation?
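The all-`NaN` result from TensorFlow and PyTorch is what falls out of the standard numerically stable softmax, which subtracts the maximum before exponentiating: when the input contains `Inf`, the maximum is `Inf`, and `Inf - Inf` is `NaN`, which then propagates through the whole output. A minimal sketch (the name `stable_softmax` is mine, not Flux's actual implementation):

```julia
# Hypothetical illustration of the max-subtraction softmax, not Flux's code.
function stable_softmax(x)
    m = maximum(x)      # m == Inf when any entry is Inf
    e = exp.(x .- m)    # Inf - Inf == NaN, so this vector contains a NaN
    return e ./ sum(e)  # sum(e) is NaN, and 0 / NaN is NaN, so everything is NaN
end

stable_softmax(Float32[1, 2, 3, Inf])  # → [NaN, NaN, NaN, NaN]
```

For finite inputs the subtraction cancels out and this agrees with the naive `exp.(x) ./ sum(exp.(x))`, but for an `Inf` input the `NaN` propagation above matches the TensorFlow/PyTorch behaviour, whereas the CPU result `[0, 0, 0, 1]` would require explicitly special-casing infinite entries.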