I need to apply a softmax layer to values that are potentially large, and I am getting `NaN` values. Playing around a bit more, I found the following:

```
julia> using CUDA, Flux
julia> softmax(gpu(Float32[1, 2, 3, Inf]))
4-element CuArray{Float32,1}:
0.0
0.0
0.0
NaN
julia> softmax(Float32[1, 2, 3, Inf])
4-element Array{Float32,1}:
0.0
0.0
0.0
1.0
```

For comparison, both TensorFlow and PyTorch return `[nan, nan, nan, nan]` for this, on both CPU and GPU. Is this a bug? What is the "correct" implementation?
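The all-`NaN` result from TensorFlow and PyTorch is what falls out of the standard numerically stable softmax, which subtracts the maximum before exponentiating: when the input contains `Inf`, the maximum is `Inf`, and `Inf - Inf` is `NaN`, which then propagates through the whole output. A minimal sketch (the name `stable_softmax` is mine, not Flux's actual implementation):

```julia
# Hypothetical illustration of the max-subtraction softmax, not Flux's code.
function stable_softmax(x)
    m = maximum(x)      # m == Inf when any entry is Inf
    e = exp.(x .- m)    # Inf - Inf == NaN, so this vector contains a NaN
    return e ./ sum(e)  # sum(e) is NaN, and 0 / NaN is NaN, so everything is NaN
end

stable_softmax(Float32[1, 2, 3, Inf])  # → [NaN, NaN, NaN, NaN]
```

For finite inputs the subtraction cancels out and this agrees with the naive `exp.(x) ./ sum(exp.(x))`, but for an `Inf` input the `NaN` propagation above matches the TensorFlow/PyTorch behaviour, whereas the CPU result `[0, 0, 0, 1]` would require explicitly special-casing infinite entries.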