With softmax, there’s a difference between “potentially large” and infinite inputs. In the latter case, as discussed in other replies, you have to special-case the implementation, and with multiple infinities you have to resort to conventions.
For “potentially large” inputs, on the other hand, you will run into trouble with a naive implementation. In Float32, exp overflows to Inf for arguments above roughly 88.7, so the last two entries here come out as NaN (Inf/Inf). E.g.
julia> x=Float32.([87, 88, 89, 90])
julia> exp.(x) ./ sum(exp.(x))
The canonical solution to this is to first subtract the largest value from all elements, as you can also see in the pasted code in another reply.
julia> y = x .- maximum(x)
julia> exp.(y) ./ sum(exp.(y))
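This way all the exponentiated values are scaled by the same factor, exp(-maximum(x)), which cancels in the ratio, so the result is mathematically unchanged. The largest exponentiated value is now exactly one and the overflow problems are gone. You might underflow the smaller elements to zero, but their true values are negligible next to the largest one, so that has no practical consequence.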
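Wrapped up as a function, the max-subtraction trick might look like the sketch below. The name stable_softmax is made up for illustration and is not taken from this thread or from any package. It assumes a non-empty vector of finite floats; infinite entries would still need the special-casing mentioned above.

```julia
# Hypothetical helper, for illustration only.
# Assumes x is non-empty and contains only finite values.
function stable_softmax(x::AbstractVector{<:AbstractFloat})
    # Shift so the largest entry becomes 0; exp of the shifted values never exceeds 1.
    y = exp.(x .- maximum(x))
    # Normalize; the sum is at least 1, so the division is safe.
    return y ./ sum(y)
end

x = Float32.([87, 88, 89, 90])
p = stable_softmax(x)
```

Because only the differences between entries matter, stable_softmax(x) gives the same result as stable_softmax(Float32.([0, 1, 2, 3])).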