I should comment that the offset algorithm you are using for logsumexp is potentially problematic: you can easily contrive a case where it is off by a factor of 2. In particular, try x = [1e-20, log(1e-20)] with your logsumexp algorithm:
function f(x)
    X = maximum(x)
    return X + log(sum(exp.(x .- X)))
end
You get:
julia> x = [1e-20, log(1e-20)];
julia> f(x) # inaccurate!
1.0e-20
julia> Float64(f(big.(x))) # accurate
1.9999999999999993e-20
A possible fix is to pull the maximum term out of the sum (its exponential is exactly 1) and apply the log1p function to the sum of the remaining terms. I actually assigned this as an exam question recently, so you can see the explanation in my problem 3 solutions.
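A minimal sketch of that fix (the function name logsumexp_log1p is mine, not from any particular library): skip the maximum entry when summing, since its exponential contributes exactly 1, and let log1p supply that 1 accurately.

function logsumexp_log1p(x)
    i = argmax(x)
    X = x[i]
    # Sum exp(x[j] - X) over j ≠ i; the omitted max term would contribute exactly 1.
    s = sum(exp(x[j] - X) for j in eachindex(x) if j != i; init=0.0)
    # log1p(s) = log(1 + s) computed accurately even when s is tiny.
    return X + log1p(s)
end

On the example above, logsumexp_log1p([1e-20, log(1e-20)]) gives ≈ 2.0e-20, matching the BigFloat result to machine precision.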
Not sure if this matters for the machine-learning application, however, since there you are adding the logsumexp to a posterior probability and so errors in tiny values like this may get rounded away in your final result.