The problem with lsexp_mat is that it allocates a lot of unnecessary matrices. If you rewrite it as
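function lsexp_mat1(mat; dims=1)
    # Per-slice maximum along the requested dimension
    max_ = maximum(mat, dims=dims)
    # The maximal entry contributes exp(0) = 1; subtract it so log1p can add it back.
    # Note: this assumes each slice has a unique maximum.
    sum_exp_ = sum(exp.(mat .- max_) .- (mat .== max_), dims=dims)
    log1p.(sum_exp_) .+ max_
end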
it’s about as fast as the vector version.
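
A quick way to sanity-check a function like this is to compare it against the direct formula on a small input. The sketch below repeats the definition so that it runs on its own; `naive_lsexp` and the test matrix are hypothetical, made up for illustration, and every slice of the matrix has a unique maximum.

```julia
# Log-sum-exp along `dims`, subtracting the per-slice maximum for stability.
function lsexp_mat1(mat; dims=1)
    max_ = maximum(mat, dims=dims)
    # The maximal entry contributes exp(0) = 1; subtract it so log1p adds it back.
    # Assumes each slice has a unique maximum.
    sum_exp_ = sum(exp.(mat .- max_) .- (mat .== max_), dims=dims)
    log1p.(sum_exp_) .+ max_
end

# Hypothetical reference helper: the direct formula, fine for small values only.
naive_lsexp(mat; dims=1) = log.(sum(exp.(mat), dims=dims))

mat = [1.0 2.0; 3.0 5.0; 0.5 -1.0]

# Agrees with the direct formula along either dimension.
@assert lsexp_mat1(mat, dims=1) ≈ naive_lsexp(mat, dims=1)
@assert lsexp_mat1(mat, dims=2) ≈ naive_lsexp(mat, dims=2)

# Shifting the input by 1000 overflows the direct formula,
# while the max-subtracted version just shifts the result.
@assert all(isinf, naive_lsexp(mat .+ 1000.0))
@assert lsexp_mat1(mat .+ 1000.0) ≈ lsexp_mat1(mat) .+ 1000.0
```

For timing, something like `@btime` from BenchmarkTools would be the usual choice; it is left out here to keep the sketch dependency-free.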