Speeding up my logsumexp function

I had to define

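using LoopVectorization # provides @avx

function logsumexp_avx(mat; dims=1)
    @assert dims == 1
    max_ = vec(fast_max(mat))' # requires dims=1
    # subtract 1 where an entry equals the column maximum, so log1p can be used below
    exp_mat = @avx exp.(mat .- max_) .- (mat .== max_)
    sum_exp_ = sum(exp_mat, dims=dims)
    @avx sum_exp_ .= log1p.(sum_exp_) .+ max_
end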

(i.e., remove the dims = 1 argument from the call to fast_max)
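
For checking results, here is a minimal sketch of the same computation using only Base, without @avx or fast_max. The names logsumexp_ref and logsumexp_log1p are made up for illustration and are not from LoopVectorization or from the code above. The second function is meant to mirror the log1p trick used in logsumexp_avx.

```julia
# Plain column-wise logsumexp: log(sum(exp(x - max))) + max.
# Hypothetical helper name, Base only.
function logsumexp_ref(mat::AbstractMatrix)
    max_ = maximum(mat, dims=1)                      # 1×n row of column maxima
    return log.(sum(exp.(mat .- max_), dims=1)) .+ max_
end

# The log1p variant with plain broadcasting (no @avx).
# Hypothetical helper name, Base only.
function logsumexp_log1p(mat::AbstractMatrix)
    max_ = maximum(mat, dims=1)
    # exp(0) = 1 at the maximum; subtracting the equality mask removes that 1
    exp_mat = exp.(mat .- max_) .- (mat .== max_)
    return log1p.(sum(exp_mat, dims=1)) .+ max_
end

mat = [1.0 10.0; 2.0 20.0; 3.0 1000.0]
agree = logsumexp_ref(mat) ≈ logsumexp_log1p(mat)    # compare the two variants
```

One thing to watch, as far as I can tell: if a column contains its maximum more than once, mat .== max_ subtracts 1 for every occurrence, so the log1p form no longer matches the plain one. For the column [1.0, 1.0] it returns 1.0, while the plain form gives 1 + log(2).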

Also, more definitions are needed for the gradient to work. Did you define something like

LoopVectorization.vmaterialize(bc::Base.Broadcast.Broadcasted{<:ReverseDiff.TrackedStyle}, ::Val{_}) where {_} = Base.materialize(bc)

?
