I had to define it like this:
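function logsumexp_avx(mat; dims=1)
    @assert dims == 1
    max_ = vec(fast_max(mat))' # 1×n row of column maxima; requires dims=1
    # The maximum entry contributes exp(0) - 1 = 0, so each column sums to sum(exp) - 1.
    exp_mat = @avx exp.(mat .- max_) .- (mat .== max_)
    sum_exp_ = sum(exp_mat, dims=dims)
    # log1p adds the 1 back: log(1 + (sum(exp) - 1)) = log(sum(exp)).
    @avx sum_exp_ .= log1p.(sum_exp_) .+ max_
end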
(i.e., I removed the dims = 1 argument from the call to fast_max.)
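For checking the result, here is a minimal plain-Base sketch of the same log1p trick, without LoopVectorization. The names logsumexp_ref and naive_logsumexp are hypothetical, and maximum(mat, dims=1) stands in for fast_max, whose definition is not shown here. It assumes each column has a unique maximum: with k tied maxima the mask subtracts k instead of 1, so the result would be off.

```julia
# Reference sketch using only Base; assumes a unique maximum per column.
function logsumexp_ref(mat::AbstractMatrix)
    max_ = maximum(mat, dims=1)                    # 1×n row of column maxima
    # The maximum entry contributes exp(0) - 1 = 0 to its column.
    exp_mat = exp.(mat .- max_) .- (mat .== max_)
    # log1p adds the subtracted 1 back before shifting by the maximum.
    return log1p.(sum(exp_mat, dims=1)) .+ max_
end

# Direct definition, fine for small inputs but prone to overflow for large ones.
naive_logsumexp(mat) = log.(sum(exp.(mat), dims=1))

mat = [1.0 0.5; 2.0 -1.0; 3.0 4.0]
@assert logsumexp_ref(mat) ≈ naive_logsumexp(mat)
```

Comparing logsumexp_avx(mat) against logsumexp_ref(mat) on the same input is a quick way to confirm that the @avx version still agrees after the change.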
Also, more definitions are needed for the gradient to work. Did you define something like
LoopVectorization.vmaterialize(bc::Base.Broadcast.Broadcasted{<:ReverseDiff.TrackedStyle}, ::Val{_}) where {_} = Base.materialize(bc)
?