Hi all,

I’m rewriting a function from R to Julia that reduces an array across the 4th dimension via LogSumExp, and I’m currently trying to optimise the speed. I’ve made a bit of progress through the Tullio package, but I’m sure there is more speed to be got.

Here are my attempts so far with toy data:

```
using BenchmarkTools
using Tullio
function logsumexp4D1(Arr4d::Array{Float64, 4})
max_ = maximum(Arr4d, dims = (4))
lse = max_ + log.( sum( exp.(Arr4d .- max_), dims = (4)) )
dropdims(lse, dims = 4)
end
function logsumexp4D2(Arr4d::Array{Float64, 4})
max_ = dropdims( maximum(Arr4d, dims = (4)), dims=4)
@tullio sum_exp[i,j,k] := exp(Arr4d[i,j,k,l] - max_[i,j,k])
max_ + log.(sum_exp)
end
function logsumexp4D3(Arr4d::Array{Float64, 4})
@tullio (max) max_[i,j,k] := Arr4d[i,j,k,l]
@tullio sum_exp[i,j,k] := exp(Arr4d[i,j,k,l] - max_[i,j,k])
max_ + log.(sum_exp)
end
function logsumexp4D4(Arr4d::Array{Float64, 4})
@tullio (max) max_[i,j,k] := Arr4d[i,j,k,l]
@tullio sum_exp[i,j,k] := exp(Arr4d[i,j,k,l] - max_[i,j,k])
@tullio lse[i,j,k] := max_[i,j,k] + log(sum_exp[i,j,k])
end
A = rand(20, 1000, 80, 5)
julia> @btime logsumexp4D1($A);
90.804 ms (50 allocations: 109.86 MiB)
julia> @btime logsumexp4D2($A);
37.490 ms (152 allocations: 48.83 MiB)
julia> @btime logsumexp4D3($A);
24.381 ms (237 allocations: 48.84 MiB)
julia> @btime logsumexp4D4($A);
13.715 ms (336 allocations: 36.64 MiB)
julia> logsumexp4D1(A) == logsumexp4D2(A) == logsumexp4D3(A) == logsumexp4D4(A)
false
julia> logsumexp4D1(A) == logsumexp4D2(A) == logsumexp4D3(A)
true
julia> isapprox(logsumexp4D1(A), logsumexp4D4(A))
true
```

Ok so that is some progress, but I’m a novice and I’m sure folks who know more can beat those times. One thing of note is that for function4 the results are not identical, but are approximately equal (which is good enough). Strangely though, the better performance of function 4 does not translate to R:

Any comments or suggestions welcome

Thanks everyone!