Apply reduction along specific axes

What is a preferred way to apply a reduction (e.g., logsumexp) along specific axes of an array?

How would I apply a function along the last axis of an n-d array?

What is logsumexp and what should it do?

This was an example of a reduction - it’s in StatsFun, it does log (sum (exp (u))). If you want, just use sum instead.

I want to apply a reduction function along (possibly multiple) axes, where the axes are not known in advance (probably specified as a list passed into my function).

I hope I have explained more clearly.

You can use mapslices. The third argument specifies which dimensions you want to operate over. You can specify ndims(A) to use the last axis.

julia> A = rand(3,4,5)
       mapslices(u->log(sum(exp(u))), A, ndims(A))
3×4×1 Array{Float64,3}:
[:, :, 1] =
 2.11247  2.03308  2.12791  2.28064
 2.16582  2.05129  1.89066  2.14906
 2.14847  2.21091  2.05645  2.05114

Many of Julia’s aggregation functions have a second parameter specifying dimension(s) to operate on, e.g.:

julia> X = rand(3,4)
3×4 Array{Float64,2}:
 0.256141  0.785143  0.442656  0.112725
 0.395663  0.51446   0.156635  0.46549 
 0.902452  0.904167  0.567796  0.497947

julia> sum(X, 2)
3×1 Array{Float64,2}:

julia> sum(X, [1,2])
1×1 Array{Float64,2}:

You can use reducedim or mapreducedim.

1 Like

Thanks! I tried the following

using StatsFuns: logsumexp
import StatsFuns.logsumexp

function logsumexp(A::AbstractArray, Dims)
    return mapslices(logsumexp, A, Dims)

function test1(u, N)
    for _ in 1:N
        l = logsumexp(u, 2)
    return l

u = ones((10, 100000))
@time (test1(u,1000))
v = ones((100000, 10))
@time (test1(v, 1000))

The 1st benchmark is very good, timing identical to scipy logsumexp.
But the 2nd benchmark is much slower, about 4x slower than scipy!
Any suggestions?
I didn’t see how to use reducedim here, as it says it wants a binary function, and my logsumexp is n-ary.

SciPy arrays are row-major by default, so that the second dimension is contiguous. Julia arrays are column-major, so that the first dimension is contiguous. This means that reducing along the second dimension in SciPy has better memory locality (better cache efficiency) than in Julia.

To compare apples to apples, you should reduce along the second dimension in SciPy and along the first dimension (with transposed dimensions, i.e. 100000x10 in scipy and 10x100000 in Julia) in Julia.

1 Like

Doesn’t log(sum(exp, u, 2)) work? Or is it exp(sum(log, u, 2))?

Perhaps that would work if you actually calculate it that way, but I left some details out. The preferred, numerically stable way to calculate logsumexp(v) for a vector v is
m = max (v)
logsumexp(v) = m + log (sum (exp (v - m)))

AFAICT, max does not accept an axis argument, so I don’t think you can apply logsumexp directly along axes without something like mapslices (or a loop).

I guess you mean maximum, which does in fact accept an axis argument. max does something else.

I’m not sure how to port this numpy code to julia

def logsumexp_py2 (u, axes):
    m = np.max (u, axes)
    slices = [slice(m.shape[i]) if i not in axes else np.newaxis for i in range (len (u.shape))]
    m2 = m[slices]
    #print (m.shape, u.shape, m2)
    return m2 + np.log (np.sum (np.exp (u - m2), axes))

After taking max along the selected axes, we need to broadcast along all the reduced axes. In python, I formed the ‘slices’, which is a list that puts ‘newaxis’ into all axes that were reduced, so that we can do the operation:
u - max(u, axes)

How do I write this in julia?

What I’m trying to create is a logsumexp equivalent to that in scipy

To compute logsumexp of a 1-d vector,v, in a numerically stable way:
m = amax(v)
return m + log (sum (exp (v-m)))

Now what I need to do is extend this to apply along arbitrary axes argument
passed to the function:
logsumexp (A, axes)

This is analogous to sum (A, axes) which reduces (using +) the array A over
the axes specified.
I’m fine with the idea of writing low-level loops for better performance in
julia, and I think the logsumexp operating over a vector should be simple
to code, but not sure how to extend to the n-d array with arbitrary axes

What about this? You could still de-vectorize the reduction functions (sum and maximum), but it seems to perform pretty well:

       import StatsFuns
       using BenchmarkTools
       function logsumexp1(u, axes)
           m = maximum(u, axes)
           return m .+ log.(sum(exp.(u .- m), axes))
       logsumexp2(u, axes) = mapslices(StatsFuns.logsumexp, u, axes) # for comparison

julia> u = rand(10,100000);

julia> @btime logsumexp1($u, 2);
  22.964 ms (15 allocations: 7.63 MiB)

julia> @btime logsumexp2($u, 2)
  33.472 ms (138 allocations: 786.13 KiB)

julia> u = rand(100000,10);

julia> @btime logsumexp1($u, 2);
  25.086 ms (18 allocations: 9.92 MiB)

julia> @btime logsumexp2($u, 2);
  223.134 ms (1099527 allocations: 25.17 MiB)

You don’t need to add axes with newaxis since Julia’s reductions preserve the number of dimensions.

wow! amazing! Now I have
import StatsFuns
using BenchmarkTools
function logsumexp1(u, axes)
m = maximum(u, axes)
return m .+ log.(sum(exp.(u .- m), axes))
u = rand(10,100000);
v = rand(100000,10);
@btime logsumexp1($u, 2);
@btime logsumexp1($u, 1);
@btime logsumexp1($v, 2);
@btime logsumexp1($v, 1);
15.428 ms (15 allocations: 7.63 MiB)
22.213 ms (8 allocations: 9.92 MiB)
17.118 ms (18 allocations: 9.92 MiB)
15.849 ms (5 allocations: 7.63 MiB)

For comparison with scipy, I get:
%timeit logsumexp(u,0)
22.2 ms ± 98.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [103]: %timeit logsumexp(u,1)
17.7 ms ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [107]: %timeit logsumexp(v,0)
20.7 ms ± 209 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [108]: %timeit logsumexp(v,1)
24.9 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)