 # How to speed up `Statistics.median` along a given dimension?

I am in the process of translating a Python function into Julia. I was expecting the latter to be faster than the former, but to my surprise the Python function outperformed my Julia version each time. Using the `Profile` module, I discovered that my Julia function was spending >30% of its time computing a median value along the first dimension in a 4-D array (passed into the function as one of the arguments).

Benchmarking this bottleneck yielded some surprising (to me) results. Obviously using the stock `Statistics.median` function with the `dims` parameter is not the way to go here. How might I go about speeding up this computation, getting it down to the Python performance, or better?

Annotated Example:

First I generate a random 4-D array, compute the median along the matching axis in Python and Julia, and test that the values are indeed the same.

``````
julia> using PyCall, Statistics, BenchmarkTools

julia> py"""
import numpy as np
a = np.random.rand(10,10,10,10).astype(np.float32)
median_py = np.median(a, axis=3)
"""

julia> a = permutedims(py"a", 4:-1:1);

julia> median_py = permutedims(py"median_py", 3:-1:1)[[CartesianIndex()],:,:,:];

julia> median_jl = median(a, dims=1);

julia> median_py == median_jl
true

``````

Next, I run those computations again under the `@benchmark` macro. The Julia version takes over 4x longer to compute.

``````
julia> @benchmark py"np.median(a, axis=3)"
BenchmarkTools.Trial:
memory estimate:  5.64 KiB
allocs estimate:  33
--------------
minimum time:     310.933 μs (0.00% GC)
median time:      436.488 μs (0.00% GC)
mean time:        489.510 μs (0.00% GC)
maximum time:     23.749 ms (0.00% GC)
--------------
samples:          9896
evals/sample:     1

julia> @benchmark median(a, dims=1)
BenchmarkTools.Trial:
memory estimate:  319.61 KiB
allocs estimate:  11071
--------------
minimum time:     1.352 ms (0.00% GC)
median time:      1.418 ms (0.00% GC)
mean time:        1.679 ms (1.61% GC)
maximum time:     8.195 ms (67.38% GC)
--------------
samples:          2963
evals/sample:     1
``````

Next, I recreate a “dumb” version of the `dims` argument with a comprehension, testing to make sure it yields the same result. This is almost twice as fast as the provided Julia method, but still significantly slower than Python.

``````
julia> median_jl_dumb = begin
w,x,y,z = size(a)
[median(a[:,i,j,k]) for i in 1:x, j in 1:y, k in 1:z][[CartesianIndex()],:,:,:]
end;

julia> median_jl == median_jl_dumb
true

julia> @benchmark begin
w,x,y,z = size(a)
[median(a[:,i,j,k]) for i in 1:x, j in 1:y, k in 1:z][[CartesianIndex()],:,:,:]
end
BenchmarkTools.Trial:
memory estimate:  306.38 KiB
allocs estimate:  4047
--------------
minimum time:     722.054 μs (0.00% GC)
median time:      728.818 μs (0.00% GC)
mean time:        780.170 μs (1.53% GC)
maximum time:     3.396 ms (68.06% GC)
--------------
samples:          6377
evals/sample:     1
``````
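A large part of the comprehension's cost is that each `a[:,i,j,k]` slice copies a column. A `@view`-based variant (just a sketch; `median` still makes an internal copy when it sorts) shaves off some of that overhead:

```julia
using Statistics

a = rand(Float32, 10, 10, 10, 10)
w, x, y, z = size(a)

# Same comprehension as above, but each column is a view rather than a copy.
median_views = [median(@view a[:, i, j, k]) for i in 1:x, j in 1:y, k in 1:z]

# Matches the built-in result (up to the dropped singleton dimension).
median_views == dropdims(median(a, dims=1), dims=1)  # true
```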

Finally, I make sure there is nothing “wrong” with the Julia Statistics `median` function in general by comparing it to numpy without any axis slicing, and indeed it is comparable or faster in this case.

``````
julia> @benchmark py"np.median(a)"
BenchmarkTools.Trial:
memory estimate:  176 bytes
allocs estimate:  5
--------------
minimum time:     181.190 μs (0.00% GC)
median time:      186.564 μs (0.00% GC)
mean time:        199.499 μs (0.00% GC)
maximum time:     1.331 ms (0.00% GC)
--------------
samples:          10000
evals/sample:     1

julia> @benchmark median(a)
BenchmarkTools.Trial:
memory estimate:  39.16 KiB
allocs estimate:  3
--------------
minimum time:     131.974 μs (0.00% GC)
median time:      152.517 μs (0.00% GC)
mean time:        166.348 μs (3.97% GC)
maximum time:     9.239 ms (98.08% GC)
--------------
samples:          10000
evals/sample:     1

``````

`median(A, dims)` dispatches to `mapslices`, which is flexible but somewhat slow. My own profiling shows that much of the runtime is spent in an auxiliary function, `concatenate_setindex!`, which may be hobbled by broadcasting overhead. JuliennedArrays delivers a substantial speedup:

``````
using JuliennedArrays

A = rand(Float32, 10, 10, 10, 10)
@btime median($A, dims=1)         # 1.251 ms (11070 allocations: 319.59 KiB)
@btime map(median, Slices($A, 1)) # 230.976 μs (2018 allocations: 192.38 KiB)

dropdims(median(A, dims=1), dims=1) == map(median, Slices(A, 1)) # true
``````
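Another option, not from the thread but a sketch of the same idea taken further: hand-roll the reduction with one reusable buffer and `median!`, which is allowed to mutate (partially sort) its argument, so the only allocations are the output and the buffer:

```julia
using Statistics

# Median along dim 1 of a 4-D float array, keeping the singleton
# dimension so the result matches `median(A, dims=1)` exactly.
function median_dim1(A::AbstractArray{T,4}) where {T<:AbstractFloat}
    w, x, y, z = size(A)
    out = Array{T}(undef, 1, x, y, z)
    buf = Vector{T}(undef, w)
    @inbounds for k in 1:z, j in 1:y, i in 1:x
        copyto!(buf, view(A, :, i, j, k))   # refill the scratch buffer
        out[1, i, j, k] = median!(buf)      # median! may reorder buf
    end
    return out
end

A = rand(Float32, 10, 10, 10, 10)
median_dim1(A) == median(A, dims=1)  # true
```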

**edit:** just for fun, a threaded `mapslices`:

``````
function tmapslices(f, A, dims...)
    sliced = Slices(A, dims...)
    return_type = typeof(f(A[1:2]))  # infer the output eltype from a sample call
    out = Array{return_type}(undef, size(sliced))
    Threads.@threads for i in eachindex(sliced)
        out[i] = f(sliced[i])
    end
    return out
end

@btime tmapslices(median, $A, 1) # 142.063 μs (238 allocations: 57.70 KiB)
``````

Wow! I was just computing the median over images and was puzzled by how slow it was. For typical image sizes, the speedup is enormous!

``````
A = rand(Float32, 1000, 1000, 10)
@btime median($A, dims=3)         # 1.884 s (14912038 allocations: 338.17 MiB)
@btime map(median, Slices($A, 3)) # 283.797 ms (2000019 allocations: 186.92 MiB)
``````

Nice! I necro’d a related issue (https://github.com/JuliaLang/julia/issues/28431).

Fantastic, thanks a lot! It would be nice if this were eventually the default code under the hood for the `dims` keyword.

If I need those singleton dimensions to remain, is there a “cleaner” way to do this than

``````
map(median, Slices(A, 1))[[CartesianIndex()],:,:,:]
``````

as the `[CartesianIndex()]` is rather ugly?

You can reshape the output; it does not cost anything.
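For example, reinserting the leading singleton dimension costs nothing, since `reshape` on an `Array` just wraps the same memory (using `dropdims` here as a stand-in for the `map(median, Slices(A, 1))` result):

```julia
using Statistics

A = rand(Float32, 10, 10, 10, 10)
m = dropdims(median(A, dims=1), dims=1)         # 3-d result without the singleton
reshape(m, 1, size(m)...) == median(A, dims=1)  # true
```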


Interesting. I didn’t realize the previous syntax was allocating a new array (that’s actually kind of annoying), but I guess it makes sense.

If I want to reshape along the first or last index, it is relatively trivial with, say,

``````
reshape(a, (1, size(a)...))
``````

However, there are several times where I need to insert a singleton dimension as a middle dimension. Is there a way to do the following in one line?

``````
w,x,y,z = size(a)
reshape(a, (w,x,y,1,z))
``````

(I realize I’m veering off topic here)

It ain’t pretty, but this works: `reshape(A, insert!([size(A)...], 4, 1)...)`
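A splatting variant of the same trick, avoiding the temporary vector and `insert!`:

```julia
a = rand(Float32, 10, 10, 10, 10)

# Splice a singleton between dims 3 and 4 by splatting the size tuple.
b = reshape(a, size(a)[1:3]..., 1, size(a)[4:end]...)
size(b)  # (10, 10, 10, 1, 10)
```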


Also relevant is https://github.com/JuliaLang/julia/pull/32310, which would allow `map(median, eachslice(A, dims=(1,2)))` instead of `median(A, dims=3)`, I think.

It looks as though `eachslice` can only accept a single dimension.

``````
ERROR: ArgumentError: only single dimensions are supported
Stacktrace:
 #eachslice#178 at ./abstractarraymath.jl:452 [inlined]
 (::Base.var"#kw##eachslice")(::NamedTuple{(:dims,),Tuple{Tuple{Int64,Int64,Int64}}}, ::typeof(eachslice), ::Array{Float32,4}) at ./none:0
 top-level scope at REPL:1
``````
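(For anyone landing here later: as of Julia 1.9, `eachslice` does accept a tuple of `dims`, returning a `Slices` object, so the multi-dimensional form works in Base:)

```julia
using Statistics

A = rand(Float32, 10, 10, 10)

# Requires Julia ≥ 1.9: iterate over dims 1 and 2; each element is a
# view along dim 3.
map(median, eachslice(A, dims=(1, 2))) == dropdims(median(A, dims=3), dims=3)  # true
```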