median(A, dims)
dispatches to mapslices
, which is flexible but somewhat slow. My own profiling shows that much of the runtime is spent in an auxiliary function, concatenate_setindex!
, which may be hobbled by broadcasting overhead. JuliennedArrays delivers a substantial speedup:
using JuliennedArrays
A = rand(Float32, 10, 10, 10, 10)
@btime median($A, dims=1) # 1.251 ms (11070 allocations: 319.59 KiB)
@btime map(median, Slices($A, 1)) # 230.976 μs (2018 allocations: 192.38 KiB)
dropdims(median(A, dims=1), dims=1) == map(median, Slices(A, 1)) # true
**edit: just for fun, threaded mapslices
function tmapslices(f, A, dims...)
sliced = Slices(A, dims...)
return_type = typeof(f(A[1:2]))
out = Array{return_type}(undef, size(sliced))
Threads.@threads for i in eachindex(sliced)
out[i] = f(sliced[i])
end
return out
end
@btime tmapslices(median, $A, 1) # 142.063 μs (238 allocations: 57.70 KiB)