View way slower than copy?

I’m trying to compute the median of a 3D image along the :position axis, but I found that copying the slice is way faster than taking a view. Why is that? The two functions below are identical except that one uses a copy operation and the other a view operation.

using AxisArrays
using Images
using BenchmarkTools

img = AxisArray(rand(1024, 1024, 9), :x, :y, :position)

function medianview(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
        med[iter] = median(view(arr, iter, :))
    end
    med
end

function mediancopy(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
        med[iter] = median(arr[iter, :])
    end
    med
end

Any thoughts on why there is such a big difference between the two? It also appears that the view version allocates more memory than the copy version, which is really weird.

@benchmark medianview(img)
BenchmarkTools.Trial: 
  memory estimate:  1.47 GiB
  allocs estimate:  62407718
  --------------
  minimum time:     16.194 s (1.41% GC)
  median time:      16.194 s (1.41% GC)
  mean time:        16.194 s (1.41% GC)
  maximum time:     16.194 s (1.41% GC)
  --------------
  samples:          1
  evals/sample:     1

@benchmark mediancopy(img)
BenchmarkTools.Trial: 
  memory estimate:  536.03 MiB
  allocs estimate:  10487846
  --------------
  minimum time:     611.938 ms (11.08% GC)
  median time:      637.819 ms (11.07% GC)
  mean time:        648.798 ms (13.06% GC)
  maximum time:     747.560 ms (25.30% GC)
  --------------
  samples:          8
  evals/sample:     1

Removing the call to median does not make a big difference in the timing. So it seems creating the view is the expensive part.
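
One way to see this without any median at all is to benchmark a single slice in isolation. A minimal sketch with BenchmarkTools (the plain comparison array and these exact calls are just for illustration, not part of the timings above):

using BenchmarkTools, AxisArrays

img   = AxisArray(rand(1024, 1024, 9), :x, :y, :position)
plain = rand(1024, 1024, 9)

@btime view($img, 1, 1, :)     # view into the AxisArray
@btime $img[1, 1, :]           # copied slice of the AxisArray
@btime view($plain, 1, 1, :)   # view into a plain Array, for comparison
@btime $plain[1, 1, :]         # copied slice of a plain Array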

I think that AxisArray / your code tries too hard to avoid straight loops. The following still allocates for me on every iter:

function medianview(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
    end
    med
end

Next, you might try res = median(img, [3]), which does the same as your code (but only works on plain arrays).

This is still obnoxiously slow for me.

Also, do you really, really need to loop over the last dimension? You get 8x read amplification from this.

OK, figured it out: median always pulls a copy.

img=rand(1024,1024,9)
@noinline function median_fast(arr::Array{T,3}) where {T}
    scratchpad = Vector{T}(size(arr,3))     # one reusable buffer for every (x, y) column
    meds = Matrix{T}(size(arr,1), size(arr,2))
    for y=1:size(arr, 2)
        for x=1:size(arr,1)
            for z=1:size(arr,3)
                scratchpad[z]=arr[x,y,z]
            end
            meds[x,y]=median!(scratchpad)   # median! sorts the reused scratchpad in place
        end
    end
    meds
end

median(img, [3])
@time median(img,[3])

  8.639759 seconds (40.91 M allocations: 904.221 MiB, 4.82% gc time)

median_fast(img)
@time median_fast(img)

  0.481699 seconds (1.05 M allocations: 24.000 MiB, 2.35% gc time)

The reason I was using the Cartesian index is that the order of the dimensions isn’t guaranteed to be the same each time.

Also, do you really, really need to loop over the last dimension? You get 8x read amplification from this.

iter should only be looping over the first two dimensions, not the third.
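
If the only reason for the CartesianRange machinery is that the axis order isn’t fixed, another option is to look the :position axis up with axisdim and permute it to the front once. A sketch (0.7-style syntax; it assumes the AxisArray wraps a plain Array in its data field, and wrapping the result back into an AxisArray is left out):

using AxisArrays, Statistics

function median_over_position(arr::AxisArray{T,3}) where {T}
    pdim = axisdim(arr, Axis{:position})
    perm = (pdim, setdiff(1:3, pdim)...)
    data = permutedims(arr.data, perm)      # one up-front copy, :position now first and contiguous
    meds = Matrix{T}(undef, size(data, 2), size(data, 3))
    for j in axes(data, 3), i in axes(data, 2)
        meds[i, j] = median!(view(data, :, i, j))   # sorts a column of our private copy in place
    end
    meds
end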

Another thing to try is putting @inbounds annotations around all your indexing and view-creating operations (and even the iteration / for loops), and then comparing.

(I was once surprised by how expensive bounds checking can sometimes be when constructing a view.)
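
To make that concrete, here is a sketch of median_fast from above with the whole loop nest under @inbounds (same 0.6-era constructors as the code above; the name is just for illustration, and whether it actually helps is exactly what you would measure):

@noinline function median_fast_inbounds(arr::Array{T,3}) where {T}
    scratchpad = Vector{T}(size(arr,3))
    meds = Matrix{T}(size(arr,1), size(arr,2))
    @inbounds for y = 1:size(arr,2), x = 1:size(arr,1)
        for z = 1:size(arr,3)
            scratchpad[z] = arr[x,y,z]      # bounds checks elided by the enclosing @inbounds
        end
        meds[x,y] = median!(scratchpad)     # @inbounds does not propagate into median! itself
    end
    meds
end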

The innermost loop is the one computing the median (which is hidden in a function call) and should therefore run over the first index. But even with the correct order, I only got a ~30% speedup over median_fast.

I fear that you will not get non-abysmal performance unless you implement a way to reuse the scratchpad array for median. The best option would be to fix median in Base (it already provides the functionality you want, it is just bad at it).

I cannot say anything about your use of AxisArrays; I have never worked with them, but it looks like they cause additional problems.

Regarding views: making a view is no use at all, since median pulls a copy anyway. Just let median mutate that copy by calling median!, and skip the view entirely. And, as my code showed, reusing the copy gives a ~20x speedup and ~40x fewer allocations.
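
Concretely, the original mediancopy only needs its median swapped for median!, since arr[iter, :] is already a fresh copy that nobody else holds. A sketch (same structure as the function in the first post; it assumes median! is happy with the 1-d AxisArray slice, otherwise collect it into a Vector first):

function mediancopy_inplace(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
        med[iter] = median!(arr[iter, :])   # mutate the throwaway copy instead of letting median make another one
    end
    med
end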

I want to say again that the median is a red herring here. All there is to it is that creating views of AxisArrays is apparently slow.

In some sense you’re right: The fact that views of AxisArrays are slow is weird.

On the other hand, using views is just bad for computing medians (at least if you want to use Julia’s median instead of a fancy self-built implementation).

And my comment about the median in statistics.jl was also incorrect. The true culprit is mapslices in AbstractArray.jl, which is terribly type-unstable (but it is smart enough to reuse arrays!).

Consider (on 0.7 master):

im = rand(1,1000,1000);
@time res = mapslices(sum, im, [1]);
  6.504686 seconds (44.71 M allocations: 873.019 MiB, 0.92% gc time)

im = rand(100_000,2,2);
@time res = mapslices(sum, im, [1]);
  0.000982 seconds (175 allocations: 786.922 KiB)

im = rand(100_000,1,1);
@time res = mapslices(sum, im, [1]);
  0.001141 seconds (79 allocations: 784.844 KiB)

The difference between these timings is the number of slices: the first call applies the function to a million length-1 slices, so the per-slice overhead dominates, while the other two only touch a handful of slices. Staring at the @code_warntype of mapslices for five minutes tells me that we should either accept an NTuple and make it a generated function, or just write straight looped code in an internal function and have a type-unstable outer function that provides the official API and translates to the inner one.
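
For the second option, the usual pattern is a function barrier. A rough sketch (not Base’s actual mapslices; it is hard-coded to map along dimension 1 of a 3-d array, on 0.7 syntax, just to show the shape of the idea):

# Outer function: type-unstable by nature, since the result type of f is only
# known after calling it once. It builds the output array and hands off to the kernel.
function map_dim1(f, A::AbstractArray{<:Any,3})
    r1 = f(view(A, :, 1, 1))
    out = Array{typeof(r1)}(undef, size(A, 2), size(A, 3))
    out[1, 1] = r1
    _map_dim1!(f, out, A)
end

# Inner kernel: behind the function barrier, out has a concrete type, so the
# loop and the stores compile to tight, type-stable code.
function _map_dim1!(f, out::AbstractMatrix, A::AbstractArray{<:Any,3})
    for k in axes(A, 3), j in axes(A, 2)
        (j, k) == (1, 1) && continue        # first slice was already filled by the caller
        out[j, k] = f(view(A, :, j, k))
    end
    out
end

Called as map_dim1(sum, im), this does the same job as mapslices(sum, im, [1]) for this shape, modulo the singleton leading dimension that mapslices keeps in its result.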

Looking at the issue tracker, mapslices appears to be on the way out anyway, and not worth fixing / maintaining (it might be replaced by JuliennedArrays).
