View way slower than copy?

I’m trying to compute the median of a 3D image along the :position axis, but I found that copying the slice is way faster than taking a view. Why is that? The two functions below are identical except that one uses a copy operation and the other a view operation.

using AxisArrays
using Images
using BenchmarkTools

img = AxisArray(rand(1024, 1024, 9), :x, :y, :position)

function medianview(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
        med[iter] = median(view(arr, iter, :))
    end
    med
end

function mediancopy(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
        med[iter] = median(arr[iter, :])
    end
    med
end

Any thoughts on why there is such a big difference between the two? It also appears that the view version allocates more memory than the copy version, which is really weird.

@benchmark medianview(img)
BenchmarkTools.Trial: 
  memory estimate:  1.47 GiB
  allocs estimate:  62407718
  --------------
  minimum time:     16.194 s (1.41% GC)
  median time:      16.194 s (1.41% GC)
  mean time:        16.194 s (1.41% GC)
  maximum time:     16.194 s (1.41% GC)
  --------------
  samples:          1
  evals/sample:     1

@benchmark mediancopy(img)
BenchmarkTools.Trial: 
  memory estimate:  536.03 MiB
  allocs estimate:  10487846
  --------------
  minimum time:     611.938 ms (11.08% GC)
  median time:      637.819 ms (11.07% GC)
  mean time:        648.798 ms (13.06% GC)
  maximum time:     747.560 ms (25.30% GC)
  --------------
  samples:          8
  evals/sample:     1

Removing the call to median does not make a big difference in the timing. So it seems creating the view is the expensive part.
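
One way to see this without any median at all is to benchmark a single slice in isolation. A minimal sketch with BenchmarkTools (the plain comparison array and these exact calls are just for illustration, not part of the timings above):

using BenchmarkTools, AxisArrays

img   = AxisArray(rand(1024, 1024, 9), :x, :y, :position)
plain = rand(1024, 1024, 9)

@btime view($img, 1, 1, :)     # view into the AxisArray
@btime $img[1, 1, :]           # copied slice of the AxisArray
@btime view($plain, 1, 1, :)   # view into a plain Array, for comparison
@btime $plain[1, 1, :]         # copied slice of a plain Array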

I think that AxisArray / your code tries too hard to avoid straight loops. The following still allocates for me on every iter:

function medianview(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
    end
    med
end

Next, you might try res = median(img, [3]), which does the same as your code (but only works on plain arrays).

This is still obnoxiously slow for me.

Also, do you really, really need to loop over the last dimension? You get 8x read amplification from this.

OK, figured it out: median always pulls a copy.

img=rand(1024,1024,9)
@noinline function median_fast(arr::Array{T,3}) where {T}
    scratchpad = Vector{T}(size(arr,3))     # one reusable buffer for every (x, y) column
    meds = Matrix{T}(size(arr,1), size(arr,2))
    for y=1:size(arr, 2)
        for x=1:size(arr,1)
            for z=1:size(arr,3)
                scratchpad[z]=arr[x,y,z]
            end
            meds[x,y]=median!(scratchpad)   # median! sorts the reused scratchpad in place
        end
    end
    meds
end

median(img, [3])
@time median(img,[3])

  8.639759 seconds (40.91 M allocations: 904.221 MiB, 4.82% gc time)

median_fast(img)
@time median_fast(img)

  0.481699 seconds (1.05 M allocations: 24.000 MiB, 2.35% gc time)

The reason I was using the Cartesian index is that the order of the dimensions isn’t guaranteed to be the same each time.

Also, do you really, really need to loop over the last dimension? You get 8x read amplification from this.

iter should only be looping over the first two dimensions, not the third.
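
If the only reason for the CartesianRange machinery is that the axis order isn’t fixed, another option is to look the :position axis up with axisdim and permute it to the front once. A sketch (0.7-style syntax; it assumes the AxisArray wraps a plain Array in its data field, and wrapping the result back into an AxisArray is left out):

using AxisArrays, Statistics

function median_over_position(arr::AxisArray{T,3}) where {T}
    pdim = axisdim(arr, Axis{:position})
    perm = (pdim, setdiff(1:3, pdim)...)
    data = permutedims(arr.data, perm)      # one up-front copy, :position now first and contiguous
    meds = Matrix{T}(undef, size(data, 2), size(data, 3))
    for j in axes(data, 3), i in axes(data, 2)
        meds[i, j] = median!(view(data, :, i, j))   # sorts a column of our private copy in place
    end
    meds
end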

Another thing to try is putting @inbounds annotations around all your indexing and view-creating operations (and even the iteration / for loops), and then comparing.

(I was once surprised by how expensive bounds checking can sometimes be when constructing a view.)
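
To make that concrete, here is a sketch of median_fast from above with the whole loop nest under @inbounds (same 0.6-era constructors as the code above; the name is just for illustration, and whether it actually helps is exactly what you would measure):

@noinline function median_fast_inbounds(arr::Array{T,3}) where {T}
    scratchpad = Vector{T}(size(arr,3))
    meds = Matrix{T}(size(arr,1), size(arr,2))
    @inbounds for y = 1:size(arr,2), x = 1:size(arr,1)
        for z = 1:size(arr,3)
            scratchpad[z] = arr[x,y,z]      # bounds checks elided by the enclosing @inbounds
        end
        meds[x,y] = median!(scratchpad)     # @inbounds does not propagate into median! itself
    end
    meds
end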

The innermost loop is the one computing the median (which is hidden in a function call) and should therefore run over the first index. But even with the correct order, I only got a ~30% speedup over median_fast.

I fear that you will not get non-abysmal performance unless you implement a way to reuse the scratchpad array for median. The best option would be to fix median in Base (it already provides the functionality you want, it is just bad at it).

I cannot say anything about your use of AxisArrays; I have never worked with them, but it looks like they cause additional problems.

Regarding views: making a view is no use at all, since median pulls a copy anyway. Just let median mutate that copy by calling median!, and skip the view entirely. And, as my code showed, reusing the copy gives a ~20x speedup and ~40x fewer allocations.
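
Concretely, the original mediancopy only needs its median swapped for median!, since arr[iter, :] is already a fresh copy that nobody else holds. A sketch (same structure as the function in the first post; it assumes median! is happy with the 1-d AxisArray slice, otherwise collect it into a Vector first):

function mediancopy_inplace(arr::AxisArray{T}) where {T}
    xydims = map(x->axisvalues(arr)[axisdim(arr, x)], (Axis{:y}, Axis{:x}))
    med = AxisArray(zeros(T, length.(xydims)...), :x, :y)
    for iter in CartesianRange(xydims)
        med[iter] = median!(arr[iter, :])   # mutate the throwaway copy instead of letting median make another one
    end
    med
end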

I want to say again that the median is a red herring here. All there is to it is that creating views of AxisArrays is apparently slow.

In some sense you’re right: The fact that views of AxisArrays are slow is weird.

On the other hand, using views is just bad for computing medians (at least if you want to use Julia’s median instead of a fancy self-built implementation).

And my comment about the median in statistics.jl was also incorrect. The true culprit is mapslices in AbstractArray.jl, which is terribly type-unstable (but it is smart enough to reuse arrays!).

Consider (on 0.7 master):

im = rand(1,1000,1000);
@time res = mapslices(sum, im, [1]);
  6.504686 seconds (44.71 M allocations: 873.019 MiB, 0.92% gc time)

im = rand(100_000,2,2);
@time res = mapslices(sum, im, [1]);
  0.000982 seconds (175 allocations: 786.922 KiB)

im = rand(100_000,1,1);
@time res = mapslices(sum, im, [1]);
  0.001141 seconds (79 allocations: 784.844 KiB)

The difference between these timings is the number of slices: the first call applies the function to a million length-1 slices, so the per-slice overhead dominates, while the other two only touch a handful of slices. Staring at the @code_warntype of mapslices for five minutes tells me that we should either accept an NTuple and make it a generated function, or just write straight looped code in an internal function and have a type-unstable outer function that provides the official API and translates to the inner one.
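
For the second option, the usual pattern is a function barrier. A rough sketch (not Base’s actual mapslices; it is hard-coded to map along dimension 1 of a 3-d array, on 0.7 syntax, just to show the shape of the idea):

# Outer function: type-unstable by nature, since the result type of f is only
# known after calling it once. It builds the output array and hands off to the kernel.
function map_dim1(f, A::AbstractArray{<:Any,3})
    r1 = f(view(A, :, 1, 1))
    out = Array{typeof(r1)}(undef, size(A, 2), size(A, 3))
    out[1, 1] = r1
    _map_dim1!(f, out, A)
end

# Inner kernel: behind the function barrier, out has a concrete type, so the
# loop and the stores compile to tight, type-stable code.
function _map_dim1!(f, out::AbstractMatrix, A::AbstractArray{<:Any,3})
    for k in axes(A, 3), j in axes(A, 2)
        (j, k) == (1, 1) && continue        # first slice was already filled by the caller
        out[j, k] = f(view(A, :, j, k))
    end
    out
end

Called as map_dim1(sum, im), this does the same job as mapslices(sum, im, [1]) for this shape, modulo the singleton leading dimension that mapslices keeps in its result.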

Looking at the issue tracker, mapslices appears to be on the way out anyway, and not worth fixing / maintaining (it might be replaced by JuliennedArrays).
