Proposal for a movable SubArray

proposal

#1

As is known, views themselves need memory to hold some metadata. The fact that views are often used as temporaries leads to unnecessary (de)allocations when the same view is recreated multiple times. Moreover, this behavior discourages the use of views in scenarios where the views all have the same shape and differ only in their offsets (think of 2-D convolution, for example, and in general any moving-window algorithm).

The situation could be mitigated if we could mutate the offset of a view (a.k.a. SubArray) in place, preferably both with and without bounds checking:

move_unsafe!(x::SubArray, diff_in_cartesian_offsets...) = x.offset1 = ???
move!(x::SubArray, diff_in_cartesian_offsets...) = bounds_ok && (x.offset1 = ???)

So to iterate over the columns of an x::Array{T,2} I could just write

v = view(x, :, 1)
for i in 1:size(x)[end]
    ...use v...
    move!(v, 0, 1)
end

with the following benefits:

  • v only gets allocated once;
  • no hand-unrolling of loops, as we do now when the overhead of x[:,i] is unacceptable.

AFAIK this cannot be implemented in a non-breaking way right now, since SubArrays are immutable.
So do you think a movable SubArray is the right way to go?
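For concreteness, here is a minimal sketch of what such an API could look like if prototyped today as a mutable wrapper around a parent array. All names here (MovingView, move!, move_unsafe!) are hypothetical and not part of Base; this only illustrates the proposed semantics:

```julia
# Hypothetical sketch: a mutable "moving window" view.
# Not part of Base — illustrates the proposed move!/move_unsafe! semantics.
mutable struct MovingView{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
    offsets::NTuple{N,Int}   # zero-based offset of the window
    sizes::NTuple{N,Int}     # window shape, fixed after construction
end

Base.size(v::MovingView) = v.sizes
Base.getindex(v::MovingView{T,N}, I::Vararg{Int,N}) where {T,N} =
    v.parent[(I .+ v.offsets)...]

# Shift the window without bounds checking.
move_unsafe!(v::MovingView, d::Int...) = (v.offsets = v.offsets .+ d; v)

# Shift the window, erroring if it would leave the parent array.
function move!(v::MovingView, d::Int...)
    newoff = v.offsets .+ d
    all(newoff .>= 0) && all(newoff .+ v.sizes .<= size(v.parent)) ||
        throw(BoundsError(v.parent, newoff))
    v.offsets = newoff
    return v
end

x = reshape(1:12, 3, 4)
w = MovingView(x, (0, 0), (2, 2))   # top-left 2×2 window
sum(w)                               # sums x[1:2, 1:2]
move!(w, 0, 1)                       # slide one column to the right
sum(w)                               # sums x[1:2, 2:3], no new view allocated
```

Because only the `offsets` field changes, the window itself is reused across iterations, which is exactly the allocation pattern the proposal is after.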

=====================================================
If you are not convinced of the performance hit, here is an illustrative benchmark.

function conv2d_view(x::Array{T,2}, k::Int) where T
    ni, nj = size(x) .- k .+ 1
    y = fill(zero(T), ni, nj)
    km1 = k-1
    for j in 1:nj
        for i in 1:ni
            y[i,j] = sum(view(x, i:i+km1, j:j+km1))
        end
    end
    return y
end
function conv2d_loop(x::Array{T,2}, k::Int) where T
    ni, nj = size(x) .- k .+ 1
    y = fill(zero(T), ni, nj)
    km1 = k-1
    for j in 1:nj
        for i in 1:ni
            for n in 0:km1
                for m in 0:km1
                    y[i,j] += x[i+m, j+n]
                end
            end
        end
    end
    return y
end

using BenchmarkTools
n = 1000
x = randn(n, n)
k = 3

julia> @benchmark conv2d_view($x, $k)
BenchmarkTools.Trial: 
  memory estimate:  68.41 MiB
  allocs estimate:  996179
  --------------
  minimum time:     25.539 ms (6.65% GC)
  median time:      25.768 ms (6.88% GC)
  mean time:        26.733 ms (9.59% GC)
  maximum time:     100.454 ms (61.82% GC)
  --------------
  samples:          187
  evals/sample:     1

julia> @benchmark conv2d_loop($x, $k)
BenchmarkTools.Trial: 
  memory estimate:  7.62 MiB
  allocs estimate:  175
  --------------
  minimum time:     16.466 ms (0.00% GC)
  median time:      16.851 ms (0.00% GC)
  mean time:        17.325 ms (2.45% GC)
  maximum time:     64.073 ms (73.48% GC)
  --------------
  samples:          289
  evals/sample:     1

julia> versioninfo()
Julia Version 0.7.0-DEV.4722
Commit 8a5f747 (2018-03-29 19:53 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

#2

Did you try v0.7 (recent master)? The compiler is becoming eerily clever about memory management for small transient immutables.


#3

FWIW, putting @inbounds on the loop function will likely make it perform significantly better.
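For example, a sketch of the loop version above with the annotation applied (same algorithm, only the inner loops are marked; conv2d_loop_inbounds is just an illustrative name):

```julia
# conv2d_loop from the original post, with @inbounds on the hot loops.
# Skipping bounds checks here is safe because i+m <= ni+km1 <= size(x,1)
# and j+n <= nj+km1 <= size(x,2) by construction.
function conv2d_loop_inbounds(x::Array{T,2}, k::Int) where T
    ni, nj = size(x) .- k .+ 1
    y = fill(zero(T), ni, nj)
    km1 = k - 1
    @inbounds for j in 1:nj, i in 1:ni
        for n in 0:km1, m in 0:km1
            y[i, j] += x[i + m, j + n]
        end
    end
    return y
end
```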

I think (in 0.7) the view allocations are only elided if they are only used in functions that end up inlined.


#4

Post updated with a simple benchmark on the latest master.


#5

I did run that test. On my node I had (view vs loop) ~ (15ms vs 8ms). I just refrained from posting that for simplicity.


#6

Just for what it’s worth, the long-term goal with SubArray has been to improve GC support for immutables that contain mutable objects — that should make it able to be passed as a stack-allocated value and avoid this issue. That’s https://github.com/JuliaLang/julia/pull/18632.

And then there’s also a potential for “mutating” immutables — https://github.com/JuliaLang/julia/pull/21912.

I don’t think making SubArray mutable is the direction we want to take. It would be non-breaking now, but in the future we’ll have better support for immutables that could make them even faster, and going back to immutable at that point would be breaking.
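In the meantime, the idiomatic pattern is to construct a fresh view on each iteration and let the compiler elide the allocation where it can (as noted above, whether the allocation is actually elided depends on the view not escaping and the consuming code being inlined). A small sketch:

```julia
# Reconstruct the view each iteration instead of mutating one.
# On recent Julia the SubArray is often not heap-allocated at all
# when it stays local to the loop body.
function colsums(x::AbstractMatrix)
    s = zeros(eltype(x), size(x, 2))
    for j in axes(x, 2)
        v = @view x[:, j]   # a new SubArray each iteration
        s[j] = sum(v)
    end
    return s
end
```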


#7

Thank you for the explanation, and the links are immensely enriching!