I have a whole bunch of functions which take as their arguments slices of an array. I’d like to execute these functions with 0 allocations (if that’s even possible).
In light of the above, it would seem that one option would be to pass an array and the indices to access as arguments. However, this gets pretty unpleasant, as I have a whole class of functions that I want to execute and they, in general, take different numbers of elements from the array. Note also that since I’m dealing with tiny slices, view and direct indexing make similar allocations, so there’s very little advantage to using view.
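For concreteness, the array-plus-indices approach might look like this (the function name is illustrative, not from my actual code):

```julia
# Instead of calling f(view(x, i:i+1)), pass the parent array and a
# starting index explicitly; no slice object is ever created.
f_at(x::Vector{Float64}, i::Int) = x[i] + x[i+1]

x = rand(5)
f_at(x, 1)   # same result as summing x[1] and x[2]
```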
It is possible for the allocation of the view to be optimized out, at least when f is inlined (if f is not inlined, it’s a harder problem, but still possible). Note also that allocating a view is much faster than allocating an array, even when the sizes are similar.
The compiler is not yet smart enough to elide the allocation in all cases, though it can already do so when bounds checking is disabled (e.g. with @inbounds):
julia> @inline f(x) = @inbounds return x[1] + x[2]
f (generic function with 1 method)
julia> g(x) = f(view(x, 1:2))
g (generic function with 1 method)
julia> x = rand(5)
5-element Array{Float64,1}:
0.703975
0.0720989
0.406935
0.933273
0.629373
julia> using BenchmarkTools
julia> @btime g($x)
2.654 ns (0 allocations: 0 bytes)
0.7760736339378733
Hm… this certainly works in these simple examples, but it seems rather tricky to get it to work consistently in my code. I have another way around this which is rather ugly, but foolproof (I hope), so I’ll take that.
Anyway thanks for the fix.
By the way, why are allocations happening in the case of views? I had a few ideas in my head as to why that might happen, but the fact that this is affected by @inbounds makes me suspect those ideas are wrong. For the small slices I am dealing with it certainly does not seem to be consistently true that the views are faster.
If you want to avoid allocations completely, speed is more important than safety, and you only need to consider isbits types, you can always use pointer arithmetic: that is, rewrite f to take a Ptr.
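A hedged sketch of what that rewrite could look like (assuming Float64 data; f_ptr is a made-up name):

```julia
# Pointer-based variant of f: no array object is passed at all.
# unsafe_load does no bounds checking, so the caller must guarantee
# the indices are valid and must keep the parent array alive
# (GC.@preserve) for the duration of the call.
f_ptr(p::Ptr{Float64}) = unsafe_load(p, 1) + unsafe_load(p, 2)

x = rand(5)
y1 = GC.@preserve x f_ptr(pointer(x))      # acts like x[1] + x[2]
y2 = GC.@preserve x f_ptr(pointer(x, 3))   # acts like x[3] + x[4]
```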
I believe that the last time I asked about this I was told eventually it will be possible for SubArrays to be stack allocated (even though the underlying memory is still heap allocated), which will lower the overhead of having many views.
It might be the case that I don’t actually understand what a view is. When you say the view needs to be allocated, does that mean the pointer to the start of the view, and the length of the view need to be allocated? That would make sense, otherwise I’m confused.
Also, I should clarify: when I said “it does not seem to be consistently true that views are faster” I should have said “it does not seem consistently true that blindly replacing every getindex with view necessarily results in faster code”. I’m sure if I investigated this carefully I’d discover why.
Yes, I had already considered this. There is one other thing I’m going to try first (which basically amounts to changing the structure of a bunch of my functions in a slightly undesirable way) but it’s nice to know I always have that option. I want to be a little careful in case I decide to do something weird or crazy later on.
As for the pointer option, I’ve been using an UnsafeVectorView struct (originally from ReverseDiffSparse) as a workaround. It implements the AbstractArray interface, but has only isbits fields (including the pointer), making it an isbits type itself.
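A minimal sketch of such a struct (simplified; the actual implementation in ReverseDiffSparse differs in detail). It is unsafe in the same way as raw pointers: the caller must keep the parent array alive, e.g. with GC.@preserve, and stay in bounds.

```julia
# An isbits "view": all fields are isbits, so instances of this struct
# never need heap allocation themselves.
struct UnsafeVectorView{T} <: AbstractVector{T}
    offset::Int
    len::Int
    ptr::Ptr{T}
end

UnsafeVectorView(x::Vector{T}, r::UnitRange{Int}) where {T} =
    UnsafeVectorView{T}(first(r) - 1, length(r), pointer(x))

Base.size(v::UnsafeVectorView) = (v.len,)
Base.getindex(v::UnsafeVectorView, i::Int) = unsafe_load(v.ptr, i + v.offset)
Base.setindex!(v::UnsafeVectorView, val, i::Int) =
    unsafe_store!(v.ptr, val, i + v.offset)
```

Because it implements the AbstractArray interface, generic code like sum works on it directly.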
That’s more or less what I thought SubArrays were in the first place (with bounds checking). Since it seems that’s not the case, this’ll be useful, thanks.
Conceptually, the reason that view is boxed is as follows. Suppose v is a view of A. Then it is necessary for the data of A to remain alive as long as v is alive. So the run-time system needs to keep track of all live copies of v so that it knows eventually when it can deallocate A. If v were treated as a plain-bits type, then the run-time system would not be able to keep track of copies. The boxing can be eliminated if the compiler can prove to itself that all copies of the view go out of scope before A does. This is the optimization that the core developers are pursuing.
I would recommend the following performant way to apply functions to subarrays. If the subarray is relatively large, then use view, because the overhead of heap-allocating and deallocating the view is minor. If the subarray is small, then instead preallocate space for a copy c of the subarray, copy the subarray into c (using a dot operation, not colon subscripting), call your function on c, and then copy back afterwards with another dot operation.
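A sketch of that copy-in/copy-out pattern (the buffer and function names are mine, not from the post):

```julia
# Preallocated scratch buffer, created once and reused for every call.
c = Vector{Float64}(undef, 2)

# Copy A[r] into c with a dot operation, apply f! in place, copy back.
function apply_via_buffer!(f!, A, r, c)
    c .= @view A[r]      # copy in (dot operation, not A[r] slicing)
    f!(c)
    A[r] .= c            # copy back with another dot operation
    return A
end

A = collect(1.0:5.0)
apply_via_buffer!(v -> v .*= 2.0, A, 2:3, c)   # A is now [1.0, 4.0, 6.0, 4.0, 5.0]
```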
So I have been following this discussion very keenly as I have millions of views and want them to be non-allocating.
I ran into the following problem, which I am not sure how to solve.
Why does the first one have no memory allocation, while the second one allocates memory even though I am only multiplying by a scalar?
One of us should really get around to writing a @nonallocating macro. It should be simple for a particular use case, but it might be complicated to write one which covers a wide variety of use cases. It would be nice to have something like that in Base.
Basically, it uses @allocated to check allocations but does some magic to put everything inside a function so that it gives the right answers even at global scope.
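As a hedged sketch of the idea (written here as a plain helper function rather than a macro, with a hypothetical name): wrapping the measured call inside a function avoids the spurious allocations that @allocated reports at global scope.

```julia
# Run f(args...) once so compilation is not counted, then measure a
# second, fully compiled call.
function allocated_by(f, args...)
    f(args...)                    # warm-up call
    return @allocated f(args...)
end

allocated_by(sin, 1.0)            # 0 bytes when the call doesn't allocate
allocated_by(n -> zeros(n), 100)  # > 0, since zeros(100) heap-allocates
```

A real @nonallocating macro could build on this by erroring (or warning) whenever the measured count is nonzero.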