Is there a way to guarantee 0 allocations when accessing an array?


#1

I have a whole bunch of functions which take as their arguments slices of an array. I’d like to execute these functions with 0 allocations (if that’s even possible).

Note the following:

f(x::AbstractArray) = x[1] + x[2]

x = rand(5)

f(x) # 0 allocations
f(x[1:2]) # 1 allocation
f(view(x, 1:2)) # 1 allocation

In light of the above, it would seem that one option would be to pass an array and the indices to access as arguments, however this gets pretty unpleasant as I have a whole class of functions that I want to execute and they, in general, take different numbers of elements from the array. Note also that since I’m dealing with tiny slices, view and direct indexing make similar allocations, so there’s very little advantage to using view.


#2

It is possible for the allocation of the view to be optimized out (also note that allocating a view is much faster than allocating an array, even when the sizes are similar), at least when f is inlined (if f is not inlined, it’s a harder problem but still possible)

The compiler is not smart enough to elide the allocation in all cases yet though it is able to do that without bound checking already

julia> @inline f(x) = @inbounds return x[1] + x[2]
f (generic function with 1 method)

julia> g(x) = f(view(x, 1:2))
g (generic function with 1 method)

julia> x = rand(5)
5-element Array{Float64,1}:
 0.703975
 0.0720989
 0.406935
 0.933273
 0.629373

julia> using BenchmarkTools

julia> @btime g($x)
  2.654 ns (0 allocations: 0 bytes)
0.7760736339378733

#3

Hm… this certainly works in these simple examples, but it seems rather tricky to get it to work consistently in my code. I have another way around this which is rather ugly, but foolproof (I hope), so I’ll take that.

Anyway thanks for the fix.

By the way, why are allocations happening in the case of views? I had a few ideas in my head as to why that might happen, but the fact that this is affected by @inbounds makes me suspect those ideas are wrong. For the small slices I am dealing with it certainly does not seem to be consistently true that the views are faster.


#4

If you want to avoid allocations completely, speed is more important than safety, and you only need to consider bittypes, you can always use pointer arithmetic: that is, rewrite f to take in a Ptr.

I believe that the last time I asked about this I was told eventually it will be possible for SubArrays to be stack allocated (even though the underlying memory is still heap allocated), which will lower the overhead of having many views.


#5

Because the view needs to be allocated

Because this makes the code easier to optimize.

FWIW

julia> using BenchmarkTools

julia> @inline f(x) = return x[1] + x[2]
f (generic function with 1 method)

julia> gv(x) = f(view(x, 1:2))
gv (generic function with 1 method)

julia> gi(x) = f(x[1:2])
gi (generic function with 1 method)

julia> x = rand(5)
5-element Array{Float64,1}:
 0.392123
 0.875388
 0.0173322
 0.556458
 0.532483

julia> @btime gv($x)
  8.648 ns (1 allocation: 48 bytes)
1.2675108514837523

julia> @btime gi($x)
  27.395 ns (1 allocation: 96 bytes)
1.2675108514837523

#6

It might be the case that I don’t actually understand what a view is. When you say the view needs to be allocated, does that mean the pointer to the start of the view, and the length of the view need to be allocated? That would make sense, otherwise I’m confused.

Also, I should clarify: when I said “it does not seem to be consistently true that views are faster” I should have said “it does not seem consistently true that blindly replacing every getitem with view necessarily results in faster code”. I’m sure if I investigated this carefully I’d discover why.

Yes, I had already considered this. There is one other thing I’m going to try first (which basically amounts to changing the structure of a bunch of my functions in a slightly undesirable way) but it’s nice to know I always have that option. I want to be a little careful in case I decide to do something weird or crazy later on.


#7

This is the PR to follow:


#8

The view is an object

julia> view(x, 1:2)
2-element SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}:
 0.712483
 0.7216

And that object needs to be allocated. It has a few fields that includes the original array and the index. Possibly others too.

That is certainly not true. Or we would have switched the semantics. For

It should almost always be true though.


#9

The PR with the most up-to-date info is https://github.com/JuliaLang/julia/pull/18632


#10

Right, just to be clear, that’s what I was saying.


#11

As for the pointer option, I’ve been using an UnsafeVectorView struct (originally from ReverseDiffSparse) as a workaround. It implements the AbstractArray interface, but has only isbits fields (including the pointer), making it an isbits type itself.


#12

That’s more or less what I thought SubArrays were in the first place (with bounds checking). Since it seems that’s not the case, this’ll be useful, thanks.


#13

Conceptually, the reason that view is boxed is as follows. Suppose v is a view of A. Then it is necessary for the data of A to remain alive as long as v is alive. So the run-time system needs to keep track of all live copies of v so that it knows eventually when it can deallocate A. If v were treated as a plain-bits type, then the run-time system would not be able to keep track of copies. The boxing can be eliminated if the compiler can prove to itself that all copies of the view go out of scope before A does. This is the optimization that the core developers are pursuing.

I would recommend the following performant way to apply functions to subarrays. If the subarray is relatively large, then use view because the overhead of the heap allocation and deallocation of the view is minor. If the subarray is small, then instead preallocate space for a copy c of the subarray, and then copy the subarray into c (using a dot operation, not colon subscripting), call your function on c, and then copy back afterwards with another dot operation.


#14

So I have been following this discussion very keenly as I have millions of views and want them to be non-allocating.
I ran it in the following problem which I am not sure how to solve.
Why does the first one has no memory allocation while the second one allocates memory even though I am only multiplying by a scalar ?

julia> @benchmark @inbounds @views $work[4:6]  .=($work[1:3].*$t1 .- $f.*$r1v)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     14.205 ns (0.00% GC)
  median time:      15.256 ns (0.00% GC)
  mean time:        15.607 ns (0.00% GC)
  maximum time:     39.189 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> @benchmark @inbounds @views $work[4:6]  .= $byr1.*($work[1:3].*$t1 .- $f.*$r1v)
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     24.847 ns (0.00% GC)
  median time:      25.410 ns (0.00% GC)
  mean time:        31.588 ns (15.02% GC)
  maximum time:     2.597 μs (97.16% GC)
  --------------
  samples:          10000
  evals/sample:     995

thanks


#15

Are you on v0.5? If that’s the case, then .* won’t fuse.


#16

One of us should really get around to writing a @nonallocating macro. It should be simple for a particular use case, but it might be complicated to write one which covers a wide variety of use cases. It would be nice to have something like that in Base.


#17

No this is on Julia 0.6. I filed a more detailed simplified issue here: https://github.com/JuliaLang/julia/issues/22165

Replicated below:

work = rand(6)
byr1 = 0.4
julia> @benchmark @inbounds @views $work[4:6]  .= $byr1.*$work[4:6]
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     18.073 ns (0.00% GC)
  median time:      18.703 ns (0.00% GC)
  mean time:        24.711 ns (20.62% GC)
  maximum time:     2.557 μs (97.40% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark @inbounds @views $work[4:6]  .= $work[4:6].*$byr1
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.319 ns (0.00% GC)
  median time:      8.599 ns (0.00% GC)
  mean time:        8.725 ns (0.00% GC)
  maximum time:     25.867 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

If I replace the work array view on the right hand side with a normal vector the issue goes away.


#18

The closest I have to a @nonallocating macro is the @wrappedallocs macro I use in some unit tests to verify that various things don’t allocate: https://github.com/rdeits/TypedPolynomials.jl/blob/master/test/runtests.jl

Basically, it uses @allocated to check allocations but does some magic to put everything inside a function so that it gives the right answers even at global scope.


#19

Any idea when this could land? This would be so awesome.