I have a whole bunch of functions which take as their arguments slices of an array. I’d like to execute these functions with 0 allocations (if that’s even possible).
In light of the above, it would seem that one option would be to pass an array and the indices to access as arguments. However, this gets pretty unpleasant, as I have a whole class of functions that I want to execute and they, in general, take different numbers of elements from the array. Note also that since I’m dealing with tiny slices, view and direct indexing make similar allocations, so there’s very little advantage to using view.
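For concreteness, the array-plus-indices approach might look like this (the function name is illustrative, not from my actual code):

```julia
# Instead of calling f(view(x, i:i+1)), pass the parent array and a
# starting index explicitly; no slice object is ever created.
f_at(x::Vector{Float64}, i::Int) = x[i] + x[i+1]

x = rand(5)
f_at(x, 1)   # same result as summing x[1] and x[2]
```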
It is possible for the allocation of the view to be optimized out, at least when f is inlined (if f is not inlined, it’s a harder problem, but still possible). Note also that allocating a view is much faster than allocating an array, even when the sizes are similar.
The compiler is not yet smart enough to elide the allocation in all cases, though it can already do so when bounds checking is disabled (e.g. with @inbounds):
julia> @inline f(x) = @inbounds return x[1] + x[2]
f (generic function with 1 method)
julia> g(x) = f(view(x, 1:2))
g (generic function with 1 method)
julia> x = rand(5)
5-element Array{Float64,1}:
0.703975
0.0720989
0.406935
0.933273
0.629373
julia> using BenchmarkTools
julia> @btime g($x)
2.654 ns (0 allocations: 0 bytes)
0.7760736339378733
Hm… this certainly works in these simple examples, but it seems rather tricky to get it to work consistently in my code. I have another way around this which is rather ugly, but foolproof (I hope), so I’ll take that.
Anyway thanks for the fix.
By the way, why are allocations happening in the case of views? I had a few ideas in my head as to why that might happen, but the fact that this is affected by @inbounds makes me suspect those ideas are wrong. For the small slices I am dealing with it certainly does not seem to be consistently true that the views are faster.
If you want to avoid allocations completely, speed is more important than safety, and you only need to consider isbits types, you can always use pointer arithmetic: that is, rewrite f to take a Ptr.
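A hedged sketch of what that rewrite could look like (assuming Float64 data; f_ptr is a made-up name):

```julia
# Pointer-based variant of f: no array object is passed at all.
# unsafe_load does no bounds checking, so the caller must guarantee
# the indices are valid and must keep the parent array alive
# (GC.@preserve) for the duration of the call.
f_ptr(p::Ptr{Float64}) = unsafe_load(p, 1) + unsafe_load(p, 2)

x = rand(5)
y1 = GC.@preserve x f_ptr(pointer(x))      # acts like x[1] + x[2]
y2 = GC.@preserve x f_ptr(pointer(x, 3))   # acts like x[3] + x[4]
```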
I believe that the last time I asked about this I was told eventually it will be possible for SubArrays to be stack allocated (even though the underlying memory is still heap allocated), which will lower the overhead of having many views.
It might be the case that I don’t actually understand what a view is. When you say the view needs to be allocated, does that mean the pointer to the start of the view, and the length of the view need to be allocated? That would make sense, otherwise I’m confused.
Also, I should clarify: when I said “it does not seem to be consistently true that views are faster” I should have said “it does not seem consistently true that blindly replacing every getindex with view necessarily results in faster code”. I’m sure if I investigated this carefully I’d discover why.
Yes, I had already considered this. There is one other thing I’m going to try first (which basically amounts to changing the structure of a bunch of my functions in a slightly undesirable way) but it’s nice to know I always have that option. I want to be a little careful in case I decide to do something weird or crazy later on.
As for the pointer option, I’ve been using an UnsafeVectorView struct (originally from ReverseDiffSparse) as a workaround. It implements the AbstractArray interface, but has only isbits fields (including the pointer), making it an isbits type itself.
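A minimal sketch of such a struct (simplified; the actual implementation in ReverseDiffSparse differs in detail). It is unsafe in the same way as raw pointers: the caller must keep the parent array alive, e.g. with GC.@preserve, and stay in bounds.

```julia
# An isbits "view": all fields are isbits, so instances of this struct
# never need heap allocation themselves.
struct UnsafeVectorView{T} <: AbstractVector{T}
    offset::Int
    len::Int
    ptr::Ptr{T}
end

UnsafeVectorView(x::Vector{T}, r::UnitRange{Int}) where {T} =
    UnsafeVectorView{T}(first(r) - 1, length(r), pointer(x))

Base.size(v::UnsafeVectorView) = (v.len,)
Base.getindex(v::UnsafeVectorView, i::Int) = unsafe_load(v.ptr, i + v.offset)
Base.setindex!(v::UnsafeVectorView, val, i::Int) =
    unsafe_store!(v.ptr, val, i + v.offset)
```

Because it implements the AbstractArray interface, generic code like sum works on it directly.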
That’s more or less what I thought SubArrays were in the first place (with bounds checking). Since it seems that’s not the case, this’ll be useful, thanks.
Conceptually, the reason that view is boxed is as follows. Suppose v is a view of A. Then it is necessary for the data of A to remain alive as long as v is alive. So the run-time system needs to keep track of all live copies of v so that it knows eventually when it can deallocate A. If v were treated as a plain-bits type, then the run-time system would not be able to keep track of copies. The boxing can be eliminated if the compiler can prove to itself that all copies of the view go out of scope before A does. This is the optimization that the core developers are pursuing.
I would recommend the following performant way to apply functions to subarrays. If the subarray is relatively large, then use view, because the overhead of heap-allocating and deallocating the view is minor. If the subarray is small, then instead preallocate space for a copy c of the subarray, copy the subarray into c (using a dot operation, not colon subscripting), call your function on c, and then copy back afterwards with another dot operation.
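A sketch of that copy-in/copy-out pattern (the buffer and function names are mine, not from the post):

```julia
# Preallocated scratch buffer, created once and reused for every call.
c = Vector{Float64}(undef, 2)

# Copy A[r] into c with a dot operation, apply f! in place, copy back.
function apply_via_buffer!(f!, A, r, c)
    c .= @view A[r]      # copy in (dot operation, not A[r] slicing)
    f!(c)
    A[r] .= c            # copy back with another dot operation
    return A
end

A = collect(1.0:5.0)
apply_via_buffer!(v -> v .*= 2.0, A, 2:3, c)   # A is now [1.0, 4.0, 6.0, 4.0, 5.0]
```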
So I have been following this discussion very keenly as I have millions of views and want them to be non-allocating.
I ran into the following problem, which I am not sure how to solve.
Why does the first one have no memory allocation, while the second one allocates memory even though I am only multiplying by a scalar?
One of us should really get around to writing a @nonallocating macro. It should be simple for a particular use case, but it might be complicated to write one which covers a wide variety of use cases. It would be nice to have something like that in Base.
Basically, it uses @allocated to check allocations but does some magic to put everything inside a function so that it gives the right answers even at global scope.
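As a hedged sketch of the idea (written here as a plain helper function rather than a macro, with a hypothetical name): wrapping the measured call inside a function avoids the spurious allocations that @allocated reports at global scope.

```julia
# Run f(args...) once so compilation is not counted, then measure a
# second, fully compiled call.
function allocated_by(f, args...)
    f(args...)                    # warm-up call
    return @allocated f(args...)
end

allocated_by(sin, 1.0)            # 0 bytes when the call doesn't allocate
allocated_by(n -> zeros(n), 100)  # > 0, since zeros(100) heap-allocates
```

A real @nonallocating macro could build on this by erroring (or warning) whenever the measured count is nonzero.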