Is it possible to cut a vector into 2 pieces without allocations?

Leo_I · November 26, 2023, 4:05pm

Let’s say I have toy data

m, n = 10^3, 10^6;
x1 = [rand(-1:0.001:1,s) for j=1:n for s in rand(1:m,1)] 
len = [rand(1:length(x)) for  x in x1]

I wish to split each vector x1[j] into x1[j][1:len[j]] and x1[j][len[j]:end]. My solution:

function _split!(x1 ::Vector{Vector{Float64}}, len ::Vector{Int}, n ::Int) ::Vector{Vector{Float64}}
    x2 = Vector{Vector{Float64}}(undef, n);
    for j=1:n  
        x2[j] = view(x1[j], len[j]+1 : length(x1[j]))
        resize!(x1[j], len[j]) end
    return x2 end;

However, when running this

julia> @time x2 = _split!(x1, len, n);
  1.597461 seconds (1.00 M allocations: 1.978 GiB, 37.08% gc time)

I see that many allocations were made, even though no new Floats were created.

Is there a way to improve my function, to cause no memory consumption and be efficient?

Tortar · November 26, 2023, 4:12pm

You could use two views for each piece of the vector I think

mkitti · November 26, 2023, 9:43pm

Basically we need to create views, but they will allocate a little.

The other issue is that you are assigning them into a Vector{Vector} which convert the SubArray into an allocating Array.

julia> A = [3,4,5,6,7]
5-element Vector{Int64}:
 3
 4
 5
 6
 7

julia> @views A1, A2 = A[1:2], A[3:end]
([3, 4], [5, 6, 7])

julia> typeof(A1)
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}

The concrete SubArray is a bit lengthy to type. I suggest using map rather the for loop.

jar1 · November 27, 2023, 5:07am

Why should views have to allocate?

Mason · November 27, 2023, 5:42am

I think Mark is thinking of julia versions pre-1.5 where any time you wrapped a mutable struct with an immutable struct, you incurred a small allocation.

Here’s the release note from when it was fixed: https://github.com/JuliaLang/julia/blob/v1.5.0/NEWS.md#compilerruntime-improvements

rocco_sprmnt21 · November 27, 2023, 8:09am

I understood that ultimately the two views of each element vector of the parent vector must be allocated in a (new?) vector.
Or is it possible to make a view of the views?

PS
Instead, I wonder if the use that can be made of the two parts into which one wants to divide the vector cannot be replaced, in all necessary cases, by the use of simple index formulas

@view x1[j][1:len[j]]

or

@view x1[j][len[j]+1 : lastindex(x1[j])

or both?
In this way there is no new allocation but only the reading of the necessary data where and when it is needed.

Mason · November 27, 2023, 8:24am

The way I’d write this function is like this:

function split(x ::Vector{Vector{Float64}}, len ::Vector{Int})
    ns = eachindex(x, len)
    xl = map(ns) do j  
        view(x[j], 1:len[j])
    end
    xr = map(ns) do j
        view(x[j], len[j]+1:length(x[j]))
    end
    xl, xr
end;

this returns a Tuple of the left and right views of the arrays and doesn’t modify the input at all.

This runs in about 29 miliseconds on my machine.

Mason · November 27, 2023, 8:34am

Notice that since x2 is of type Vector{Vector{Float64}}, when you do x2[j] = view(...), then there is an implicit convert(Vector{Float64}, view(...)) in order to store the view in the vector. This is what is causing you to incur so many allocations.

rafael.guerra · November 27, 2023, 8:54am

What do you think of a comprehension of views?

split2(x,len) = @views [(y[1:l], y[l+1:end]) for (l,y) in zip(len,x)]

Mason · November 27, 2023, 9:25am

Yeah that’s fine, but it can be more annoying to access the views. i.e. if you want all the left views, you need ot allocate a new vector which is first.(split2(x, len))

rocco_sprmnt21 · November 27, 2023, 10:03am

I too had thought of something like this (in the form of a generator) which could be useful in some cases.

((@views x1[j][1:len[j]], x1[j][len[j]+1:lastindex(x1[j])]) for j in eachindex(len))

rocco_sprmnt21 · November 27, 2023, 10:25am

in the form perhaps closest to the OP’s request (without allocations )

julia> @btime begin
       vleft=((@view x1[j][1:len[j]]) for j in eachindex($len))
       vrigth=((@view x1[j][len[j]+1:lastindex(x1[j])]) for j in eachindex($len))
       end
  1.500 ns (0 allocations: 0 bytes)
Base.Generator{Base.OneTo{Int64}, var"#24#26"}(var"#24#26"(), Base.OneTo(100000))

PS
It’s a pity there is no getindex(vleft, l:r)

Topic		Replies	Views
Problem with creating view of vectors in vector of vectors (it keeps allocating) Performance memory-allocation , vector	2	129	July 9, 2024
Passing views to function without allocation New to Julia	14	1534	December 18, 2020
Why these two functions have different allocations? General Usage	11	429	July 13, 2021
Is there a way to guarantee 0 allocations when accessing an array? General Usage	18	4307	June 1, 2017
Performance hit of allocation regardless of size Performance memory-allocation	7	209	October 21, 2024

Is it possible to cut a vector into 2 pieces without allocations?

Related topics