Is it possible to cut a vector into 2 pieces without allocations?

Let’s say I have toy data

m, n = 10^3, 10^6;
x1 = [rand(-1:0.001:1,s) for j=1:n for s in rand(1:m,1)] 
len = [rand(1:length(x)) for  x in x1] 

I wish to split each vector x1[j] into x1[j][1:len[j]] and x1[j][len[j]:end]. My solution:

function _split!(x1 ::Vector{Vector{Float64}}, len ::Vector{Int}, n ::Int) ::Vector{Vector{Float64}}
    x2 = Vector{Vector{Float64}}(undef, n);
    for j=1:n  
        x2[j] = view(x1[j], len[j]+1 : length(x1[j]))
        resize!(x1[j], len[j]) end
    return x2 end;

However, when running this

julia> @time x2 = _split!(x1, len, n);
  1.597461 seconds (1.00 M allocations: 1.978 GiB, 37.08% gc time)

I see that many allocations were made, even though no new Floats were created.

Is there a way to improve my function, to cause no memory consumption and be efficient?

You could use two views for each piece of the vector I think

3 Likes

Basically we need to create views, but they will allocate a little.

The other issue is that you are assigning them into a Vector{Vector} which convert the SubArray into an allocating Array.

julia> A = [3,4,5,6,7]
5-element Vector{Int64}:
 3
 4
 5
 6
 7

julia> @views A1, A2 = A[1:2], A[3:end]
([3, 4], [5, 6, 7])

julia> typeof(A1)
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}

The concrete SubArray is a bit lengthy to type. I suggest using map rather the for loop.

Why should views have to allocate?

1 Like

I think Mark is thinking of julia versions pre-1.5 where any time you wrapped a mutable struct with an immutable struct, you incurred a small allocation.

Here’s the release note from when it was fixed: https://github.com/JuliaLang/julia/blob/v1.5.0/NEWS.md#compilerruntime-improvements

2 Likes

I understood that ultimately the two views of each element vector of the parent vector must be allocated in a (new?) vector.
Or is it possible to make a view of the views?

PS
Instead, I wonder if the use that can be made of the two parts into which one wants to divide the vector cannot be replaced, in all necessary cases, by the use of simple index formulas

@view x1[j][1:len[j]]

or

@view x1[j][len[j]+1 : lastindex(x1[j])

or both?
In this way there is no new allocation but only the reading of the necessary data where and when it is needed.

The way I’d write this function is like this:

function split(x ::Vector{Vector{Float64}}, len ::Vector{Int})
    ns = eachindex(x, len)
    xl = map(ns) do j  
        view(x[j], 1:len[j])
    end
    xr = map(ns) do j
        view(x[j], len[j]+1:length(x[j]))
    end
    xl, xr
end;

this returns a Tuple of the left and right views of the arrays and doesn’t modify the input at all.

This runs in about 29 miliseconds on my machine.

Notice that since x2 is of type Vector{Vector{Float64}}, when you do x2[j] = view(...), then there is an implicit convert(Vector{Float64}, view(...)) in order to store the view in the vector. This is what is causing you to incur so many allocations.

1 Like

What do you think of a comprehension of views?

split2(x,len) = @views [(y[1:l], y[l+1:end]) for (l,y) in zip(len,x)]
1 Like

Yeah that’s fine, but it can be more annoying to access the views. i.e. if you want all the left views, you need ot allocate a new vector which is first.(split2(x, len))

1 Like

I too had thought of something like this (in the form of a generator) which could be useful in some cases.

((@views x1[j][1:len[j]], x1[j][len[j]+1:lastindex(x1[j])]) for j in eachindex(len))
1 Like

in the form perhaps closest to the OP’s request (without allocations :grin:)

julia> @btime begin
       vleft=((@view x1[j][1:len[j]]) for j in eachindex($len))
       vrigth=((@view x1[j][len[j]+1:lastindex(x1[j])]) for j in eachindex($len))
       end
  1.500 ns (0 allocations: 0 bytes)
Base.Generator{Base.OneTo{Int64}, var"#24#26"}(var"#24#26"(), Base.OneTo(100000))

PS
It’s a pity there is no getindex(vleft, l:r)

2 Likes