Is it possible to cut a vector into 2 pieces without allocations?

Let’s say I have toy data

``````m, n = 10^3, 10^6;
x1 = [rand(-1:0.001:1,s) for j=1:n for s in rand(1:m,1)]
len = [rand(1:length(x)) for  x in x1]
``````

I wish to split each vector `x1[j]` into `x1[j][1:len[j]]` and `x1[j][len[j]:end]`. My solution:

``````function _split!(x1 ::Vector{Vector{Float64}}, len ::Vector{Int}, n ::Int) ::Vector{Vector{Float64}}
x2 = Vector{Vector{Float64}}(undef, n);
for j=1:n
x2[j] = view(x1[j], len[j]+1 : length(x1[j]))
resize!(x1[j], len[j]) end
return x2 end;
``````

However, when running this

``````julia> @time x2 = _split!(x1, len, n);
1.597461 seconds (1.00 M allocations: 1.978 GiB, 37.08% gc time)
``````

I see that many allocations were made, even though no new Floats were created.

Is there a way to improve my function, to cause no memory consumption and be efficient?

You could use two views for each piece of the vector I think

3 Likes

Basically we need to create views, but they will allocate a little.

The other issue is that you are assigning them into a `Vector{Vector}` which convert the `SubArray` into an allocating `Array`.

``````julia> A = [3,4,5,6,7]
5-element Vector{Int64}:
3
4
5
6
7

julia> @views A1, A2 = A[1:2], A[3:end]
([3, 4], [5, 6, 7])

julia> typeof(A1)
SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}
``````

The concrete `SubArray` is a bit lengthy to type. I suggest using `map` rather the `for` loop.

Why should views have to allocate?

1 Like

I think Mark is thinking of julia versions pre-1.5 where any time you wrapped a mutable struct with an immutable struct, you incurred a small allocation.

Here’s the release note from when it was fixed: https://github.com/JuliaLang/julia/blob/v1.5.0/NEWS.md#compilerruntime-improvements

2 Likes

I understood that ultimately the two views of each element vector of the parent vector must be allocated in a (new?) vector.
Or is it possible to make a view of the views?

PS
Instead, I wonder if the use that can be made of the two parts into which one wants to divide the vector cannot be replaced, in all necessary cases, by the use of simple index formulas

`@view x1[j][1:len[j]]`

or

`@view x1[j][len[j]+1 : lastindex(x1[j])`

or both?
In this way there is no new allocation but only the reading of the necessary data where and when it is needed.

The way I’d write this function is like this:

``````function split(x ::Vector{Vector{Float64}}, len ::Vector{Int})
ns = eachindex(x, len)
xl = map(ns) do j
view(x[j], 1:len[j])
end
xr = map(ns) do j
view(x[j], len[j]+1:length(x[j]))
end
xl, xr
end;
``````

this returns a `Tuple` of the left and right views of the arrays and doesn’t modify the input at all.

This runs in about 29 miliseconds on my machine.

Notice that since `x2` is of type `Vector{Vector{Float64}}`, when you do `x2[j] = view(...)`, then there is an implicit `convert(Vector{Float64}, view(...))` in order to store the view in the vector. This is what is causing you to incur so many allocations.

1 Like

What do you think of a comprehension of views?

``````split2(x,len) = @views [(y[1:l], y[l+1:end]) for (l,y) in zip(len,x)]
``````
1 Like

Yeah that’s fine, but it can be more annoying to access the views. i.e. if you want all the left views, you need ot allocate a new vector which is `first.(split2(x, len))`

1 Like

I too had thought of something like this (in the form of a generator) which could be useful in some cases.

``````((@views x1[j][1:len[j]], x1[j][len[j]+1:lastindex(x1[j])]) for j in eachindex(len))
``````
1 Like

in the form perhaps closest to the OP’s request (without allocations )

``````julia> @btime begin
vleft=((@view x1[j][1:len[j]]) for j in eachindex(\$len))
vrigth=((@view x1[j][len[j]+1:lastindex(x1[j])]) for j in eachindex(\$len))
end
1.500 ns (0 allocations: 0 bytes)
Base.Generator{Base.OneTo{Int64}, var"#24#26"}(var"#24#26"(), Base.OneTo(100000))
``````

PS
It’s a pity there is no getindex(vleft, l:r)

2 Likes