Reshape a 1-D array into an array of different-size arrays

I have two vectors x and n, and I want to create an array of arrays y from x, where the lengths of the subarrays of y correspond to the elements of n (so the elements of n sum to the length of x).

For example, x=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8] and n=[3,1,4]. Then y=[[0.1,0.2,0.3],[0.4],[0.5,0.6,0.7,0.8]].

Is there a nice/fast way to do this? I tried searching the forum, but I couldn’t find anything directly relevant.

In Python (with NumPy) this would work:

indexPoints=np.cumsum(n[0:-1]);
y=np.split(x,indexPoints,axis=0);

My current stopgap in Julia:

nCumSum=cumsum(n);
numbTotal=nCumSum[end];
indexFirst=copy(nCumSum);
indexFirst[2:end]=indexFirst[1:end-1].+1;indexFirst[1]=1;
indexSecond=copy(nCumSum);
y=[(x[indexFirst[ii]:indexSecond[ii]]) for ii in 1:length(n)];

Update: I had a couple of typos in the Julia code and a mistake in the Python code.

This seems applicable. Just adjust for the non-constant n.

It seems like you’re trying to do this in a very vectorized style, but that’s neither necessary nor optimal in Julia. Loops in Julia are fast, and doing this operation in a loop is pretty straightforward:

julia> x=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8];

julia> n=[3,1,4];

julia> function split(x, n)
         result = Vector{Vector{eltype(x)}}()
         start = firstindex(x)
         for len in n
           push!(result, x[start:(start + len - 1)])
           start += len
         end
         result
       end
split (generic function with 1 method)

julia> split(x, n)
3-element Array{Array{Float64,1},1}:
 [0.1, 0.2, 0.3]     
 [0.4]               
 [0.5, 0.6, 0.7, 0.8]

julia> using BenchmarkTools

julia> @btime split($x, $n)
  133.058 ns (5 allocations: 448 bytes)
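If the chunks don't need to be independent copies, the same loop can return views into x instead. This is only a sketch of that variant, not something benchmarked here; the name split_views and the length check are assumptions added for illustration:

```julia
# Hypothetical variant of the loop above: each chunk is a view into x, not a copy.
function split_views(x::AbstractVector, n::AbstractVector{<:Integer})
    # Assumption: the chunk lengths must cover x exactly.
    sum(n) == length(x) || throw(ArgumentError("sum(n) must equal length(x)"))
    start = firstindex(x)
    # Element type of the result: whatever `view` returns for an index range of x.
    result = Vector{typeof(view(x, start:start-1))}()
    for len in n
        push!(result, view(x, start:(start + len - 1)))
        start += len
    end
    return result
end

x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
n = [3, 1, 4]
yv = split_views(x, n)

# Writing through a view changes the parent array as well.
yv[2][1] = 4.0
```

Because the chunks share memory with x, this only fits cases where that aliasing is acceptable; the copying loop is the safer default.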

I tried comparing this with the code in the original post, but I get an index error when running your code:

julia> function split2(x, n)
         nCumSum=cumsum(n)
         numbTotal=nCumSum[end]
         indexFirst=copy(nCumSum)
         indexFirst[2:end]=indexFirst[1:end-1].+1;indexFirst[1]=1
         indexSecond=copy(nCumSum)
         y=[(x[indexFirst[ii]:indexSecond[ii]]) for ii in 1:numbTotal]
       end
split2 (generic function with 1 method)

julia> split2(x, n)
ERROR: BoundsError: attempt to access 3-element Array{Int64,1} at index [4]
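The error comes from the comprehension range: numbTotal is sum(n) = 8, while indexFirst and indexSecond only have length(n) = 3 entries, so ii = 4 is out of bounds. A minimal corrected sketch, assuming 1-based arrays (split2_fixed and its variable names are made up here for illustration):

```julia
# Hypothetical corrected version of split2: iterate over chunks, not elements.
function split2_fixed(x, n)
    index_last = cumsum(n)               # last index of each chunk
    index_first = index_last .- n .+ 1   # first index of each chunk
    return [x[index_first[ii]:index_last[ii]] for ii in eachindex(n)]
end

x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
n = [3, 1, 4]
y = split2_fixed(x, n)
```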

I also tried comparing with NumPy, but np.split(x, n, axis=0) does not actually produce the result you’re asking for, because its second argument is a list of split positions rather than chunk lengths:

In [16]: x=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8]

In [17]: n=[3,1,4]

In [18]: np.split(x, n, axis=0)
Out[18]: 
[array([0.1, 0.2, 0.3]),
 array([], dtype=float64),
 array([0.2, 0.3, 0.4]),
 array([0.5, 0.6, 0.7, 0.8])]

OK, thanks.

Yes, I had a couple of typos in the Julia code. Sorry about that. And the Python code should have been:

indexPoints=np.cumsum(n[0:-1]);
y=np.split(x,indexPoints,axis=0);

And yes, I was looking for a (succinct) vectorized way.

Gotcha. For what it’s worth, the Julia version (with a loop) is ~46 times faster than the vectorized NumPy version on this small example:

julia> using BenchmarkTools

julia> @btime split($x, $n)
  133.058 ns (5 allocations: 448 bytes)

In [22]: def mysplit(x, n):
    ...:     indexPoints = np.cumsum(n[0:-1])
    ...:     return np.split(x, indexPoints, axis=0)
    ...: 
    ...: 

In [23]: mysplit(x, n)
Out[23]: [array([0.1, 0.2, 0.3]), array([0.4]), array([0.5, 0.6, 0.7, 0.8])]

In [24]: %timeit mysplit(x, n)
6.24 µs ± 59.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

If you want a pithy two-liner akin to your NumPy code, you could just use a comprehension:

index_points = cumsum(n)
[x[i+1:j] for (i,j) in zip([0;index_points[1:end-1]], index_points)]

Robin’s for loop, though, will certainly still be fastest. That’s one of the big benefits of Julia — you don’t need to bend over backwards to write your algorithm in a vectorized style, and you’re not out of luck if Julia doesn’t have the exact vectorized method you’re looking for.

I see. Vectorization isn’t always necessary.

Well, I guess I’ll use a for loop, as I need to do the same thing to a couple of arrays.

Thanks for the advice.
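When the same split has to be applied to several arrays, one option is to build the index ranges once and reuse them. A rough sketch, assuming 1-based arrays whose length equals sum(n); chunk_ranges and the second array z are made-up names for illustration:

```julia
# Hypothetical helper: turn the chunk lengths in n into index ranges.
function chunk_ranges(n, start=1)
    ranges = Vector{UnitRange{Int}}()
    for len in n
        push!(ranges, start:(start + len - 1))
        start += len
    end
    return ranges
end

x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
z = collect(1:8)   # a second array of the same length, for illustration only
n = [3, 1, 4]

ranges = chunk_ranges(n)        # computed once
y  = [x[r] for r in ranges]     # reused for x
yz = [z[r] for r in ranges]     # and for z
```

The loop that walks n runs only once, and each additional array costs a single comprehension.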