Repeating (with function repeat) array elements using a vector of numbers

I would like to take a vector, x, and create a new vector, y, by repeating each element in x a number of times, where each number is an element of another vector v.

For example: x=[0.1; 0.6; 0.5]; v=[3; 1; 2]; then the resulting vector would be y=[0.1;0.1;0.1;0.6;0.5;0.5].

I couldn’t find a nice way to do this.

In MATLAB, R, or NumPy, you would do one of the following, respectively:
y=repelem(x,v);

y=rep(x,v);

import numpy as np; #NumPy package for arrays, random number generation, etc
y=np.repeat(x,v);

My workaround is not very elegant and uses a for-loop:

x=[0.1; 0.6; 0.5];
v=[3; 1; 2];

indexLast=(cumsum(v));
indexFirst=ones(size(indexLast));
indexFirst[2:end]=indexFirst[2:end]+indexLast[1:end-1];
#need to convert floats to integers for indices
indexFirst=floor.(Int,indexFirst);indexLast=floor.(Int,indexLast);
y=zeros(indexLast[end],1); #initialize the output array
for ii=1:length(x)
    #Note need .= for assignment
    y[indexFirst[ii]:indexLast[ii]].=x[ii];
end

One option:

julia> x=[0.1; 0.6; 0.5]; v=[3; 1; 2];

julia> vcat(fill.(x, v)...)
6-element Array{Float64,1}:
 0.1
 0.1
 0.1
 0.6
 0.5
 0.5

I didn’t know the fill function. That works. Thanks a lot.

You could also just use repeated push! calls:

z = eltype(x)[]
sizehint!(z, sum(v))
for (x,n) in zip(x,v), i = 1:n 
    push!(z, x)
end

This is slightly more verbose than the fill solution, but doesn’t allocate any temporary arrays and avoids splatting lots of arguments, though you can avoid the latter with reduce(vcat, fill.(x, v)).
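As a quick check of the non-splatting form mentioned above, here is a minimal sketch using only Base. The comprehension variant is an alternative that was not proposed in the posts above, and the names y_reduce and y_comp are purely illustrative.

```julia
x = [0.1, 0.6, 0.5]
v = [3, 1, 2]

# reduce(vcat, ...) concatenates the temporary vectors without splatting them
# into separate function arguments.
y_reduce = reduce(vcat, fill.(x, v))

# Alternative (an assumption, not from the posts above): a nested comprehension
# that emits each element xi exactly n times.
y_comp = [xi for (xi, n) in zip(x, v) for _ in 1:n]
```

Both should give the same six-element vector as the vcat(fill.(x, v)...) version. Note that reduce(vcat, ...) still builds one temporary array per element of x.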


I could have sworn there was a function like this in StatsBase, but I can never remember its name. Its inverse, counts, lives there… did it move? Or is my memory foggy?

I was surprised you couldn’t do it with the “repeat” function, particularly given it’s fairly easy in three other popular scientific programming languages.

I think it’s Base.repeat, which is defined for AbstractDataFrames in DataFrames.jl.

I was excited to see this PR, but I don’t see much of a performance gain… am I missing something? Has splatting been optimized too?

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.($x, $v)...);
  10.422 μs (103 allocations: 87.75 KiB)

julia> @btime reduce(vcat, fill.($x, $v));
  10.075 μs (103 allocations: 87.75 KiB)

Also, the repeated push! is a lot slower:

function repeat_push(x, v)
    z = eltype(x)[]
    sizehint!(z, sum(v))
    for (x,n) in zip(x,v), i = 1:n
        push!(z, x)
    end
    return z
end

julia> @btime repeat_push($x, $v);
  28.948 μs (2 allocations: 38.83 KiB)

The reason for this is that every call to push! does a ccall, which not only is slow by itself, but also prevents all kinds of optimizations (in this case SIMDing the inner loop). By pre-allocating and indexing, massive speedups are possible:

function repeat_prealloc(x, v)
    z = similar(x, sum(v))
    i = 0
    for (x,n) in zip(x,v), k = 1:n
        @inbounds z[i += 1] = x
    end
    return z
end

julia> @btime repeat_prealloc($x, $v);
  2.282 μs (2 allocations: 38.77 KiB)
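One caveat with the @inbounds version: it relies on sum(v) matching the number of writes. If v contained a negative count, sum(v) would be too small while the positive counts would still write their elements, so the loop could write past the end of z. Below is a sketch of a checked variant, assuming the inputs are one-dimensional and the counts are integers. The name repeat_checked and the specific checks are hypothetical, not part of any package.

```julia
# Hypothetical checked variant of repeat_prealloc (name and checks are illustrative).
function repeat_checked(x::AbstractVector, v::AbstractVector{<:Integer})
    length(x) == length(v) ||
        throw(DimensionMismatch("x and v must have the same length"))
    all(>=(0), v) ||
        throw(ArgumentError("counts in v must be non-negative"))
    z = similar(x, sum(v))
    i = 0
    for (xi, n) in zip(x, v), _ in 1:n
        # With the checks above, exactly sum(v) writes happen, so the index stays in bounds.
        @inbounds z[i += 1] = xi
    end
    return z
end

y = repeat_checked([0.1, 0.6, 0.5], [3, 1, 2])
```

The two checks cost one pass over v each, which should be small compared with filling z for anything but tiny inputs.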

I believe it is StatsBase.inverse_rle.

EDIT:

It basically corresponds to the fast repeat_prealloc above.

julia> @btime repeat_prealloc($x, $v);
  3.686 μs (2 allocations: 39.33 KiB)

julia> @btime inverse_rle($x, $v);
  3.625 μs (2 allocations: 39.33 KiB)

Also see this PR, which is almost finished:
https://github.com/JuliaLang/julia/pull/29560


You can also avoid allocation of temporaries using FillArrays.jl:

julia> using FillArrays, BenchmarkTools

julia> vf(x,v) = vcat(fill.(x, v)...)
vf (generic function with 1 method)

julia> vF(x,v) = vcat(Fill.(x, v)...)
vF (generic function with 1 method)

julia> x=[0.1; 0.6; 0.5]; v=[3; 1; 2];

julia> @btime vf(x,v)
  198.697 ns (5 allocations: 544 bytes)
6-element Array{Float64,1}:
 0.1
 0.1
 0.1
 0.6
 0.5
 0.5

julia> @btime vF(x,v)
  166.403 ns (5 allocations: 352 bytes)
6-element Array{Float64,1}:
 0.1
 0.1
 0.1
 0.6
 0.5
 0.5

However, I’m seeing slightly worse performance for the larger example above:

julia> using FillArrays, BenchmarkTools

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.($x, $v)...);
  10.765 μs (103 allocations: 96.70 KiB)

julia> @btime vcat(Fill.($x, $v)...);
  13.433 μs (206 allocations: 54.23 KiB)

That’s probably the penalty from splatting so many values into separate function arguments.

With reduce and FillArrays, the performance is quite good:

julia> using FillArrays, BenchmarkTools

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.($x, $v)...);
  9.296 μs (103 allocations: 80.70 KiB)

julia> @btime vcat(Fill.($x, $v)...);
  9.349 μs (206 allocations: 46.42 KiB)

julia> @btime reduce(vcat,Fill.($x, $v));
  3.760 μs (3 allocations: 36.97 KiB)