# Repeating (with function repeat) array elements using a vector of numbers

I would like to take a vector, x, and create a new vector, y, by repeating each element in x a number of times, where each number is an element of another vector v.

For example: x=[0.1; 0.6; 0.5]; v=[3; 1; 2]; then the resulting vector would be y=[0.1;0.1;0.1;0.6;0.5;0.5].

I couldn’t find a nice way to do this.

In MATLAB, R or NumPy, you would do the following
y=repelem(x,v);

y=rep(x,v);

import numpy as np; #NumPy package for arrays, random number generation, etc
y=np.repeat(x,v);

My workaround is not very elegant and uses a for-loop:

``````x=[0.1; 0.6; 0.5];
v=[3; 1; 2];

indexLast=(cumsum(v));
indexFirst=ones(size(indexLast));
indexFirst[2:end]=indexFirst[2:end]+indexLast[1:end-1];
#need to convert floats to integers for indices
indexFirst=floor.(Int,indexFirst);indexLast=floor.(Int,indexLast);
y=zeros(indexLast[end],1); #initiate arrays
for ii=1:length(x)
#Note need .= for assignment
y[indexFirst[ii]:indexLast[ii]].=x[ii];
end
``````
2 Likes

One option:

``````julia> x=[0.1; 0.6; 0.5]; v=[3; 1; 2];

julia> vcat(fill.(x, v)...)
6-element Array{Float64,1}:
0.1
0.1
0.1
0.6
0.5
0.5
``````
4 Likes

I didn’t know the fill function. That works. Thanks a lot.

You could also just use repeated `push!` calls:

``````z = eltype(x)[]
sizehint!(z, sum(v))
for (x,n) in zip(x,v), i = 1:n
push!(z, x)
end
``````

This is slightly more verbose than the `fill` solution, but doesn’t allocate any temporary arrays and avoids splatting lots of arguments, though you can avoid the latter with `reduce(vcat, fill.(x, v))`.

1 Like

I could have sworn there was a function like this was in StatsBase, but I can never remember its name. Its inverse, `counts`, lives there… did it move? Or is my memory foggy?

I was surprised you couldn’t do it with the “repeat” function, particularly given it’s fairly easy in three other popular scientific programming languages.

I think it’s `Base.repeat` which is defined for `AbstractDataFrames` in DataFrames.jl

I was excited to see this PR, but I don’t see much of a performance gain… am I missing something? Has splatting been optimized too?

``````julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.(\$x, \$v)...);
10.422 μs (103 allocations: 87.75 KiB)

julia> @btime reduce(vcat, fill.(\$x, \$v));
10.075 μs (103 allocations: 87.75 KiB)
``````

Also, the repeated `push!` is a lot slower:

``````function repeat_push(x, v)
z = eltype(x)[]
sizehint!(z, sum(v))
for (x,n) in zip(x,v), i = 1:n
push!(z, x)
end
return z
end

julia> @btime repeat_push(\$x, \$v);
28.948 μs (2 allocations: 38.83 KiB)
``````

The reason for this is that every call to `push!` does a ccall, which not only is slow by itself, but also prevents all kinds of optimizations (in this case SIMDing the inner loop). By pre-allocating and indexing, massive speedups are possible:

``````function repeat_prealloc(x, v)
z = similar(x, sum(v))
i = 0
for (x,n) in zip(x,v), k = 1:n
@inbounds z[i += 1] = x
end
return z
end

julia> @btime repeat_prealloc(\$x, \$v);
2.282 μs (2 allocations: 38.77 KiB)
``````
2 Likes

I believe it is `StatsBase.inverse_rle`

EDIT:

It basically corresponds to the fast `repeat_prealloc` above.

``````julia> @btime repeat_prealloc(\$x, \$v);
3.686 μs (2 allocations: 39.33 KiB)

julia> @btime inverse_rle(\$x, \$v);
3.625 μs (2 allocations: 39.33 KiB)
``````
3 Likes

Also see this PR, which is almost finished:

2 Likes

You can also avoid allocation of temporaries using FillArrays.jl:

``````julia> using FillArrays, BenchmarkTools

julia> vf(x,v) = vcat(fill.(x, v)...)
vf (generic function with 1 method)

julia> vF(x,v) = vcat(Fill.(x, v)...)
vF (generic function with 1 method)

julia> x=[0.1; 0.6; 0.5]; v=[3; 1; 2];

julia> @btime vf(x,v)
198.697 ns (5 allocations: 544 bytes)
6-element Array{Float64,1}:
0.1
0.1
0.1
0.6
0.5
0.5

julia> @btime vF(x,v)
166.403 ns (5 allocations: 352 bytes)
6-element Array{Float64,1}:
0.1
0.1
0.1
0.6
0.5
0.5
``````
1 Like

I’m seeing slightly worse performance however for the larger example above:

``````julia> using FillArrays, BenchmarkTools

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.(\$x, \$v)...);
10.765 μs (103 allocations: 96.70 KiB)

julia> @btime vcat(Fill.(\$x, \$v)...);
13.433 μs (206 allocations: 54.23 KiB)
``````

That’s probably the penalty from splatting so many values into separate function arguments.

With `reduce` and `FillArrays` the performance is quite good

``````julia> using FillArrays, BenchmarkTools

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.(\$x, \$v)...);
9.296 μs (103 allocations: 80.70 KiB)

julia> @btime vcat(Fill.(\$x, \$v)...);
9.349 μs (206 allocations: 46.42 KiB)

julia> @btime reduce(vcat,Fill.(\$x, \$v));
3.760 μs (3 allocations: 36.97 KiB)
``````
2 Likes