Repeating (with function repeat) array elements using a vector of numbers

keeler · March 3, 2019, 9:40pm

I would like to take a vector, x, and create a new vector, y, by repeating each element in x a number of times, where each number is an element of another vector v.

For example: x=[0.1; 0.6; 0.5]; v=[3; 1; 2]; then the resulting vector would be y=[0.1;0.1;0.1;0.6;0.5;0.5].

I couldn’t find a nice way to do this.

In MATLAB, R or NumPy, you would do the following:

MATLAB:
y=repelem(x,v);

R:
y=rep(x,v);

NumPy:
import numpy as np; #NumPy package for arrays, random number generation, etc
y=np.repeat(x,v);

My workaround is not very elegant and uses a for-loop:

x=[0.1; 0.6; 0.5];
v=[3; 1; 2];

indexLast=(cumsum(v));
indexFirst=ones(size(indexLast));
indexFirst[2:end]=indexFirst[2:end]+indexLast[1:end-1];
#ones() creates floats, so convert the indices to integers
indexFirst=floor.(Int,indexFirst); indexLast=floor.(Int,indexLast);
y=zeros(indexLast[end]); #initialize the output vector
for ii=1:length(x)
    #Note need .= for assignment
    y[indexFirst[ii]:indexLast[ii]].=x[ii];
end

bennedich · March 3, 2019, 9:43pm

One option:

julia> x=[0.1; 0.6; 0.5]; v=[3; 1; 2];

julia> vcat(fill.(x, v)...)
6-element Array{Float64,1}:
 0.1
 0.1
 0.1
 0.6
 0.5
 0.5

keeler · March 4, 2019, 1:07am

I didn’t know the fill function. That works. Thanks a lot.

stevengj · March 4, 2019, 3:46am

You could also just use repeated push! calls:

z = eltype(x)[]
sizehint!(z, sum(v))
for (x,n) in zip(x,v), i = 1:n 
    push!(z, x)
end

This is slightly more verbose than the fill solution, but doesn’t allocate any temporary arrays and avoids splatting lots of arguments, though you can avoid the latter with reduce(vcat, fill.(x, v)).
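
To make the comparison above concrete, here is a minimal sketch of the splatted and reduce forms side by side, using the x and v from the original question. The nested comprehension is an extra variant that is not from the posts above; it is included only as an illustration and has not been benchmarked here.

```julia
x = [0.1, 0.6, 0.5]
v = [3, 1, 2]

# Splatting: every filled vector becomes a separate argument to vcat.
y_splat = vcat(fill.(x, v)...)

# reduce(vcat, ...) takes the collection of vectors directly, with no splatting.
y_reduce = reduce(vcat, fill.(x, v))

# Illustrative alternative: a nested comprehension that avoids building
# one temporary vector per element of x.
y_comp = [xi for (xi, n) in zip(x, v) for _ in 1:n]
```

All three should produce the same 6-element vector; they differ only in how the intermediate pieces are created and combined.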

mbauman · March 4, 2019, 4:18am

I could have sworn there was a function like this in StatsBase, but I can never remember its name. Its inverse, counts, lives there… did it move? Or is my memory foggy?

keeler · March 4, 2019, 4:42am

I was surprised you couldn’t do it with the “repeat” function, particularly given it’s fairly easy in three other popular scientific programming languages.

pdeffebach · March 4, 2019, 5:09am

I think it’s Base.repeat which is defined for AbstractDataFrames in DataFrames.jl

bennedich · March 4, 2019, 6:26am

I was excited to see this PR, but I don’t see much of a performance gain… am I missing something? Has splatting been optimized too?

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.($x, $v)...);
  10.422 μs (103 allocations: 87.75 KiB)

julia> @btime reduce(vcat, fill.($x, $v));
  10.075 μs (103 allocations: 87.75 KiB)

Also, the repeated push! is a lot slower:

function repeat_push(x, v)
    z = eltype(x)[]
    sizehint!(z, sum(v))
    for (x,n) in zip(x,v), i = 1:n
        push!(z, x)
    end
    return z
end

julia> @btime repeat_push($x, $v);
  28.948 μs (2 allocations: 38.83 KiB)

The reason for this is that every call to push! does a ccall, which is not only slow by itself but also prevents many optimizations (in this case, SIMD vectorization of the inner loop). By pre-allocating and indexing, massive speedups are possible:

function repeat_prealloc(x, v)
    z = similar(x, sum(v))
    i = 0
    for (x,n) in zip(x,v), k = 1:n
        @inbounds z[i += 1] = x
    end
    return z
end

julia> @btime repeat_prealloc($x, $v);
  2.282 μs (2 allocations: 38.77 KiB)
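
Since repeat_prealloc uses @inbounds, it relies on x and v having matching lengths and on the counts being non-negative. Below is a hedged sketch of the same pre-allocation idea with explicit checks, filling one contiguous block per element instead of writing one element at a time. The name repeat_counts and the checks are hypothetical additions for illustration, and this variant has not been benchmarked against the timings above.

```julia
# Hypothetical helper: pre-allocate the output, then fill one block per element.
function repeat_counts(x::AbstractVector, v::AbstractVector{<:Integer})
    length(x) == length(v) || throw(ArgumentError("x and v must have the same length"))
    all(n -> n >= 0, v) || throw(ArgumentError("counts must be non-negative"))
    z = Vector{eltype(x)}(undef, sum(v))
    pos = 1
    for (xi, n) in zip(x, v)
        # Write n copies of xi into z[pos:pos+n-1]; an empty range is a no-op.
        fill!(view(z, pos:pos+n-1), xi)
        pos += n
    end
    return z
end

y = repeat_counts([0.1, 0.6, 0.5], [3, 1, 2])
```

The checks cost one extra pass over v, which is small compared with writing sum(v) output elements.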

carstenbauer · March 4, 2019, 9:57am

I believe it is StatsBase.inverse_rle

EDIT:

It basically corresponds to the fast repeat_prealloc above.

julia> @btime repeat_prealloc($x, $v);
  3.686 μs (2 allocations: 39.33 KiB)

julia> @btime inverse_rle($x, $v);
  3.625 μs (2 allocations: 39.33 KiB)
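
For readers unfamiliar with the name: inverse_rle undoes a run-length encoding, which StatsBase computes with rle. The Base-only sketch below illustrates that round trip without needing the package. The function names rle_sketch and inverse_rle_sketch are hypothetical and are not StatsBase's implementation; in particular, this sketch compares neighbours with ==, which is an assumption of the example.

```julia
# Hypothetical helper: collapse consecutive equal values into (values, run lengths).
function rle_sketch(y::AbstractVector)
    vals = eltype(y)[]
    lens = Int[]
    for yi in y
        if !isempty(vals) && vals[end] == yi
            lens[end] += 1       # extend the current run
        else
            push!(vals, yi)      # start a new run
            push!(lens, 1)
        end
    end
    return vals, lens
end

# Hypothetical helper: expand (values, run lengths) back into the full vector.
inverse_rle_sketch(vals, lens) = [val for (val, n) in zip(vals, lens) for _ in 1:n]

vals, lens = rle_sketch([0.1, 0.1, 0.1, 0.6, 0.5, 0.5])
y = inverse_rle_sketch(vals, lens)
```

Here vals and lens play the roles of x and v from the original question, so decoding them is exactly the repeat-by-counts operation being asked about.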

nalimilan · March 4, 2019, 9:26pm

Also see this PR, which is almost finished:
https://github.com/JuliaLang/julia/pull/29560

dlfivefifty · March 5, 2019, 7:08am

You can also avoid allocation of temporaries using FillArrays.jl:

julia> using FillArrays, BenchmarkTools

julia> vf(x,v) = vcat(fill.(x, v)...)
vf (generic function with 1 method)

julia> vF(x,v) = vcat(Fill.(x, v)...)
vF (generic function with 1 method)

julia> x=[0.1; 0.6; 0.5]; v=[3; 1; 2];

julia> @btime vf(x,v)
  198.697 ns (5 allocations: 544 bytes)
6-element Array{Float64,1}:
 0.1
 0.1
 0.1
 0.6
 0.5
 0.5

julia> @btime vF(x,v)
  166.403 ns (5 allocations: 352 bytes)
6-element Array{Float64,1}:
 0.1
 0.1
 0.1
 0.6
 0.5
 0.5

bennedich · March 5, 2019, 7:49am

However, I’m seeing slightly worse performance for the larger example above:

julia> using FillArrays, BenchmarkTools

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.($x, $v)...);
  10.765 μs (103 allocations: 96.70 KiB)

julia> @btime vcat(Fill.($x, $v)...);
  13.433 μs (206 allocations: 54.23 KiB)

rdeits · March 5, 2019, 12:58pm

That’s probably the penalty from splatting so many values into separate function arguments.

baggepinnen · March 5, 2019, 2:40pm

With reduce and FillArrays, the performance is quite good:

julia> using FillArrays, BenchmarkTools

julia> x,v = rand(100), rand(1:100, 100);

julia> @btime vcat(fill.($x, $v)...);
  9.296 μs (103 allocations: 80.70 KiB)

julia> @btime vcat(Fill.($x, $v)...);
  9.349 μs (206 allocations: 46.42 KiB)

julia> @btime reduce(vcat,Fill.($x, $v));
  3.760 μs (3 allocations: 36.97 KiB)
