I would like to take a vector, x, and create a new vector, y, by repeating each element in x a number of times, where each number is an element of another vector v.
For example: x=[0.1; 0.6; 0.5]; v=[3; 1; 2]; then the resulting vector would be y=[0.1;0.1;0.1;0.6;0.5;0.5].
I couldn’t find a nice way to do this in Julia.
In MATLAB, R or NumPy, you would do the following:
y=repelem(x,v); % MATLAB
y=rep(x,v); # R
import numpy as np; #NumPy package for arrays, random number generation, etc
y=np.repeat(x,v);
My workaround is not very elegant and uses a for-loop:
x=[0.1; 0.6; 0.5];
v=[3; 1; 2];
indexLast=cumsum(v); #last index of each run in y
#integer ones, so no float-to-integer conversion is needed for the indices
indexFirst=ones(Int,size(indexLast));
indexFirst[2:end]=indexLast[1:end-1].+1; #each run starts right after the previous one ends
y=zeros(eltype(x),indexLast[end]); #preallocate the output as a vector
for ii=1:length(x)
    #Note need .= for assignment
    y[indexFirst[ii]:indexLast[ii]].=x[ii];
end
z = eltype(x)[]
sizehint!(z, sum(v))
for (xi,n) in zip(x,v), i = 1:n
    push!(z, xi)
end
This is slightly more verbose than the fill solution, but doesn’t allocate any temporary arrays and avoids splatting lots of arguments, though you can avoid the latter with reduce(vcat, fill.(x, v)).
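The "fill solution" itself is not shown in this part of the thread. Judging from the reduce(vcat, fill.(x, v)) variant mentioned above, it was presumably something along these lines; treat this as a sketch under that assumption rather than the original poster's exact code:

```julia
# Example inputs from the original question
x = [0.1, 0.6, 0.5]
v = [3, 1, 2]

# Broadcasting fill over both vectors gives one small array per element of x
parts = fill.(x, v)

# Splatting passes every small array to vcat as a separate argument
y_splat = vcat(parts...)

# reduce(vcat, ...) concatenates the same arrays without splatting
y_reduce = reduce(vcat, parts)
```

Both versions build the temporary arrays in parts first, which is the allocation the push! loop avoids.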
I could have sworn there was a function like this in StatsBase, but I can never remember its name. Its inverse, counts, lives there… did it move? Or is my memory foggy?
I was surprised you couldn’t do it with the “repeat” function, particularly given it’s fairly easy in three other popular scientific programming languages.
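To illustrate the gap: as far as I know, the inner keyword of repeat takes a single count per dimension, so every element is repeated the same number of times. For per-element counts, a comprehension with two for clauses is one compact option. This is only a sketch, not something proposed elsewhere in the thread:

```julia
# Example inputs from the original question
x = [0.1, 0.6, 0.5]
v = [3, 1, 2]

# repeat with `inner` repeats every element the same number of times
y_same = repeat(x, inner=2)

# Two `for` clauses in a comprehension flatten the nested loop into one vector;
# the inner range 1:n depends on the count paired with each element
y = [xi for (xi, n) in zip(x, v) for _ in 1:n]
```

The comprehension cannot know the output length up front, so it grows the result as it goes, much like the push! loop.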
function repeat_push(x, v)
    z = eltype(x)[]
    sizehint!(z, sum(v))
    for (xi,n) in zip(x,v), i = 1:n
        push!(z, xi)
    end
    return z
end
julia> @btime repeat_push($x, $v);
28.948 μs (2 allocations: 38.83 KiB)
The reason for this is that every call to push! does a ccall, which is not only slow by itself, but also prevents all kinds of optimizations (in this case SIMD vectorization of the inner loop). By pre-allocating and indexing, massive speedups are possible:
function repeat_prealloc(x, v)
    z = similar(x, sum(v))
    i = 0
    for (xi,n) in zip(x,v), k = 1:n
        @inbounds z[i += 1] = xi
    end
    return z
end
julia> @btime repeat_prealloc($x, $v);
2.282 μs (2 allocations: 38.77 KiB)
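The x and v used for these timings are not shown in the thread, and @btime comes from the BenchmarkTools package. Before comparing speed, it is worth confirming that both functions return the same result. The sketch below does that on made-up random inputs; the sizes and the 0:10 count range are assumptions, not the values behind the numbers above:

```julia
# Both functions as defined in the thread (loop variable named xi here)
function repeat_push(x, v)
    z = eltype(x)[]
    sizehint!(z, sum(v))
    for (xi, n) in zip(x, v), i = 1:n
        push!(z, xi)
    end
    return z
end

function repeat_prealloc(x, v)
    z = similar(x, sum(v))
    i = 0
    for (xi, n) in zip(x, v), k = 1:n
        @inbounds z[i += 1] = xi
    end
    return z
end

# Hypothetical test data: 1000 values, each repeated between 0 and 10 times
x = rand(1000)
v = rand(0:10, 1000)

# Reference result built from temporary arrays
reference = reduce(vcat, fill.(x, v))

same_push = repeat_push(x, v) == reference
same_prealloc = repeat_prealloc(x, v) == reference
```

The @inbounds in repeat_prealloc is only safe because z has exactly sum(v) elements and the loop writes exactly that many.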