# Iterating over cols, vcat, repeat?

I come from R world here for performance. @time shows that (123.55 k allocations: 6.260 MiB) for the below code. It seems more allocations than my expectation. So I was wondering if there be some improvement with the code?

``````m = reshape(1:12, 3, :);
map([m[:, i] for i = 1:size(m, 2)]) do x
vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end
``````

More specifically, two sub-questions below:

• were iterating over columns [m[:, i] for i = 1:size(m, 2)] expensive? The thread https://github.com/JuliaLang/julia/issues/14491 has not been closed yet.
• were `vcat` or/and `repeat` expensive? if so, any suggestions for improvement? Thanks!

You are benchmarking in global scope, try this

``````julia> m = reshape(1:12, 3, :);

julia> @time map([m[:, i] for i = 1:size(m, 2)]) do x
vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end
0.088479 seconds (123.55 k allocations: 6.259 MiB)
4-element Array{Array{Int64,2},1}:
[1 1; 2 2; 3 3; 0 0]
[4 4; 5 5; 6 6; 0 0]
[7 7; 8 8; 9 9; 0 0]
[10 10; 11 11; 12 12; 0 0]

julia> function f()
m = reshape(1:12, 3, :);
map([m[:, i] for i = 1:size(m, 2)]) do x
vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end
end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime f()
929.750 ns (23 allocations: 2.23 KiB)
``````

The manual tells you all you need to know about benchmarking and performance.

If you are worried about allocations, you can reduce them further by

``````julia> function f()
m = reshape(1:12, 3, :);
map(@view(m[:, i]) for i = 1:size(m, 2)) do x
vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end
end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime f()
800.585 ns (17 allocations: 1.67 KiB)
``````

The changes are

• Generator instead of comprehension as argument to `map`, this avoids the allocation of the vector.
• Use of `@view` to avoid allocating the slice `m[:, i]`
4 Likes

For optimal peformance, consider a devectorized solution:

``````function f()
m = reshape(1:12, 3, :)
p,q = size(m)
A = Vector{Matrix{Int64}}(undef, q)
@inbounds for i = 1:q
A[i] = Matrix{Int64}(undef, p+1, 2)
for k = 1:p
A[i][k,1] = A[i][k,2] = m[k,i]
end
A[i][p+1,1] = A[i][p+1,2] = 0
end
A
end

julia> @btime f()
172.140 ns (5 allocations: 688 bytes)
``````
5 Likes

Many thanks for all the help. Testing the built-in function `repeat` along the â€śdevectorizedâ€ť way and found that:

``````
julia> @btime repeat(1:5, 2, 3);
121.025 ns (1 allocation: 336 bytes)

julia> @btime v = Vector{Int}(undef, 30); for i=1:5:30; v[i:(i+4)] = 1:5; end; reshape(v, 10, 3);
34.007 ns (1 allocation: 336 bytes)
``````

Well, could I conclude that better avoiding built-in vectorized array functions for optimal performance in Julia? Thanks!

This doesnâ€™t do what you think. Youâ€™re only benchmarking `v = Vector{Int}(undef, 30)`, and then running the remaining lines after the benchmark is finished. The semicolon ends the expression, and so only the part before the first semicolon is picked up by `@btime`.

Try running it in a new Julia session to see that it actually errors when trying to reference `v` after the benchmark completes:

``````julia> @btime v = Vector{Int}(undef, 30); for i=1:5:30; v[i:(i+4)] = 1:5; end; reshape(v, 10, 3);
27.772 ns (1 allocation: 336 bytes)
ERROR: UndefVarError: v not defined
Stacktrace:
[1] top-level scope at ./REPL[23]:1 [inlined]
[2] top-level scope at ./none:0
``````
4 Likes

And note that you donâ€™t necessarily have to squeeze everything onto one line; you can use `begin/end` instead to create a compound expression.

``````@btime begin
v = Vector{Int}(undef, 30)
for i=1:5:30
v[i:(i+4)] = 1:5
end
reshape(v, 10, 3)
end
``````
2 Likes

In my experience, with enough work, you can usually speed up code by devectorizing it, but in practice there is rarely a need to do so. Either the speedup is negligible, the part youâ€™re optimizing is not a bottleneck, or the built-in function does such a good job that itâ€™s not worth the effort and added complexity in your code to try to beat it.

3 Likes