I come from the R world, and I'm here for performance. @time reports (123.55 k allocations: 6.260 MiB) for the code below, which is more allocations than I expected. Is there some way to improve this code?
m = reshape(1:12, 3, :);
map([m[:, i] for i = 1:size(m, 2)]) do x
vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end
If you are worried about allocations, you can reduce them further:
julia> function f()
m = reshape(1:12, 3, :);
map(@view(m[:, i]) for i = 1:size(m, 2)) do x
vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end
end
f (generic function with 1 method)
julia> using BenchmarkTools
julia> @btime f()
800.585 ns (17 allocations: 1.67 KiB)
The changes are:
- A generator instead of a comprehension as the argument to map; this avoids allocating the intermediate vector.
- @view to avoid allocating a copy of the slice m[:, i].
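To convince yourself the two forms compute the same thing, here is a quick check (a sketch using the setup from the question; the names with_copies and with_views are mine):

```julia
m = reshape(1:12, 3, :)

# Original: comprehension materializes a vector of column copies first.
with_copies = map([m[:, i] for i in 1:size(m, 2)]) do x
    vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end

# Generator + @view: no intermediate vector, no column copies.
with_views = map(@view(m[:, i]) for i in 1:size(m, 2)) do x
    vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end

@assert with_copies == with_views
```

repeat works on SubArrays just as on regular arrays, so the do-block body needs no changes when switching to views.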
For optimal performance, consider a devectorized solution:
function f()
m = reshape(1:12, 3, :)
p,q = size(m)
A = Vector{Matrix{Int64}}(undef, q)
@inbounds for i = 1:q
A[i] = Matrix{Int64}(undef, p+1, 2)
for k = 1:p
A[i][k,1] = A[i][k,2] = m[k,i]
end
A[i][p+1,1] = A[i][p+1,2] = 0
end
A
end
julia> @btime f()
172.140 ns (5 allocations: 688 bytes)
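When hand-writing loops like this, it's worth checking that the result matches the original vectorized expression. A quick sanity check (my addition, reusing the devectorized f from above):

```julia
# Devectorized version from above.
function f()
    m = reshape(1:12, 3, :)
    p, q = size(m)
    A = Vector{Matrix{Int64}}(undef, q)
    @inbounds for i = 1:q
        A[i] = Matrix{Int64}(undef, p + 1, 2)
        for k = 1:p
            A[i][k, 1] = A[i][k, 2] = m[k, i]
        end
        A[i][p+1, 1] = A[i][p+1, 2] = 0
    end
    A
end

# Expected result from the original vectorized code in the question.
m = reshape(1:12, 3, :)
expected = map([m[:, i] for i in 1:size(m, 2)]) do x
    vcat(repeat(x, 1, 2), zeros(Int, 1, 2))
end

@assert f() == expected
```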
This doesn’t do what you think. You’re only benchmarking v = Vector{Int}(undef, 30), and then running the remaining lines after the benchmark is finished. The semicolon ends the expression, and so only the part before the first semicolon is picked up by @btime.
Try running it in a new Julia session to see that it actually errors when trying to reference v after the benchmark completes:
julia> @btime v = Vector{Int}(undef, 30); for i=1:5:30; v[i:(i+4)] = 1:5; end; reshape(v, 10, 3);
27.772 ns (1 allocation: 336 bytes)
ERROR: UndefVarError: v not defined
Stacktrace:
[1] top-level scope at ./REPL[23]:1 [inlined]
[2] top-level scope at ./none:0
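One way to benchmark the whole thing (a sketch) is to wrap the statements in a begin...end block, so @btime sees a single expression instead of stopping at the first semicolon:

```julia
using BenchmarkTools

# begin...end groups all three statements into one expression,
# so @btime measures the allocation, the loop, and the reshape.
result = @btime begin
    v = Vector{Int}(undef, 30)
    for i = 1:5:30
        v[i:(i+4)] = 1:5
    end
    reshape(v, 10, 3)
end
```

Better still, put the code in a function and benchmark that, as in the earlier answers; that also avoids any global-scope effects.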
My advice: Don’t worry so much about performance at the initial stage of your project. Instead try to make your code as readable and flexible as possible. That typically means using the built-in functions over devectorized code. During development, add plenty of unit tests for your code. Once your project is nearing completion, profile the code, with production-sized input data, to see where the bottlenecks are. Then optimize those parts only. Your unit tests will make sure that you don’t mess anything up while refactoring your code.
In my experience, with enough work, you can usually speed up code by devectorizing it, but in practice there is rarely a need to do so. Either the speedup is negligible, the part you’re optimizing is not a bottleneck, or the built-in function does such a good job that it’s not worth the effort and added complexity in your code to try to beat it.