A = rand(1000,20);
res = zeros(size(A,1));
@time for i in 1:size(A,1)
    p = view(A, i, :)
    res[i] = sum(p)
end
0.004405 seconds (21.94 k allocations: 436.531 KiB)
Note the number of allocations and the memory usage. From what I understand, creating the view allocates some memory, and that allocation happens in every iteration. But if the view is an immutable struct, why is it not being optimized away or reused across iterations?
PS: I am aware that I could sum along a dimension; I am using sum here only as an example.
function h()
    A = rand(1000,20);
    res = zeros(size(A,1));
    for i in 1:size(A,1)
        p = view(A, i, :)
        res[i] = sum(p)
    end
    res
end
@time h();
0.000078 seconds (1.01 k allocations: 211.297 KiB)
@time sum(A,2);
0.000052 seconds (10 allocations: 8.188 KiB)
The results above are after warming up the code. It's faster, but 1k allocations remain.
One issue is that your function h() includes the time and allocations needed to generate the random array. Rewriting it to accept A as an argument cuts the time in half. Views are still expensive, though: you're allocating one at each iteration of your loop for an otherwise cheap summation.
Sorry for the incorrect example shared above. I was actually trying to understand why view objects are being allocated in each loop iteration. They are defined as immutable structs with a fixed size. So shouldn't the compiler be able to reuse the space allocated in the previous iteration with new parameters?
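To make that suggestion concrete, here is a minimal sketch of the same loop with the matrix passed in. The name `row_sums_view` is made up for illustration and is not from the original post; I haven't included timings, since they depend on your machine and Julia version.

```julia
# Hypothetical rewrite of h(): the matrix is an argument, so generating
# the random data is no longer part of what gets timed.
function row_sums_view(A)
    res = zeros(size(A, 1))
    for i in 1:size(A, 1)
        p = view(A, i, :)   # view of row i; the data itself is not copied
        res[i] = sum(p)
    end
    return res
end

A = rand(1000, 20)
row_sums_view(A)          # first call compiles the method (warm-up)
@time row_sums_view(A)    # measures only the loop and the result vector
```

With this version, whatever allocations remain beyond the result vector come from the loop body itself rather than from `rand`.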
using BenchmarkTools
function m()
    A = rand(1000000,20);
    res = zeros(size(A,1));
    for i in 1:size(A,1)
        for j in 1:size(A,2)
            res[i] += A[i,j]
        end
    end
    res
end
@btime m();
78.411 ms (4 allocations: 160.22 MiB)
function h()
    A = rand(1000000,20);
    res = zeros(size(A,1));
    for i in 1:size(A,1)
        p = view(A, i, :)
        res[i] = sum(p)
    end
    res
end
@btime h();
122.569 ms (1000004 allocations: 205.99 MiB)
function p()
    A = rand(1000000,20)
    res = sum(A,2)
end
@btime p();
66.007 ms (9 allocations: 160.22 MiB)
rand(1000000, 20) dominates the runtime of all your functions, so I'd highly recommend passing A as a function argument if you're trying to isolate the performance of another part of the code.
On allocations, see here. Per Stefan, “Being able to stack-allocate objects that refer to the heap is an important case that we need to address, but doing so is non-trivial and hasn’t been done yet.”
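Whether a given view ends up on the heap depends on your Julia version and on what the compiler can prove, so it is worth measuring rather than assuming. Below is a rough sketch using `@allocated` from Base; the helper names `view_row_sum` and `alloc_per_view` are hypothetical, and I have deliberately not shown an expected number since it will vary between setups.

```julia
# Hypothetical helper: build a view of row i and sum it.
function view_row_sum(A, i)
    return sum(view(A, i, :))
end

# Hypothetical helper: bytes allocated by one call to view_row_sum.
function alloc_per_view(A)
    view_row_sum(A, 1)                      # warm-up so compilation is not counted
    return @allocated view_row_sum(A, 1)    # bytes allocated by the second call
end

A = rand(1000, 20)
println("bytes allocated per view: ", alloc_per_view(A))
```

A nonzero result means the view (or something else in the call) is being heap-allocated on your setup; zero means the compiler managed to avoid it.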