The following code contains two equivalent implementations of a trivial algorithm. slow_func uses vectorized (slice) indexing and assignment, while fast_func indexes and assigns each array element individually in a loop.
Can anyone explain why slow_func produces 300 allocations, while fast_func produces only 4? This feels like a bug to me.
If it is not a bug, am I right to say that we should never use slice indexing when performance is important?
function fast_func(a, b)
    for i = 1:size(a, 1)
        for j = 1:size(a, 2)
            a[i, j] = b[i, j]
        end
    end
end

function slow_func(a, b)
    for i = 1:size(a, 1)
        a[i, :] = b[i, :]
    end
end

a = zeros(300, 1)
b = zeros(300, 1)
@time slow_func(a, b)
@time slow_func(a, b)
@time fast_func(a, b)
@time fast_func(a, b)
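
A third variant may be worth timing alongside the two above. It rests on an assumption, not a confirmed diagnosis: in Julia, a slice expression such as `b[i, :]` on the right-hand side creates a copy of that row, which would account for roughly one allocation per loop iteration. The function name `view_func` is hypothetical and not part of the original code; the sketch uses the `@views` macro and broadcast assignment (`.=`) so that the row is read through a view and written in place.

```julia
# Hypothetical third variant (not in the original post): the same row-wise copy,
# but the right-hand side is read through a view instead of a slice copy.
function view_func(a, b)
    for i = 1:size(a, 1)
        # `@views` rewrites `b[i, :]` to `view(b, i, :)`;
        # `.=` assigns element-wise into the existing row of `a`.
        @views a[i, :] .= b[i, :]
    end
end

a = zeros(300, 1)
b = rand(300, 1)
view_func(a, b)        # first call includes compilation
@time view_func(a, b)  # second call measures the loop itself
```

If the allocation count reported here is close to that of fast_func, that would suggest the slice copy is the cause rather than a bug, and that slice syntax can still be used in performance-sensitive code when combined with views.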