 Why making `a[range] = b[range]` seems to make unnecessary allocations?

I was wondering why making a[range] = b[range] seems to make unnecessary allocations and if (why) that is intended. For example:

function f!(a, b, n)
a[1:n, 1:n] = b[1:n, 1:n]
end

function g!(a, b, n)
copyto!(a, CartesianIndices((1:n, 1:n)), b, CartesianIndices((1:n, 1:n)))
end

function h!(a, b, n)
for i = 1:n
for k = 1:n
a[k, i] = b[k, i]
end
end
end
n = 500
a = zeros(n,n)
b = rand(n,n)

@time f!(a,b,n); #  0.002301 seconds (6 allocations: 1.908 MiB)
@time g!(a,b,n); #  0.000327 seconds (4 allocations: 160 bytes)
@time h!(a,b,n); #  0.000492 seconds (4 allocations: 160 bytes)

Doing the for loops (h!) seems to be the optimal way allocation-wise if the idea is to make a copy from transpose(b) (inside the function) or any kind of column or line permutation.

a[range] makes a copy for a mix of historical, and memory locality reasons. the simplest solution is to use

function g!(a, b, n)
a[1:n, 1:n] = @view b[1:n, 1:n]
end

which will remove the copy.

Slicing an array in Julia creates a copy, which is where your allocations are coming from. You can do @view(b[1:n, 1:n]) to create a view instead, which will avoid the unnecessary copy. See Could you explain what are views? for some more explanation.

That does not work when copying an array into itself.

Setup:

n = 1000;
m = fld(n,2);
a = zeros(n,n);
a[1:m, 1:m] .= randn.();
v = view(a, m:n, m:n);
function f!(a, b, n)
@views a[1:n, 1:n] = b[1:n, 1:n]
end
@btime f!(v, a, m); # 1.490 ms (7 allocations: 1.91 MiB)
b = rand(n,n);
@btime f!(v, b, m); # 931.648 μs (1 allocation: 64 bytes)

Using copyto! leads to the same

function f!(a, b, n)
copyto!(a, CartesianIndices((1:n, 1:n)), b, CartesianIndices((1:n, 1:n)))
end
@btime f!(v, a, m); # 997.360 μs (2 allocations: 7.63 MiB)
@btime f!(v, b, m); # 238.248 μs (0 allocations: 0 bytes)

Best approach is to use explicit for loops

function f!(a, b, n)
for k = 1:n
for i = 1:n
a[i,k] = b[i,k]
end
end
end
@btime f!(v, a, m); # 312.512 μs (0 allocations: 0 bytes)
@btime f!(v, b, m); # 312.134 μs (0 allocations: 0 bytes)

Note that if you benchmark the functions properly, the looping function is slower for the matrix sizes you’re looking at (For very small matrices, the looping version will still win)

function g!(a, b, n)
copyto!(a, CartesianIndices((1:n, 1:n)), b, CartesianIndices((1:n, 1:n)))
end

function h!(a, b, n)
for i = 1:n
for k = 1:n
a[k, i] = b[k, i]
end
end
end
n = 500;
a = zeros(n,n);
b = rand(n,n);

@btime g!(a,\$b,\$n) setup=(a=zeros(n, n)); # 110.865 μs (0 allocations: 0 bytes)
@btime h!(a,\$b,\$n) setup=(a=zeros(n, n)); # 162.373 μs (0 allocations: 0 bytes)

f! can be sped up a bit by using UnsafeViews.jl but it seems to still be slower than the copyto! solution.