I understand that Julia stores arrays in column-major order which means that columns are stacked onto one another. Thus, adjacent rows in the same column are adjacent in memory.
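To make sure I have the layout right, here is a minimal sketch of what I mean by column-major, using a small 2×3 matrix chosen only for illustration (it is not the `A` from the benchmark):

```julia
# Illustrative matrix, filled column by column from 1:6
M = collect(reshape(1:6, 2, 3))   # [1 3 5; 2 4 6]

# Flattening follows memory order: each column is laid out in turn
flat = vec(M)                     # [1, 2, 3, 4, 5, 6]

# Stride 1 along rows within a column, stride 2 (the column length) across columns
s = strides(M)                    # (1, 2)

# M[2, 1] sits right after M[1, 1] in memory; M[1, 2] is two elements away
idx = LinearIndices(M)
println((idx[2, 1], idx[1, 2]))   # (2, 3)
```

So walking down a column touches consecutive memory locations, while walking along a row jumps by the column length each step.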
If that is the case, I fail to see why the following operation is faster and makes fewer allocations in row form than in column form:
using BenchmarkTools
A = rand(5, 5);
B = rand(5, 5);
C = Array{Float64}(undef, 5, 5);
function modifyC0!(A, B, i)
    C[i, :] .= A[i, :] .* B[i, :]
end
@btime modifyC0!(A, B, 2);
# 333.318 ns (5 allocations: 352 bytes)
function modifyC1!(A, B, i)
    C[:, i] .= A[:, i] .* B[:, i]
end
@btime modifyC1!(A, B, 2);
# 361.809 ns (6 allocations: 368 bytes)
Shouldn’t they be the same in terms of allocations, with modifyC1! at least marginally better in terms of time?
The situation is as expected when I do not broadcast the final assignment to C: the column version is faster and both versions make the same number of allocations. However, in this case it feels like one allocation is missing (since C is 5x5).
function modifyC2!(A, B, i)
    C[i, :] = A[i, :] .* B[i, :]
end
@btime modifyC2!(A, B, 2);
# 487.897 ns (4 allocations: 400 bytes)
function modifyC3!(A, B, i)
    C[:, i] = A[:, i] .* B[:, i]
end
@btime modifyC3!(A, B, 2);
# 446.896 ns (4 allocations: 400 bytes)
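For comparison, here is a sketch of the same two updates written so that memory layout is closer to the only difference between them. It rests on two assumptions about the code above: that slices such as `A[i, :]` on the right-hand side create copies, and that `C` being a non-constant global adds overhead of its own. The names `modify_row!` and `modify_col!` are hypothetical, and I have not included timings for them.

```julia
# Hypothetical variants: C is passed as an argument instead of being read
# from a global, and @views turns every slice into a view (no copy).
function modify_row!(C, A, B, i)
    @views C[i, :] .= A[i, :] .* B[i, :]
    return C
end

function modify_col!(C, A, B, i)
    @views C[:, i] .= A[:, i] .* B[:, i]
    return C
end

A2 = rand(5, 5)
B2 = rand(5, 5)
C2 = zeros(5, 5)

modify_row!(C2, A2, B2, 2)   # fills row 2 of C2 in place
modify_col!(C2, A2, B2, 3)   # fills column 3 of C2 in place
```

These could be timed with `@btime modify_row!($C2, $A2, $B2, 2)` and `@btime modify_col!($C2, $A2, $B2, 2)`, where the `$` interpolation keeps BenchmarkTools from measuring global-variable access.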