In the following code, the functions r_major! and c_major! iterate over the array a in row-major and column-major order, respectively:
using BenchmarkTools

function r_major!(a)
    N1, N2 = size(a)
    for i=1:N1
        for j=1:N2
            a[i,j] = 2
        end
    end
    return nothing
end

function c_major!(a)
    N1, N2 = size(a)
    for j=1:N2
        for i=1:N1
            a[i,j] = 2
        end
    end
    return nothing
end

N1, N2 = 400, 400
# N1, N2 = 80000, 2
# N1, N2 = 2, 80000
a = zeros((N1, N2))
@show N1, N2, N1 * N2
@btime r_major!($a)
@btime c_major!($a)
Depending on the shape of a, I obtain the following results:
(N1, N2, N1 * N2) = (400, 400, 160000)
  238.082 μs (0 allocations: 0 bytes)
  34.344 μs (0 allocations: 0 bytes)
(N1, N2, N1 * N2) = (80000, 2, 160000)
  128.841 μs (0 allocations: 0 bytes)
  34.441 μs (0 allocations: 0 bytes)
(N1, N2, N1 * N2) = (2, 80000, 160000)
  71.994 μs (0 allocations: 0 bytes)
  100.357 μs (0 allocations: 0 bytes)
As expected, for the 400x400 array a the column-major loop is much faster. For size 80000x2 (the same total number of elements), the column-major loop is still faster, but the row-major loop takes roughly half as long as before, which is odd. However, for size 2x80000 the column-major loop becomes slower than the row-major loop. How is this possible if only the shape of the array changes and the number of elements stays the same?
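
For reference, here is a minimal sketch of how the three shapes map onto memory, assuming the default dense, column-major Array layout. The helper name stride_report is hypothetical and not part of the benchmark above. It only reports the distance (in elements) between consecutive inner-loop accesses and the number of inner-loop iterations for each function; it does not measure anything.

```julia
# Hypothetical helper (an illustration only, not part of the benchmark):
# for an N1 x N2 array, report the memory stride and the trip count of the
# inner loop of r_major! (inner index j) and c_major! (inner index i).
function stride_report(N1, N2)
    a = zeros(N1, N2)
    s1, s2 = strides(a)  # distance in elements between neighbours along dim 1 and dim 2
    return (r_major_inner_stride = s2, r_major_inner_len = N2,
            c_major_inner_stride = s1, c_major_inner_len = N1)
end

for (N1, N2) in ((400, 400), (80000, 2), (2, 80000))
    println((N1, N2), " => ", stride_report(N1, N2))
end
```

If memory layout is what drives the timings, the inner-loop stride and the inner-loop length would be the two quantities to compare across the three shapes: the stride of the row-major inner loop is 400, 80000 and 2 elements respectively, while the column-major inner loop always has stride 1 but runs for only 2 iterations in the 2x80000 case.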