Performance difference when accessing square matrix rows-first or cols-first

My explanation (which is the same as the one in the Julia manual) is perfectly consistent with this – accessing by column is faster because it corresponds to contiguous memory access, which is faster, and this difference becomes more important as the matrix gets larger and memory bandwidth dominates the runtime.

The timing comparison is even clearer if you pre-allocate the matrix, so that you aren’t timing the allocation itself. When you do that you’ll see that accessing by column is much faster even for relatively small matrices (until it gets so tiny that memory bandwidth ceases to be an issue). For example, I get:

m = 1
  7.042 ns (0 allocations: 0 bytes)
  7.023 ns (0 allocations: 0 bytes)
m = 10
  56.272 ns (0 allocations: 0 bytes)
  53.523 ns (0 allocations: 0 bytes)
m = 20
  187.415 ns (0 allocations: 0 bytes)
  69.617 ns (0 allocations: 0 bytes)
m = 50
  837.184 ns (0 allocations: 0 bytes)
  289.270 ns (0 allocations: 0 bytes)
m = 100
  8.792 μs (0 allocations: 0 bytes)
  1.211 μs (0 allocations: 0 bytes)
m = 1000
  1.443 ms (0 allocations: 0 bytes)
  256.103 μs (0 allocations: 0 bytes)
m = 10000
  931.414 ms (0 allocations: 0 bytes)
  62.150 ms (0 allocations: 0 bytes)

from

function matrix_0_1_rows(M)
    m, n = size(M)
    for i in 1:m
        for j in 1:n
            M[i,j] = 1
        end
    end
end

function matrix_0_1_cols(M)
    m, n = size(M)
    for j in 1:n
        for i in 1:m
            M[i,j] = 1
        end
    end
end

for m in [1, 10, 20, 50, 100, 1000, 10000]
    @show m
    M = zeros(m,m)
    @btime matrix_0_1_rows($M)
    @btime matrix_0_1_cols($M)
end
1 Like