# Would matrix multiplication benefit if Julia allowed for row-major arrays as well?

Just a random thought. Consider multiplication of three matrices: `A * (B * C)`

If `A` is column-major, would the outer multiplication execute substantially faster (when the product `B * C` and the matrix `A` both exceed the CPU cache size) if Julia stored the temporary `B * C` as a row-major matrix?

No. The blocking performed by an optimized BLAS implementation mostly eliminates the difference between row-major and column-major memory layouts. You can easily test this for yourself, because a transposed column-major array is equivalent to a row-major array, so computing `A*B'` in Julia is equivalent to a column-major × row-major product. (Julia evaluates `A*B'` by passing `B` to a BLAS routine along with a transpose flag, so no transposed copy of `B` is ever materialized.)

```julia
julia> using BenchmarkTools

julia> A = rand(1000,1000); B = copy(A);

julia> f(a,b) = a*b
f (generic function with 1 method)

julia> g(a,b) = a'*b
g (generic function with 1 method)

julia> h(a,b) = a*b'
h (generic function with 1 method)

julia> @btime f($A,$B) evals=1;
13.416 ms (2 allocations: 7.63 MiB)

julia> @btime g($A,$B) evals=1;
13.364 ms (2 allocations: 7.63 MiB)

julia> @btime h($A,$B) evals=1;
13.560 ms (2 allocations: 7.63 MiB)
```

The timing differences here are less than 2%, which is probably within the noise.
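One way to convince yourself that `B'` really is a zero-copy, row-major view rather than a transposed copy: `B'` is a lazy `Adjoint` wrapper around the same memory, so its strides are reversed and writes pass through to the parent array. A small sketch:

```julia
using LinearAlgebra  # the Adjoint type lives here

B = rand(4, 3)
Bt = B'                       # lazy wrapper; no data is copied
println(typeof(Bt))           # Adjoint{Float64, Matrix{Float64}}
println(strides(B))           # (1, 4): column-major layout
println(strides(Bt))          # (4, 1): row-major traversal of the same memory
Bt[1, 2] = 99.0               # writes through to the parent array
println(B[2, 1])              # 99.0
```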

Of course, there are other reasons to support row-major formats, mainly to pass data to and from external libraries without making copies, but this can be handled by add-on packages (e.g., PyCall already does this to provide copy-free views of row-major NumPy arrays).
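Even without packages, `Base.PermutedDimsArray` can wrap a row-major buffer as a Julia matrix without copying. A sketch, where `buf` stands in for a hypothetical buffer received from an external row-major library:

```julia
# Hypothetical buffer from an external row-major library: a 2x3 matrix
# stored row by row as [a11 a12 a13 a21 a22 a23].
buf = collect(1.0:6.0)

# Interpret the buffer column-major with the dimensions swapped...
At = reshape(buf, 3, 2)

# ...then lazily permute the dimensions to recover the row-major 2x3 matrix.
A = PermutedDimsArray(At, (2, 1))

println(A[1, :])   # [1.0, 2.0, 3.0] -- the first row, with no copy made
```

Writes to `A` go straight through to `buf`, which is what makes this useful for sharing data with external code.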
