Hey, community! I was comparing matrix multiplication performance using plain loops in C and Julia, and I'd like some help understanding matrix access behavior. I read in “What Every Programmer Should Know About Memory” (pdf here, page 47, section 6) that in C it is faster to access a matrix in one direction than in the other. Using the example from the pdf, multiplying two matrices “directly” takes longer than transposing one of them first and then multiplying:
Original (C code), takes 19.1 sec on my PC:
for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    for (k = 0; k < N; ++k)
      res[i][j] += mul1[i][k] * mul2[k][j];
Transpose (C code), takes 4.8 sec on my PC:
double tmp[N][N];

for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    tmp[i][j] = mul2[j][i];

for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    for (k = 0; k < N; ++k)
      res[i][j] += mul1[i][k] * tmp[j][k];
On the other hand, when I compare these two Julia functions, I get the opposite result:
Original, takes 4.2 sec on my PC:
function matrix_multip(N)
    res = zeros(N, N); mul1 = rand(N, N); mul2 = rand(N, N)
    finish = @elapsed begin
        for i in 1:N
            for j in 1:N
                for k in 1:N
                    res[i, j] += mul1[i, k] * mul2[k, j]
                end
            end
        end
    end
    return finish
end
Transpose, takes 18.6 sec on my PC:
function matrix_multip2(N)
    res = zeros(N, N); mul1 = rand(N, N); mul2 = rand(N, N)
    tmp = zeros(N, N)
    finish = @elapsed begin
        for i in 1:N
            for j in 1:N
                tmp[i, j] = mul2[j, i]
            end
        end
        for i in 1:N
            for j in 1:N
                for k in 1:N
                    res[i, j] += mul1[i, k] * tmp[j, k]
                end
            end
        end
    end
    return finish
end
Can anyone help me understand this difference in matrix access behavior? In all cases I used N = 1000. If anyone wants the full C code, I can reply with it.
My Julia version is:
julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
Thanks, all!