Hey, community! I was comparing matrix multiplication performance using plain loops in C and Julia, and I'd like some help understanding matrix access behavior. I read in “What Every Programmer Should Know About Memory” (pdf here, page 47, section 6) that in C it is faster to access a matrix in one direction than in the other. Using the example from the pdf, multiplying two matrices “directly” takes longer than transposing one of them first and then multiplying:
Original (C code), takes 19.1 sec on my PC:
for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    for (k = 0; k < N; ++k)
      res[i][j] += mul1[i][k] * mul2[k][j];
Transpose (C code), takes 4.8 sec on my PC:
double tmp[N][N];

for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    tmp[i][j] = mul2[j][i];

for (i = 0; i < N; ++i)
  for (j = 0; j < N; ++j)
    for (k = 0; k < N; ++k)
      res[i][j] += mul1[i][k] * tmp[j][k];
On the other hand, when I compare these two Julia functions, I get the opposite result:
Original, takes 4.2 sec on my PC:
function matrix_multip(N)
    res = zeros(N, N); mul1 = rand(N, N); mul2 = rand(N, N)
    finish = @elapsed begin
        for i in 1:N
            for j in 1:N
                for k in 1:N
                    res[i, j] += mul1[i, k] * mul2[k, j]
                end
            end
        end
    end
    return finish
end
Transpose, takes 18.6 sec on my PC:
function matrix_multip2(N)
    res = zeros(N, N); mul1 = rand(N, N); mul2 = rand(N, N)
    tmp = zeros(N, N)
    finish = @elapsed begin
        for i in 1:N
            for j in 1:N
                tmp[i, j] = mul2[j, i]
            end
        end
        for i in 1:N
            for j in 1:N
                for k in 1:N
                    res[i, j] += mul1[i, k] * tmp[j, k]
                end
            end
        end
    end
    return finish
end
Can anyone help me understand this difference in matrix access behavior? In all cases I used N = 1000. If anyone wants the full C code, I can reply with it.
My Julia version is:
julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
Thanks, all!