Matrix multiplication

Hi there,

I need to perform many matrix multiplications as follows. I have written two functions (f1 and f2) below. I am wondering whether the speed of function f2 can be improved further? Thanks in advance!

using LinearAlgebra
using Test

# Reference implementation: plain broadcasting over the full products.
f1(C,X,Y,Z,T,x) = C .= .-x[1].*X*X' .- x[2].*Y*Y' .+ x[3].*Z*T;

# Faster version: symmetric rank-k updates on the lower triangle via syrk!,
# then mirror the result to the upper triangle before the final mul!.
function f2(C,X,Y,Z,T,x)
    x1 = -x[1]
    x2 = -x[2]
    x3 = x[3]
    fill!(C,0.0)
    BLAS.syrk!('L', 'N', x1, X, 1.0, C)   # C += x1*X*X' (lower triangle only)
    BLAS.syrk!('L', 'N', x2, Y, 1.0, C)   # C += x2*Y*Y' (lower triangle only)
    conSym(C)                             # copy lower triangle to upper
    mul!(C, Z, T, x3, 1.0)                # C += x3*Z*T
end

# Copy the lower triangle of A into its upper triangle, in place.
function conSym(A)
    @inbounds for ii = 1:size(A,1)-1
        for jj = ii+1:size(A,1)
            A[ii,jj] = A[jj,ii]
        end
    end
end
v = 600;
r = 3;
b1 = 60;
b2 = 90;
x = rand(3);
C1 = zeros(Float64,v,v);
C2 = zeros(Float64,v,v);
C3 = zeros(Float64,v,v);

X = rand(v,b1);
Y = rand(v,b2);
Z = rand(v,v*r);
T = rand(v*r,v);
@time f1(C1,X,Y,Z,T,x);
@time f2(C2,X,Y,Z,T,x);

@test all(C1 .≈ C2);

Are you aware that you are measuring the compilation time there as well? Add a call to f2 before that and you will get a better time estimate:

f2(C2,X,Y,Z,T,x);
@time f2(C2,X,Y,Z,T,x);
  0.012986 seconds

(instead of the 0.063080 seconds (36.77 k allocations: 1.934 MiB) that you get on the first call)

Or even better, use BenchmarkTools.
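For instance, a minimal sketch of @btime on a standalone matrix product (A and B here are illustrative stand-ins, not the matrices from the post above):

```julia
using BenchmarkTools
using LinearAlgebra

# Illustrative example: `$` interpolates the global variables so
# @btime measures only the multiplication itself, not the overhead
# of looking up untyped globals.
A = rand(200, 200)
B = rand(200, 200)
@btime $A * $B;
```

@btime runs the expression many times and reports the minimum time together with the allocations, which avoids the one-off compilation cost that a single @time call includes.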


You should read the Performance Tips section in the docs. For example, you are using global variables, which hurts type stability, and @time called once also includes the time needed for the first compilation. Use @btime from BenchmarkTools instead.
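A minimal sketch of the globals pitfall (the names here are hypothetical, chosen only for illustration):

```julia
# `data` is a non-const global, so sum_global() cannot be inferred;
# passing the array as an argument (or declaring the global `const`)
# restores type stability.
data = rand(1000)

sum_global() = sum(data)   # type-unstable: reads an untyped global
sum_arg(v) = sum(v)        # type-stable: data arrives as an argument

const cdata = rand(1000)
sum_const() = sum(cdata)   # also stable: a const global has a fixed type
```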


Are you aware that you are measuring the compilation time there as well? Add a call to f2 before that and you will get a better time estimate:

Yes, I am aware of that. I don't actually compare the computing times of these functions on a first call. Also, the function f1 is only there to verify that f2 computes the correct result.

(instead of 0.063080 seconds (36.77 k allocations: 1.934 MiB), that you get in the first call)
Or even better, use BenchmarkTools.

Indeed, I did use @btime. But I ran into trouble with the subsequent @test all(C1 .≈ C2), because I assumed that @btime f1($C1,$X,$Y,$Z,$T,$x); @btime f2($C2,$X,$Y,$Z,$T,$x); would not change the variables C1 and C2. That is why I used @time f1(C1,X,Y,Z,T,x); @time f2(C2,X,Y,Z,T,x); in the original post, so that the code would run smoothly. But perhaps I am wrong here; I am sorry about that. Nevertheless, these are the timings from my computer:

@btime f1($C1,$X,$Y,$Z,$T,$x);
  23.513 ms (12 allocations: 17.17 MiB)
@btime f2($C2,$X,$Y,$Z,$T,$x);
  17.317 ms (0 allocations: 0 bytes)

As you can see, f2 is not much faster than f1, contrary to what I expected. I had the idea of writing a loop, but since the matrices X, Y, Z, and T have different dimensions, that is not straightforward. Do you have any suggestions? Thanks in advance!

To benchmark that with BenchmarkTools you should use:

@btime f2(C3, $X, $Y, $Z, $T, $x) setup = (C3 = zeros(Float64, v, v)) evals = 1

Here setup allocates a fresh C3 before each sample (outside the timed region), and evals=1 ensures the function is timed exactly once per sample, so every timed call sees a newly zeroed C3.

As for the rest: no, I do not see any obvious change that would accelerate that code.


To benchmark that with BenchmarkTools you should use:

Yes, thanks for your tip!

Maybe less obvious: depending on the application, one could switch to Float32, which should speed things up.
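A minimal sketch of that idea, reusing the dimensions from the original post (precision drops to roughly 7 significant digits, so whether this is acceptable depends on the application; x32 below is a stand-in for x):

```julia
using LinearAlgebra

# Convert the inputs once, then run the same BLAS calls in Float32.
# Half the memory traffic per matrix typically speeds up BLAS kernels.
v, b1 = 600, 60
X32 = rand(Float32, v, b1)
C32 = zeros(Float32, v, v)
x32 = rand(Float32, 3)

# Note the Float32 scalars (-x32[1], 1.0f0) so syrk! stays in Float32
# instead of promoting to Float64.
BLAS.syrk!('L', 'N', -x32[1], X32, 1.0f0, C32)
```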