# Matrix multiplication

Hi there,

I need to perform many matrix multiplications of the form shown below. I wrote two functions for this, `f1` and `f2`. Is there a way to improve the speed of `f2` further? Thanks in advance!

``````
using LinearAlgebra
using Test  # provides the @test macro used at the end
f1(C,X,Y,Z,T,x) = C .= .-x[1].*X*X' .- x[2].*Y*Y' .+ x[3].*Z*T;
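function f2(C,X,Y,Z,T,x)
    x1 = -x[1]
    x2 = -x[2]
    x3 = x[3]
    fill!(C,0.0)
    # Accumulate x1*X*X' and x2*Y*Y' into the lower triangle of C only
    BLAS.syrk!('L', 'N', x1, X, 1.0, C)
    BLAS.syrk!('L', 'N', x2, Y, 1.0, C)
    conSym(C)                # mirror the lower triangle into the upper one
    mul!(C, Z, T, x3, 1.0)   # C = x3*Z*T + C
end
# Copy the strict lower triangle of A into its strict upper triangle
function conSym(A)
    @inbounds for ii=1:size(A,1)-1
        for jj=ii+1:size(A,1)
            A[ii,jj] = A[jj,ii]
        end
    end
end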
v = 600;
r = 3;
b1 = 60;
b2 = 90;
x = rand(3);
C1 = zeros(Float64,v,v);
C2 = zeros(Float64,v,v);
C3 = zeros(Float64,v,v);

X = rand(v,b1);
Y = rand(v,b2);
Z = rand(v,v*r);
T = rand(v*r,v) ;
@time f1(C1,X,Y,Z,T,x);
@time f2(C2,X,Y,Z,T,x);

@test all(C1 .≈ C2);
``````
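
For readers wondering why `f2` needs `conSym`: as far as I understand, `BLAS.syrk!` with `'L'` only writes the lower triangle of the output. The small check below is a sketch of that behaviour on a made-up 3×2 matrix (the names `A`, `AAt` and `S` are only for illustration). It also shows `Symmetric(C, :L)`, which reads the lower triangle as a full symmetric matrix without copying.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # small 3×2 test matrix (illustrative values)
C = zeros(3, 3)

# C := 1.0 * A * A' + 0.0 * C, written to the lower triangle ('L') only
BLAS.syrk!('L', 'N', 1.0, A, 0.0, C)

AAt = A * A'                                  # reference result
lower_matches = tril(C) ≈ tril(AAt)           # lower triangle holds A*A'
upper_is_zero = all(iszero, triu(C, 1))       # strict upper triangle was not written

# Symmetric wrapper: a view that treats the lower triangle as the full matrix
S = Symmetric(C, :L)
sym_matches = Matrix(S) ≈ AAt
```

Whether the wrapper is an option depends on what happens next: here `Z*T` is not symmetric, so a full `C` is needed for the final `mul!` anyway.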

Are you aware that you are measuring the compilation time there as well? Add a call to `f2` before that and you will get a better time estimate:

``````
f2(C2,X,Y,Z,T,x);
@time f2(C2,X,Y,Z,T,x);
  0.012986 seconds
``````

(instead of the `0.063080 seconds (36.77 k allocations: 1.934 MiB)` that you get on the first call)

Or even better, use `BenchmarkTools`.

You should read the Performance Tips section of the Julia manual. For example, you are using non-constant global variables, which hurts type stability, and a single `@time` call also includes the time needed for the first compilation. Use `@btime` from `BenchmarkTools` instead.
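
Without any extra package, both points can be sketched roughly like this: do the timing inside a function (so no globals are involved) and call the function once before measuring. The helper name `time_after_warmup` and the 200×200 matrices are made up for the illustration.

```julia
using LinearAlgebra

# Hypothetical helper: run `f(args...)` once to compile it, then return the
# best of `reps` timed runs in seconds.
function time_after_warmup(f, args...; reps = 5)
    f(args...)                      # warm-up call, triggers compilation
    best = Inf
    for _ in 1:reps
        t = @elapsed f(args...)     # times only this call
        best = min(best, t)
    end
    return best
end

A = rand(200, 200)
B = rand(200, 200)
C = similar(A)

# Arguments are passed into the helper, so the timed code sees local variables
t = time_after_warmup(mul!, C, A, B)
```

`@btime` does this more carefully (many samples, minimum time, allocation counts), so this is only meant to show the idea.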

> Are you aware that you are measuring the compilation time there as well? Add a call to `f2` before that and you will get a better time estimate:

Yes, I am aware of that. I do not actually compare the computing times of the first calls. Also, `f1` is only there so that I can check that the result of `f2` is correct.

> (instead of `0.063080 seconds (36.77 k allocations: 1.934 MiB)`, that you get in the first call)
> Or even better, use `BenchmarkTools`.

Indeed, I actually used `@btime`. But I ran into trouble with `@test all(C1 .≈ C2)`, because I thought that `@btime f1($C1,$X,$Y,$Z,$T,$x); @btime f2($C2,$X,$Y,$Z,$T,$x);` would not change the variables `C1` and `C2`. Therefore, I used `@time f1(C1,X,Y,Z,T,x); @time f2(C2,X,Y,Z,T,x);` in the original post so that my code would run smoothly. But I am probably wrong here, sorry about that. Nevertheless, these are the computing times on my computer:

``````
@btime f1($C1,$X,$Y,$Z,$T,$x);
  23.513 ms (12 allocations: 17.17 MiB)
@btime f2($C2,$X,$Y,$Z,$T,$x);
  17.317 ms (0 allocations: 0 bytes)
``````

As you can see, `f2` is not as much faster than `f1` as I expected. I had the idea of writing an explicit loop, but since the dimensions of the matrices `X`, `Y`, `Z`, and `T` differ, that is not straightforward. Do you have any suggestions? Thanks in advance!

To benchmark that with `BenchmarkTools` you should use:

``````
# `setup` runs before every sample; with evals=1 each call gets a fresh C3
@btime f2(C3, $X, $Y, $Z, $T, $x) setup=(C3 = zeros(Float64, $v, $v)) evals=1
``````

As for the rest, no, I do not see any obvious change that would accelerate that code.

Maybe a less obvious one: depending on the application, one could switch to `Float32`, which should speed things up.
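
A rough sketch of what that could look like, assuming the loss of precision is acceptable. Note that `f2` as posted hard-codes `0.0` and `1.0`, and `BLAS.syrk!` expects the scalars to have the same element type as the matrices, so the variant below (hypothetical name `f2_generic`) converts them to `eltype(C)`. The sizes are smaller than in the original post just to keep the check quick.

```julia
using LinearAlgebra

# Type-generic variant of f2: all scalars are converted to the element type
# of C so that the same code runs for Float64 and Float32 matrices.
function f2_generic(C, X, Y, Z, T, x)
    E = eltype(C)
    fill!(C, zero(E))
    BLAS.syrk!('L', 'N', E(-x[1]), X, one(E), C)   # lower triangle only
    BLAS.syrk!('L', 'N', E(-x[2]), Y, one(E), C)
    n = size(C, 1)
    @inbounds for j in 1:n-1, i in j+1:n
        C[j, i] = C[i, j]                          # mirror lower into upper
    end
    mul!(C, Z, T, E(x[3]), one(E))                 # C = x3*Z*T + C
    return C
end

v, r, b1, b2 = 60, 3, 6, 9
x = rand(3)
X, Y, Z, T = rand(v, b1), rand(v, b2), rand(v, v*r), rand(v*r, v)

C64 = f2_generic(zeros(Float64, v, v), X, Y, Z, T, x)
C32 = f2_generic(zeros(Float32, v, v),
                 Float32.(X), Float32.(Y), Float32.(Z), Float32.(T), x)

# Relative error of the single-precision result against the Float64 one
relerr = norm(C64 - C32) / norm(C64)
```

How much faster the `Float32` version is depends on the machine and the BLAS build, so it is worth benchmarking both before committing to it.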