Hey,

I observed that on my machine MATLAB's matmul is faster than Julia's. I am quite confused, since as far as I can tell I have already aligned the relevant factors: MKL as the BLAS backend in Julia, a single thread, and the same input matrices.

Here is a minimal working example.

Julia code:

```julia
using MKL
using SparseArrays
using BenchmarkTools
using MAT
MKL.BLAS.set_num_threads(1)
n = 10^3
A = randn(n, n)
B = randn(n, n)
@btime $A * $B;
matwrite(homedir()*"/matmul_data.mat",
         Dict("A" => A, "B" => B))
```

whose output is

```
46.958 ms (2 allocations: 7.63 MiB)
```
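In case it helps with diagnosing this, here is a minimal sketch of how the active BLAS backend and thread count can be checked on the Julia side. It assumes Julia 1.7 or newer, where `BLAS.get_config()` is available. With `using MKL` loaded I would expect an MKL library to be listed, but the snippet itself only relies on `LinearAlgebra`.

```julia
using LinearAlgebra

# List the BLAS/LAPACK libraries that Julia currently forwards calls to.
# Assumption: Julia >= 1.7, where BLAS.get_config() exists.
config = BLAS.get_config()
println(config)

# Set the BLAS thread count and read it back to confirm it took effect.
BLAS.set_num_threads(1)
nthreads = BLAS.get_num_threads()
println("BLAS threads: ", nthreads)
```

If the listed library is not the one expected, the two timings would not be comparing the same backend.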

and the MATLAB code:

```matlab
function benchmark(reps)
    setenv("MKL_NUM_THREADS", "1");
    setenv("OMP_NUM_THREADS", "1");
    setenv("OPENBLAS_NUM_THREADS", "1");
    data = load('~/matmul_data.mat');
    A = data.A;
    B = data.B;
    n = size(A, 1);
    times = zeros(1, reps);
    prod = randn(n, n);
    for i = 1:reps
        tic; % Start timing
        C = A * B; % Perform dense-dense matrix multiplication
        times(i) = toc; % Stop timing and record the duration
        D = randn(n, n);
        prod = prod * (C + D); % avoid C being optimized away
    end
    averageTime = mean(times);
    fprintf('Average execution time: %.4f seconds\n', averageTime);
end
```
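One methodological difference worth noting: `@btime` reports the minimum over its samples, whereas the MATLAB function above averages `tic`/`toc` timings. Below is a rough Julia sketch of the same averaging approach. `mean_matmul_time` is a hypothetical helper name, the matrices are regenerated rather than loaded from the `.mat` file, and `reps = 20` is an arbitrary choice (raise it to 100 to mirror the MATLAB run).

```julia
using LinearAlgebra
using Statistics

# Time `reps` products A * B individually and return the mean, in seconds.
# Hypothetical helper, written to mirror the MATLAB tic/toc loop.
function mean_matmul_time(A, B, reps)
    times = zeros(reps)
    for i in 1:reps
        times[i] = @elapsed A * B
    end
    return mean(times)
end

BLAS.set_num_threads(1)
n = 10^3
A = randn(n, n)
B = randn(n, n)

mean_matmul_time(A, B, 1)            # warm-up call, so compilation is not timed
avg = mean_matmul_time(A, B, 20)
println("Average execution time: ", round(avg, digits = 4), " seconds")
```

Since a mean is never smaller than a minimum, this should not by itself explain MATLAB being faster, but it puts both numbers on the same footing.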

Running `benchmark(100)` in MATLAB gives me

```
Average execution time: 0.0081 seconds
```

which is about 5-6x faster than Julia. Is there any step I omitted that makes the comparison unfair?
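
For scale, the two timings can be converted to approximate throughput, assuming the usual count of about 2n^3 floating-point operations for a dense n-by-n product. This is only arithmetic on the numbers quoted above, not a new measurement.

```julia
# Approximate throughput implied by the two timings quoted in this post.
# Assumption: a dense n-by-n matrix product costs about 2n^3 flops.
n = 10^3
flops = 2 * n^3

t_julia  = 46.958e-3   # seconds, from the @btime output
t_matlab = 0.0081      # seconds, from the MATLAB average

# Convert a runtime in seconds to GFLOP/s for this problem size.
gflops(t) = flops / t / 1e9

println("Julia:  ", round(gflops(t_julia), digits = 1), " GFLOP/s")
println("MATLAB: ", round(gflops(t_matlab), digits = 1), " GFLOP/s")
println("Ratio:  ", round(t_julia / t_matlab, digits = 1), "x")
```

Comparing these figures against the single-core peak of the CPU may indicate whether either run was really restricted to one thread.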