# BLAS dot doing weird things depending on size

I’m working on some hand-written matrix multiplication.

```julia
using LinearAlgebra  # provides dot

function slowMult!(A::AbstractMatrix{T}, B::AbstractMatrix{T}) where T<:Number
    C = Matrix{T}(undef, size(A, 1), size(B, 2))
    for i in 1:size(A, 1)
        for j in 1:size(B, 2)
            # dot product of row i of A with column j of B; views avoid copies
            @inbounds @views C[i, j] = dot(A[i, :], B[:, j])
        end
    end
    return C
end
```

This is obviously not going to be the fastest approach, but what’s really weird is that calling this function on two `512` by `512` matrices is about 2x slower than calling it on two `514` by `514` matrices. The timing code is:

```julia
using BenchmarkTools

T, N = Int, 512
A = rand(T, N, N)
B = rand(T, N, N)
slowMult!(rand(T, 1, 1), rand(T, 1, 1))  # warm-up call so compilation is not timed
C = @btime slowMult!($A, $B)
```

Could be some alignment issue.

Incidentally, the first thing I would fix is memory access order:

https://docs.julialang.org/en/v1/manual/performance-tips/#Access-arrays-in-memory-order,-along-columns-1
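As a rough sketch of what that could look like for the function in the question: Julia stores matrices column-major, so `A[i, :]` walks memory with a stride of `size(A, 1)` elements while `B[:, j]` is contiguous. One option is to transpose `A` once up front so both operands of each `dot` are contiguous columns. The name `colMult` and the choice of an eager `permutedims` copy are my own assumptions for illustration, and I have not benchmarked this.

```julia
using LinearAlgebra

# Hypothetical variant of the function from the question (name is an assumption).
function colMult(A::AbstractMatrix{T}, B::AbstractMatrix{T}) where T<:Number
    At = permutedims(A)  # At[:, i] holds row i of A, stored contiguously
    C = Matrix{T}(undef, size(A, 1), size(B, 2))
    # Outer loop over columns of C, inner loop over rows, so C is also
    # written in memory order.
    for j in axes(B, 2), i in axes(A, 1)
        @views C[i, j] = dot(At[:, i], B[:, j])
    end
    return C
end

A = rand(-5:5, 4, 3)
B = rand(-5:5, 3, 5)
C = colMult(A, B)
```

The extra copy costs one pass over `A`, which should be small next to the `size(A, 1) * size(B, 2)` dot products, but that trade-off is worth measuring rather than assuming.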

Not sure this is the case here, but looping over multiple arrays whose lengths are a power of two can have bad effects on the cache:

http://scribblethink.org/Computer/cachekiller.html
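If that is what is happening, one way to test the idea is to keep the logical size at `512` but change the distance between columns in memory. With `N = 512` and 8-byte `Int` elements, consecutive entries of a row `A[i, :]` sit 4096 bytes apart, which can map many of them to the same cache set. The sketch below stores the matrix inside a slightly taller backing array so the column stride is no longer a power of two. The helper name `padded_copy` and the default `pad = 2` are assumptions made up for this example.

```julia
# Hypothetical helper: copy A into the top rows of a taller backing array,
# so consecutive columns are size(A, 1) + pad elements apart in memory.
function padded_copy(A::AbstractMatrix{T}; pad::Integer = 2) where T
    n, m = size(A)
    backing = Matrix{T}(undef, n + pad, m)
    P = view(backing, 1:n, :)  # n-by-m window; padding rows stay unused
    P .= A
    return P
end

A = rand(Int, 512, 512)
P = padded_copy(A)
strides(A)  # (1, 512)
strides(P)  # (1, 514)
```

Since `P` is still an `AbstractMatrix`, it can be passed to `slowMult!` in place of `A`. If the `512` case then times about the same as the `514` case, that would point at the stride rather than at alignment.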


Thanks so much! Now that I know about it, it should be easy enough to fix.