I benchmarked the following naive matrix multiplication algorithm in both Julia 0.6 and 1.0. Surprisingly, the Julia 1.0 version runs about 4 times slower. Running @code_llvm
shows much neater output on 0.6 than on 1.0. Any idea what is going wrong? Why doesn't @simd
work well in 1.0?
The times are as follows:
0.351667 seconds (4 allocations: 15.259 MiB, 0.39% gc time) # Julia 0.6
1.454174 seconds (2 allocations: 7.629 MiB) # Julia 1.0
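(As an aside, @time on the first call of a function includes JIT compilation; the usual Base-only workaround, sketched below, is to time a second call, or to use BenchmarkTools.jl's @btime. The function f here is just a stand-in.)

```julia
# f is a hypothetical stand-in for any function being timed.
f(x) = sum(abs2, x)
x = rand(10^6)

@time f(x)   # first call: includes compilation of f
@time f(x)   # second call: closer to the true runtime
```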
and the code:
using Compat  # provides the `undef` constructor syntax on 0.6
function matgen(n)
    tmp = 1 / n / n
    [tmp * (i - j) * (i + j - 2) for i = 1:n, j = 1:n]
end
function mul(a, b)
    m, n = size(a)
    q, p = size(b)
    # transpose a for cache-friendliness
    aT = transpose(a)
    out = Array{Float64}(undef, m, p)
    for i = 1:m
        for j = 1:p
            z = 0.0
            @simd for k = 1:n
                z += aT[k, i] * b[k, j]
            end
            out[i, j] = z
        end
    end
    out
end
function main(n)
    n = n ÷ 2 * 2
    a = matgen(n)
    b = matgen(n)
    @time c = mul(a, b)
    v = n ÷ 2 + 1
    println(c[v, v])
end
main(200)
main(1000)
Thanks to @mohamed82008, this has been solved: transpose
is lazy in 1.0, so I needed to copy the result; otherwise the matrix aT
was just a view of a
with the same memory layout, and cache-friendliness was destroyed.
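A minimal sketch of the fix (variable names follow the code above):

```julia
using LinearAlgebra  # the Transpose wrapper type lives here on 1.0

a = rand(4, 4)

# On 1.0, transpose(a) is a lazy wrapper: no data moves, and
# aT[k, i] still reads a[i, k] -- strided, cache-unfriendly access.
@assert transpose(a) isa LinearAlgebra.Transpose

# copy materializes it into a plain column-major Matrix, so the inner
# loop over k walks contiguous memory again:
aT = copy(transpose(a))   # permutedims(a) does the same in one step
@assert aT isa Matrix{Float64}
```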
For the record, here is the new timing (27% faster than 0.6):
0.276188 seconds (4 allocations: 15.259 MiB)