 # How can I improve the speed of large matrix multiplication

I have some functions involving large matrix multiplication in my code. A sample code is as follows.

``````function foo1()
temp = Array[]
for i = 1:1000
push!(temp, randn(400,400)*randn(400,1000))
end
end

function foo2()
for i = 1:1000
randn(400,400)*randn(400,1000)
end
end

@time foo1()
@time foo2()
``````

I find that `foo1()` takes 17 seconds. Without `push!`, `foo2()` takes 11 seconds. Is there any way to improve the speed of `foo1()`? Thank you.

1 Like

The eltype of `temp` is `Array`, which isn’t a concrete type: the compiler has to assume it can hold any subtype of `Array`. Use `temp = Matrix{Float64}[]` instead.

``````julia> isconcretetype(Array)
false

julia> isconcretetype(Matrix{Float64})
true
``````

That shouldn’t have any measurable effect on the speed of this code. You are more likely to be simply observing the cost of actually allocating and populating larger and larger memory. `foo2` is just creating garbage in a loop so the GC, the libc and the OS can work together to reuse the memory.

Hmm, I timed it before posting my original response and saw a significant effect, but that looks to have been a fluke.

@zxjroger, your benchmark is largely just timing `randn`:

``````julia> @btime randn(400,400) * randn(400,1000);
9.321 ms (6 allocations: 7.32 MiB)

julia> @btime a * b setup=(a=randn(400,400); b=randn(400,1000));
4.182 ms (2 allocations: 3.05 MiB)
``````

You can also preallocate the two random matrices outside the for loop and use randn!.

You can also use mul! instead of *.
Or use BLAS functions.

2 Likes

To expand on what @Juan said,

``````using LinearAlgebra, Random
using BenchmarkTools

function foo1()
temp = Array[]
for i = 1:1000
push!(temp, randn(400,400)*randn(400,1000))
end
end

function foo_prealloc(tmp::AbstractVector, A::AbstractMatrix, B::AbstractMatrix)
@assert size(A, 2) == size(B, 1) # multiplicable
@assert size(first(tmp)) == (size(A, 1), size(B, 2)) # fits into tmp
for t in tmp
randn!(A), randn!(B)
mul!(t, A, B)
end
return tmp
end

A = zeros(400,400)
B = zeros(400,1000)
tmp = [zeros(400,1000) for _ in 1:1000]

@btime foo1(); # 22.646 s (6010 allocations: 7.15 GiB)
@btime foo_prealloc(\$tmp, \$A, \$B); # 1.368 s (0 allocations: 0 bytes)
``````
1 Like

Thank you for all the suggestions. I will look into it.