How can I improve the speed of large matrix multiplication

zxjroger · March 5, 2020, 11:29pm

I have some functions involving large matrix multiplication in my code. A sample code is as follows.

function foo1()
    temp = Array[]
    for i = 1:1000
        push!(temp, randn(400,400)*randn(400,1000))
    end
end

function foo2()
    for i = 1:1000
        randn(400,400)*randn(400,1000)
    end
end

@time foo1()
@time foo2()

I find that foo1() takes 17 seconds. Without push!, foo2() takes 11 seconds. Is there any way to improve the speed of foo1()? Thank you.

stillyslalom · March 5, 2020, 11:41pm

The eltype of temp is Array, which isn’t a concrete type: the compiler has to assume it can hold any subtype of Array. Use temp = Matrix{Float64}[] instead.

julia> isconcretetype(Array)
false

julia> isconcretetype(Matrix{Float64})
true

yuyichao · March 5, 2020, 11:58pm

That shouldn’t have any measurable effect on the speed of this code. You are more likely to be simply observing the cost of actually allocating and populating larger and larger memory. foo2 is just creating garbage in a loop so the GC, the libc and the OS can work together to reuse the memory.

stillyslalom · March 6, 2020, 12:18am

Hmm, I timed it before posting my original response and saw a significant effect, but that looks to have been a fluke.

@zxjroger, your benchmark is largely just timing randn:

julia> @btime randn(400,400) * randn(400,1000);
  9.321 ms (6 allocations: 7.32 MiB)

julia> @btime a * b setup=(a=randn(400,400); b=randn(400,1000));
  4.182 ms (2 allocations: 3.05 MiB)

Juan · March 6, 2020, 12:49am

You can also preallocate the two random matrices outside the for loop and use randn!.

You can also use mul! instead of *.
Or use BLAS functions.

carstenbauer · March 7, 2020, 9:10am

To expand on what @Juan said,

using LinearAlgebra, Random
using BenchmarkTools

function foo1()
    temp = Array[]
    for i = 1:1000
        push!(temp, randn(400,400)*randn(400,1000))
    end
end

function foo_prealloc(tmp::AbstractVector, A::AbstractMatrix, B::AbstractMatrix)
    @assert size(A, 2) == size(B, 1) # multiplicable
    @assert size(first(tmp)) == (size(A, 1), size(B, 2)) # fits into tmp
    for t in tmp
        randn!(A), randn!(B)
        mul!(t, A, B)
    end
    return tmp
end


A = zeros(400,400)
B = zeros(400,1000)
tmp = [zeros(400,1000) for _ in 1:1000]

@btime foo1(); # 22.646 s (6010 allocations: 7.15 GiB)
@btime foo_prealloc($tmp, $A, $B); # 1.368 s (0 allocations: 0 bytes)

zxjroger · March 10, 2020, 5:17pm

Thank you for all the suggestions. I will look into it.

Topic		Replies	Views
How to improve the speed of matrix multiplication? Performance question	10	1872	January 14, 2021
How to make faster in a matrix multiplication inside a loop? Performance	2	453	July 27, 2020
Matrix multiplication is slower when multithreading in Julia Performance question , multithreading , linearalgebra	13	4134	January 21, 2022
Memory allocation in matrix multiplication Performance	2	2107	July 8, 2020
Comparing matrix multiplication between Julia and C Performance question , matrices	2	890	July 7, 2020

How can I improve the speed of large matrix multiplication

Related topics