I have some functions involving large matrix multiplication in my code. A sample code is as follows.

function foo1()
temp = Array[]
for i = 1:1000
push!(temp, randn(400,400)*randn(400,1000))
end
end
function foo2()
for i = 1:1000
randn(400,400)*randn(400,1000)
end
end
@time foo1()
@time foo2()

I find that foo1() takes 17 seconds. Without push!, foo2() takes 11 seconds. Is there any way to improve the speed of foo1()? Thank you.

The eltype of temp is Array, which isn’t a concrete type: the compiler has to assume it can hold any subtype of Array. Use temp = Matrix{Float64}[] instead.

That shouldn’t have any measurable effect on the speed of this code. You are more likely to be simply observing the cost of actually allocating and populating larger and larger memory. foo2 is just creating garbage in a loop so the GC, the libc and the OS can work together to reuse the memory.

using LinearAlgebra, Random
using BenchmarkTools
function foo1()
temp = Array[]
for i = 1:1000
push!(temp, randn(400,400)*randn(400,1000))
end
end
function foo_prealloc(tmp::AbstractVector, A::AbstractMatrix, B::AbstractMatrix)
@assert size(A, 2) == size(B, 1) # multiplicable
@assert size(first(tmp)) == (size(A, 1), size(B, 2)) # fits into tmp
for t in tmp
randn!(A), randn!(B)
mul!(t, A, B)
end
return tmp
end
A = zeros(400,400)
B = zeros(400,1000)
tmp = [zeros(400,1000) for _ in 1:1000]
@btime foo1(); # 22.646 s (6010 allocations: 7.15 GiB)
@btime foo_prealloc($tmp, $A, $B); # 1.368 s (0 allocations: 0 bytes)