Memory allocation in matrix multiplication

aaraujo71 · July 8, 2020, 3:59am

I am using matrix multiplication inside large loops and I wonder how I can reduce memory allocation. For example, the following (useless) code

function run_cycle!(R, A, B)
    for i in 1:1000
        R = A*B
    end
end

function test_timing()
    A = rand(5, 5)
    B = rand(5, 5)
    R = Array{Float64,2}(undef, 5, 5)
    @time run_cycle!(R, A, B)
end

test_timing()

returns

0.000682 seconds (1000 allocations: 281.250 KiB)

Is there any way to eliminate this allocation? Before, in element-wise operations with matrices, I used to use macro @. to eliminate memory allocation. However, although allocation is eliminated, in this case @. R = A*B does something totally different.

Elrod · July 8, 2020, 4:13am

This works:

using LinearAlgebra
mul!(R, A, B)

If your arrays are small (I’ll use 5x5 as an example), then odds are a different library will give you better performance.
Two options include StaticArrays.jl, a popular choice that uses stack allocated arrays:

using StaticArrays, LinearAlgebra, BenchmarkTools
As = @SMatrix rand(5,5);
Bs = @SMatrix rand(5,5);
Aa = Array(As); Ba = Array(Bs); Ca = similar(Aa, size(Aa,1), size(Ba,2));

This yields:

julia> @btime As2 * $Bs setup = (As2 = $(Ref(As))[])
  21.886 ns (0 allocations: 0 bytes)
5×5 SArray{Tuple{5,5},Float64,2,25} with indices SOneTo(5)×SOneTo(5):
 1.15682   0.961188  1.08718   0.848822  1.10338
 0.584637  1.26248   0.896451  1.05197   0.892125
 1.22157   1.71552   1.4453    1.5288    1.28231
 1.40697   2.4277    1.80974   1.78083   2.07344
 0.963769  1.29204   1.19697   1.30232   1.07738

julia> @btime mul!($Ca, $Aa, $Ba)
  136.604 ns (0 allocations: 0 bytes)
5×5 Array{Float64,2}:
 1.15682   0.961188  1.08718   0.848822  1.10338
 0.584637  1.26248   0.896451  1.05197   0.892125
 1.22157   1.71552   1.4453    1.5288    1.28231
 1.40697   2.4277    1.80974   1.78083   2.07344
 0.963769  1.29204   1.19697   1.30232   1.07738

Because static arrays are stack allocated, the Julia compiler cheats the benchmark if you just do @btime $As * $Bs.

My library PaddedMatrices.jl isn’t well documented and is still missing a lot of features, but aims to be much faster than StaticArrays.jl:

using PaddedMatrices
Ap = FixedSizeMatrix{5,5,Float64}(undef) .= Aa;
Bp = FixedSizeMatrix{5,5,Float64}(undef) .= Ba;
Cp = FixedSizeMatrix{5,5,Float64}(undef);

yields

julia> @btime mul!($Cp, $Ap, $Bp)
  5.734 ns (0 allocations: 0 bytes)
5×5 FixedSizeArray{Tuple{5,5},Float64,2,Tuple{1,5},32,0,0,false} with indices 1:5×1:5:
 1.15682   0.961188  1.08718   0.848822  1.10338
 0.584637  1.26248   0.896451  1.05197   0.892125
 1.22157   1.71552   1.4453    1.5288    1.28231
 1.40697   2.4277    1.80974   1.78083   2.07344
 0.963769  1.29204   1.19697   1.30232   1.07738

aaraujo71 · July 8, 2020, 11:39am

thank you very much

Topic		Replies	Views
Help to reduce number of allocations General Usage	23	5489	January 6, 2019
Matrix of SVectors allocating Performance staticarrays	7	298	August 2, 2024
Inexplicable allocations when summing `StaticArrays` Performance	8	1126	December 3, 2018
Reduce allocations for matrix differential equations function Performance	10	1333	February 8, 2019
How to efficiently do a martix multiplication? New to Julia question	7	561	June 3, 2020

Memory allocation in matrix multiplication

Related topics