This works:
using LinearAlgebra
mul!(R, A, B)
If your arrays are small (I’ll use 5x5 as an example), then odds are a different library will give you better performance.
Two options include StaticArrays.jl
, a popular choice that uses stack allocated arrays:
using StaticArrays, LinearAlgebra, BenchmarkTools
As = @SMatrix rand(5,5);
Bs = @SMatrix rand(5,5);
Aa = Array(As); Ba = Array(Bs); Ca = similar(Aa, size(Aa,1), size(Ba,2));
This yields:
julia> @btime As2 * $Bs setup = (As2 = $(Ref(As))[])
21.886 ns (0 allocations: 0 bytes)
5×5 SArray{Tuple{5,5},Float64,2,25} with indices SOneTo(5)×SOneTo(5):
1.15682 0.961188 1.08718 0.848822 1.10338
0.584637 1.26248 0.896451 1.05197 0.892125
1.22157 1.71552 1.4453 1.5288 1.28231
1.40697 2.4277 1.80974 1.78083 2.07344
0.963769 1.29204 1.19697 1.30232 1.07738
julia> @btime mul!($Ca, $Aa, $Ba)
136.604 ns (0 allocations: 0 bytes)
5×5 Array{Float64,2}:
1.15682 0.961188 1.08718 0.848822 1.10338
0.584637 1.26248 0.896451 1.05197 0.892125
1.22157 1.71552 1.4453 1.5288 1.28231
1.40697 2.4277 1.80974 1.78083 2.07344
0.963769 1.29204 1.19697 1.30232 1.07738
Because static arrays are stack allocated, the Julia compiler cheats the benchmark if you just do @btime $As * $Bs
.
My library PaddedMatrices.jl
isn’t well documented and is still missing a lot of features, but aims to be much faster than StaticArrays.jl
:
using PaddedMatrices
Ap = FixedSizeMatrix{5,5,Float64}(undef) .= Aa;
Bp = FixedSizeMatrix{5,5,Float64}(undef) .= Ba;
Cp = FixedSizeMatrix{5,5,Float64}(undef);
yields
julia> @btime mul!($Cp, $Ap, $Bp)
5.734 ns (0 allocations: 0 bytes)
5×5 FixedSizeArray{Tuple{5,5},Float64,2,Tuple{1,5},32,0,0,false} with indices 1:5×1:5:
1.15682 0.961188 1.08718 0.848822 1.10338
0.584637 1.26248 0.896451 1.05197 0.892125
1.22157 1.71552 1.4453 1.5288 1.28231
1.40697 2.4277 1.80974 1.78083 2.07344
0.963769 1.29204 1.19697 1.30232 1.07738