Vector Multiplication on Submatrices

willsharpless · October 13, 2022, 7:18pm

If I have a matrix which is a (horizontal) collection of matrices,

A := \begin{bmatrix} A_0 & A_1 & A_2 & \dots & A_t \end{bmatrix} , \quad A_i \in \mathbb{R}^{n \times n}, A \in \mathbb{R}^{n \times tn}

and I want to do vector multiplication on each submatrix, i.e., compute

Av := \begin{bmatrix} A_0v & A_1v & \dots & A_tv \end{bmatrix} , \quad v \in \mathbb{R}^{n}, Av \in \mathbb{R}^{n \times t}

then what is the fastest (and cleanest if equally fast) way to do this?

I have three naive ways that work below and a plot of their results from BenchmarkTools, but the performance varies a ton between small and large dimension n. This leaves me feeling like there is a better solution.

using LinearAlgebra

t = 500
n = 3 # results below are with n = 3, 10, & 300

A, v = rand(n,n*t), rand(n)

foo1(A, v) = (A .* v)[:, 1:n:end]

foo2(A,v) = view(A .* v, :, 1:n:n*t)

function foo3(A,v)
		Av = zeros(n, t)
		for i in axes(Av,2)
		    Av[:,i] = view(A, :, (i-1)*n + 1 : i*n) * v;
		end
		return Av
end

stevengj · October 13, 2022, 8:27pm

Some options:

Store it as a vertical concatenation of matrices, not horizontal. Then you can just do A*v to get a vertical concatenation of the results.
If you are really talking about n=3, i.e. a collection of 3x3 matrices and 3-component vectors, than you might want to consider using StaticArrays.jl. Make A a 1d array of 3x3 SMatrix elements, and make v a 3-component SVector. Then do A .* Ref(v) to multiply each 3x3 matrix times v.
You might want to take a more “global” view of your approach. I’m always a bit suspicious when someone is optimizing such a simple operation, and worry that they are taking a classic Matlab/Numpy vectorized approach, which is often suboptimal. It’s often better to combine many operations into a single loop rather than trying to optimize a bunch of simple “vector” operations separately. Loops can be fast in Julia!

willsharpless · October 13, 2022, 9:08pm

Thank you @stevengj. I was so buried in my method I didn’t even think of 1.

I am certainly guilty of 3. often but not this time!

bertschi · October 13, 2022, 9:25pm

First, your functions foo1 and foo2 don’t seem to be doing what you want, but instead use pointwise multiplication and then subselect some elements:

using LinearAlgebra
n = 3; t = 2;
A = hcat(I(3), 2 .* I(3))
v = [1,2,3]

julia> foo1(A, v)
3×2 Matrix{Int64}:
 1  2
 0  0
 0  0

julia> foo2(A, v)
3×2 view(::Matrix{Int64}, :, 1:3:4) with eltype Int64:
 1  2
 0  0
 0  0

julia> foo3(A, v)
3×2 Matrix{Float64}:
 1.0  2.0
 2.0  4.0
 3.0  6.0

As alternatives for the last version, @stevengj already suggested A * v with A \in \mathbb{R}^{tn \times n}. As Julia is column-major it might be faster to use the first dimensions for the multiplication, i.e., store the transposed matrices A_i horizontally and then use v' * A.
Further, if A is stored as a 3-tensor, i.e., A \in \mathbb{R}^{n \times n \times t}, several libraries allowing for a split-apply-combine strategy can be used, e.g., my JJ.jl:

A = rand(n, n, t)
v = rand(n)
rank"2 * 1"(A, v)

kfrb · October 14, 2022, 10:14am

I made some tests with the 2. option of @stevengj, which seems very promising for small values of n.

I assume that only foo3 is doing what you want, so we compare these two functions:

function foo3(A, v, n, t)
    Av = zeros(n, t)
    for i in axes(Av, 2)
        Av[:, i] = view(A, :, (i - 1) * n + 1 : i * n) * v;
    end
    return Av
end

foo4(A, v) = A .* Ref(v)

Note that foo3 here uses n and t as inputs, not as globals.
The function foo4 needs a different matrix A2 and vector v2:

function get_A2(A, n, t)
    A2 = Vector{SMatrix{n, n, Float64, n^2}}(undef, t)
    for i in eachindex(A2)
        A2[i] = SMatrix{n,n}(A[:, (i - 1) * n + 1 : i * n])
    end
    return A2
end

A1, v1 = rand(n, n * t), rand(n)
A2, v2 = get_A2(A1, n, t), SVector{n}(v1)

julia> typeof(A2)
Vector{SMatrix{3, 3, Float64, 9}} (alias for Array{SArray{Tuple{3, 3}, Float64, 2, 9}, 1})

julia> typeof(v2)
SVector{3, Float64} (alias for SArray{Tuple{3}, Float64, 1, 3})

Note that foo4 returns a Vector{SVector{n, Float64}} and not a Matrix{Float64} like foo3!
A conversion can be done with a function like this:

function convert_to_matrix(Av, n, t)
    Av_matrix = zeros(n, t)
    for i in axes(Av_matrix, 2)
        Av_matrix[:, i] = Av[i]
    end
    return Av_matrix
end

Benchmark results:

function benchmark(n::Int, t::Int=500)
    A1, v1 = rand(n, n * t), rand(n)
    A2, v2 = get_A2(A1, n, t), SVector{n}(v1)
    @btime foo3($A1, $v1, $n, $t)
    @btime foo4($A2, $v2)
    @test foo3(A1, v1, n, t) ≈ convert_to_matrix(foo4(A2, v2), n, t)
end

julia> benchmark(3)
  40.797 μs (501 allocations: 50.94 KiB)
  1.486 μs (1 allocation: 11.88 KiB)
Test Passed

julia> benchmark(10)
  58.837 μs (502 allocations: 109.42 KiB)
  8.749 μs (2 allocations: 39.11 KiB)
Test Passed

julia> benchmark(40)
  176.178 μs (502 allocations: 351.61 KiB)
  140.093 μs (2 allocations: 156.30 KiB)
Test Passed

julia> benchmark(50)
  284.541 μs (502 allocations: 437.55 KiB)
  335.238 μs (2 allocations: 195.36 KiB)
Test Passed

Complete MWE

using LinearAlgebra, StaticArrays, BenchmarkTools, Test

function foo3(A, v)
    Av = zeros(n, t)
    for i in axes(Av, 2)
        Av[:, i] = view(A, :, (i - 1) * n + 1 : i * n) * v;
    end
    return Av
end

foo4(A, v) = A .* Ref(v)

function get_A2(A, n, t)
    A2 = Vector{SMatrix{n, n, Float64, n^2}}(undef, t)
    for i in eachindex(A2)
        A2[i] = SMatrix{n,n}(A[:, (i - 1) * n + 1 : i * n])
    end
    return A2
end

function convert_to_matrix(Av, n, t)
    Av_ = zeros(n, t)
    for i in axes(Av_, 2)
        Av_[:, i] = Av[i]
    end
    return Av_
end

function benchmark(n::Int, t::Int=500)
    A1, v1 = rand(n, n * t), rand(n)
    A2, v2 = get_A2(A1, n, t), SVector{n}(v1)
    @btime foo3($A1, $v1, $n, $t)
    @btime foo4($A2, $v2)
    @test foo3(A1, v1, n, t) ≈ convert_to_matrix(foo4(A2, v2), n, t)
end

benchmark(3)
benchmark(10)
benchmark(40)
benchmark(50)

stevengj · October 14, 2022, 12:55pm

This will not get the full benefit of StaticArrays. You need n to be a compile-time constant, e.g. passing it via Val, or by the length of an SVector v

kfrb · October 14, 2022, 2:58pm

Thank you @stevengj for your reply!

What would be the correct solution with Val? Unfortunately, I am not able to get a further performance gain.

function get_A2(A::Matrix{T}, n, t, ::Val{N}) where {N, T}
    A2 = Vector{SMatrix{N, N, Float64, N^2}}(undef, t)
    for i in eachindex(A2)
        A2[i] = SMatrix{N, N}(A[:, (i - 1) * n + 1 : i * n])
    end
    return A2
end

function get_v2(x::Vector{T}, ::Val{N}) where {T,N}
    return SVector{N, T}(x)
end

function benchmark(n::Int, t::Int=500)
    A1, v1 = rand(n, n * t), rand(n)
    A2, v2 = get_A2(A1, n, t, Val(n)), get_v2(v1, Val(n))
    @btime foo3($A1, $v1, $n, $t)
    @btime foo4($A2, $v2)
    @test foo3(A1, v1, n, t) ≈ convert_to_matrix(foo4(A2, v2), n, t)
end

julia> benchmark(3)
  40.616 μs (501 allocations: 50.94 KiB)
  1.485 μs (1 allocation: 11.88 KiB)
Test Passed

willsharpless · October 14, 2022, 3:12pm

foo3 is actually pretty slow compared to foo1/2 for small dim. However, vertical storing + broadcasting + reshape() surpasses all the others I tried for small and large dimension (n ∈ [1, 300]), although I did not try StaticArrays.

mikmoore · October 14, 2022, 3:46pm

If we’re applying some minimal effort to be efficient here, one should avoid the unnecessary temporaries with this alternative to foo3:

using LinearAlgebra

function foo3a(A, v, n, t)
    Av = similar(A, (n, t))
    for i in axes(Av, 2)
        @views mul!(Av[:, i], A[:, (i - 1) * n + 1 : i * n], v);
    end
    return Av
end

Although at small sizes, there’s still no beating StaticArrays.

stevengj · October 14, 2022, 10:29pm

Oh, since you’re interpolating into your @btime code it should be fine. In an actual application, however, you’d want to make sure that the size is set statically as early as possible.

Topic		Replies	Views
Multiply many-matrices by many-vectors Performance matlab , parallel , multithreading , tensors	33	6735	December 14, 2018
Matrix Multiplication of a Large Number of Small Matrices Performance linearalgebra , tullio , column-major	12	1302	February 8, 2023
Using a function on each row of a matrix and getting a similar matrix back General Usage function , staticarrays , map	13	1000	January 24, 2024
How to efficiently multiply every matrix in a vector of matrices by another vector? Performance array , arrays , vector , matrices	3	662	February 15, 2023
Fast small integer matrix multiplication Performance	11	480	September 9, 2022

Vector Multiplication on Submatrices

Benchmark results:

Related topics