I have an n×n nested array of 2×2 matrices A and an n-vector of 2-vectors x. I would like to perform a matrix-vector multiplication of the form y = Ax, where y[i] = \sum_j A[i,j]*x[j], and where the last * is itself a matrix-vector multiplication. If I concatenate all the 2×2 matrices in A into a 2n×2n matrix and all the 2-vectors in x into a 2n-vector, the above reduces to an ordinary matrix-vector multiplication.

What is the most natural way to implement this product? I tried reinterpret on x but got a 2-d array, and reinterpret on A gave a 3-d array with the 2×2 matrices flattened. I don't want to juggle indices; I would like the code to read naturally, like the math. I would also like to avoid creating new temporary arrays, i.e. A, x, and y should keep their current form as nested arrays.

In other words, I want to do linear algebra over the space of 2-vectors.

The generic method for matrix multiplication already works in this case:

julia> using StaticArrays
julia> A = [SMatrix{2, 2}(1, 2, 3, 4) for i in 1:3, j in 1:3];
julia> v = [SVector(1, 2) for i in 1:3];
julia> A * v
3-element Vector{SVector{2, Int64}}:
[21, 30]
[21, 30]
[21, 30]
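As a sanity check, the generic product matches the elementwise definition y[i] = \sum_j A[i,j]*x[j] directly:

```julia
using StaticArrays

A = [SMatrix{2, 2}(1, 2, 3, 4) for i in 1:3, j in 1:3]
v = [SVector(1, 2) for i in 1:3]
y = A * v

# each entry of y is the sum of 2x2-matrix times 2-vector products
@assert all(y[i] == sum(A[i, j] * v[j] for j in 1:3) for i in 1:3)
```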

The other approach (reshaping into a 2n×2n matrix and a 2n-vector) may be more efficient, because it calls an optimized BLAS routine (I haven't benchmarked the difference). However, you would need to create new temporary arrays of that shape, since the memory layout is different. It would look something like:

julia> to_vector(nested::AbstractVector{T}) where {T <: StaticVector} = vec(reinterpret(reshape, eltype(T), nested))
to_vector (generic function with 1 method)

julia> function to_matrix(nested::AbstractMatrix{T}) where {T <: StaticMatrix}
           sz_in, sz_out = size(T), size(nested)
           sz = map(*, sz_in, sz_out)
           m0 = reinterpret(reshape, eltype(T), nested)
           m1 = reshape(m0, sz_in..., sz_out...)
           m2 = permutedims(m1, (1, 3, 2, 4))
           return reshape(m2, sz)
       end
to_matrix (generic function with 1 method)

where the permutedims step allocates, but I don't think there is a way to avoid that.
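A quick way to check that the flattened product agrees with the nested one, using the definitions above (repeated here so the snippet runs standalone):

```julia
using StaticArrays

to_vector(nested::AbstractVector{T}) where {T <: StaticVector} =
    vec(reinterpret(reshape, eltype(T), nested))

function to_matrix(nested::AbstractMatrix{T}) where {T <: StaticMatrix}
    sz_in, sz_out = size(T), size(nested)
    sz = map(*, sz_in, sz_out)
    m0 = reinterpret(reshape, eltype(T), nested)
    m1 = reshape(m0, sz_in..., sz_out...)
    m2 = permutedims(m1, (1, 3, 2, 4))
    return reshape(m2, sz)
end

A = [SMatrix{2, 2}(1, 2, 3, 4) for i in 1:3, j in 1:3]
v = [SVector(1, 2) for i in 1:3]

# the 6x6-times-6 product agrees with the nested product
@assert to_matrix(A) * to_vector(v) == to_vector(A * v)
```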

Thank you for the answer. I did a simple benchmark and found that for outer sizes of A around 1000×1000, the reshaped version runs about 2x faster; the gap shrinks for smaller sizes.
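For anyone wanting to reproduce that comparison, a rough BenchmarkTools sketch along these lines should work (exact timings and the speedup will vary by machine and BLAS; the flattening itself is done once, outside the timed expressions):

```julia
using StaticArrays, BenchmarkTools

to_vector(nested::AbstractVector{T}) where {T <: StaticVector} =
    vec(reinterpret(reshape, eltype(T), nested))

function to_matrix(nested::AbstractMatrix{T}) where {T <: StaticMatrix}
    sz_in, sz_out = size(T), size(nested)
    sz = map(*, sz_in, sz_out)
    m0 = reinterpret(reshape, eltype(T), nested)
    m1 = reshape(m0, sz_in..., sz_out...)
    m2 = permutedims(m1, (1, 3, 2, 4))
    return reshape(m2, sz)
end

n = 1000
A = [rand(SMatrix{2, 2, Float64}) for i in 1:n, j in 1:n]
v = [rand(SVector{2, Float64}) for i in 1:n]

# flatten once, up front
Af = to_matrix(A)
vf = collect(to_vector(v))

# results agree up to floating-point summation order
@assert to_vector(A * v) ≈ Af * vf

@btime $A * $v    # nested StaticArrays product
@btime $Af * $vf  # 2n×2n BLAS gemv
```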