StaticArrays.jl has received a few cool new features since its 0.12.4 release, and they are now fast thanks to the memory layout changes in Julia 1.5. These changes are commonly known as non-allocating views, but as I show here, that's just the tip of the iceberg.
Here is the outline to make this post a bit more digestible:
- A cool application.
- Sized abstract arrays.
- Faster matrix multiplication.
- Loading time reduction.
These things may not be easy to find, so I decided to write about them here.
1. A cool application
Diffusion tensor MRI produces arrays of 3x3 symmetric positive definite (SPD) matrices. Analyzing such data is one possible application of Manifolds.jl. There are multiple operations that can be performed on such data, and one of them is the so-called logarithmic map in the linear-affine metric. It takes two NxN SPD matrices and returns a vector that describes how to get from the first one to the second one. Here is the code that performs this operation:
function log!(M::SymmetricPositiveDefinite{N}, X, p, q) where {N}
    # the eigendecomposition of p gives p^(1/2) and p^(-1/2)
    e = eigen(Symmetric(p))
    U = e.vectors
    S = e.values
    Ssqrt = Diagonal(sqrt.(S))
    SsqrtInv = Diagonal(1 ./ sqrt.(S))
    pSqrt = Symmetric(U * Ssqrt * transpose(U))
    pSqrtInv = Symmetric(U * SsqrtInv * transpose(U))
    # X = p^(1/2) * log(p^(-1/2) * q * p^(-1/2)) * p^(1/2)
    T = Symmetric(pSqrtInv * q * pSqrtInv)
    e2 = eigen(T)
    Se = Diagonal(log.(max.(e2.values, eps())))
    pUe = pSqrt * e2.vectors
    return mul!(X, pUe, Se * transpose(pUe))
end
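For a rough idea of how this is used on a single pair of matrices, here is a sketch (it assumes Manifolds.jl is loaded so that SymmetricPositiveDefinite and log! are available; the inputs are made-up SPD matrices, and the lack of allocations relies on the improvements described later in this post):
julia> using Manifolds, StaticArrays, LinearAlgebra, BenchmarkTools
julia> M = SymmetricPositiveDefinite(3);
julia> p = SMatrix{3,3}([2.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 2.0]);
julia> q = SMatrix{3,3}([3.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 3.0]);
julia> X = MMatrix{3,3}(zeros(3, 3));
julia> @btime log!($M, $X, $p, $q);  # should report zero allocations with StaticArrays master on Julia 1.5+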
Now we may want to compute this map for each pair of corresponding matrices from the two arrays we get from DT-MRI. How many allocations would that take? Just one, for the result array. Now, how is that possible with so much linear algebra going on? Let’s take a look.
First, we need to somehow tell StaticArrays that we are dealing with static 3x3 matrices, but a lot of them. Traditionally one would make, for example, a 128x128 array of 3x3 SArrays, but that's, from my experience, very inconvenient to work with. Instead, we can use HybridArrays.jl to make a 3x3x128x128 array with the first two axes statically sized and the other two dynamically sized. With that we can easily access the 3x3 parts as normal SMatrix values. That's fine for reading, but we also need to write the result somewhere. That's where sized abstract arrays come in.
2. Sized abstract arrays
SizedArray was generalized to support adding size information to arbitrary AbstractArrays. For example, you can make a sized view. How? Just call view! As long as the size can be figured out from the types, the result will be a SubArray wrapped in a SizedArray. And that won't allocate on Julia 1.5+.
julia> using HybridArrays, StaticArrays, BenchmarkTools
julia> x = HybridArray{Tuple{3,3,StaticArrays.Dynamic(),StaticArrays.Dynamic()}}(randn(3,3,128,128));
julia> xv = view(x, :, :, 1, 2)
3×3 SizedArray{Tuple{3,3},Float64,2,2,SubArray{Float64,2,HybridArray{Tuple{3,3,StaticArrays.Dynamic(),StaticArrays.Dynamic()},Float64,4,4,Array{Float64,4}},Tuple{Base.Slice{SOneTo{3}},Base.Slice{SOneTo{3}},Int64,Int64},true}} with indices SOneTo(3)×SOneTo(3):
0.453334 -0.202884 -0.977482
-0.311551 0.324371 -0.422867
1.24581 1.56688 0.0744785
julia> xv[3, 3] = 12.0
12.0
julia> x[3, 3, 1, 2]
12.0
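To check the non-allocating claim (on Julia 1.5 or newer; exact timings will of course differ between machines), BenchmarkTools can be used:
julia> @btime view($x, :, :, 1, 2);  # should report 0 allocations on Julia 1.5+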
You can make anything sized; I've even managed to make a GPU kernel in CUDA.jl broadcast with SizedArray views of parts of CuArrays.
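Putting the two sections together, a sketch of mapping log! over every 3x3 slice of two hybrid arrays could look like this (log_field! is a made-up helper, not part of any package; P and Q are assumed to be 3x3xIxJ hybrid arrays whose 3x3 slices are SPD, and X is a preallocated result of the same shape):
function log_field!(M, X, P, Q)
    for j in axes(P, 4), i in axes(P, 3)
        # reading P[:, :, i, j] gives an SMatrix; writing goes through a sized view
        log!(M, view(X, :, :, i, j), P[:, :, i, j], Q[:, :, i, j])
    end
    return X
end
With everything inside the loop non-allocating, the only allocation in the whole computation is the result array X, allocated once up front.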
3. Faster matrix multiplication
Sized views are, though, not all that was required to make that log! method non-allocating. The other part is multiplication with Symmetric matrices, which allocates on StaticArrays 0.12.4 (it falls back to a generic LinearAlgebra implementation). StaticArrays master has that improved, and now you can freely wrap your arrays in Symmetric, Hermitian, Adjoint, triangular matrix wrappers, etc. StaticArrays 0.12.4 has special implementations only for Adjoint, Transpose and triangular matrices.
Thanks to @Elrod, multiplication of statically sized matrices now also uses muladd, which makes it even faster.
4. Loading time reduction
The excellent work on reducing invalidations significantly impacted StaticArrays, but now it turns out that loading time is dominated by the insertion of new methods into existing functions, which is a different problem. For example, on my computer slightly restricting the signature of a 5-argument mul! method decreased loading time by 0.35s. These restricted cases never even worked on StaticArrays and no one filed an issue about them.
Depending on how much of this structured matrix multiplication code is removed, I get loading times on a recent Julia nightly between 0.4s and 0.8s. However, in contrast with the other improvements mentioned here, which are already merged to master, there is no consensus yet on what to do about it.
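If you want to check the load time on your own machine (numbers vary a lot between machines and Julia versions), measuring it in a fresh session is enough:
julia> @time using StaticArrays  # prints the load time; run in a fresh Julia session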
As always, there are also other improvements to existing methods and new methods being implemented. The next major release of StaticArrays.jl will be quite cool.