While the previous FlexUnits.jl release focused on performance improvements to static inference, matching the performance of Unitful.jl, this release focuses on mixed-unit linear algebra, something that Unitful and many other units packages particularly struggle with. With this update, one can easily solve differential equations with units, and even use implicit methods like Rodas5P (see this tutorial). This release enables efficient mixed-unit linear algebra using two major techniques:
- Defining an “unknown” dimension sentinel value by setting all dimensional powers to the number’s typemax (for the default `FixRat32`, this value is `2147483647//25200`, a highly unusual value). This essentially grants a single “free pass” for unit validation, which is very useful for initialization, enabling common patterns like indexing sparse/diagonal arrays and array initialization. This allows methods like `cov` and `mean` to work for mixed units where other packages fail (see the sketch after this list).
- Defining a `LinmapQuant` object that stores the numerical values and the unit structure in separate objects. A `LinmapQuant` is a special type of quantity-matrix that is compatible with linear algebra operations (i.e. a linear mapping; not all quantity-matrices can be multiplied). This is required to make certain matrix operations work (such as `inv` and `exp`), as many of these functions contain comparison operations that fail with quantities, but it can also significantly boost performance for linear algebra with mixed units.
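To make the sentinel idea concrete, here is a minimal toy sketch of the mechanism (the names `ToyDims`, `TOY_UNKNOWN`, and `compatible` are illustrative stand-ins, not FlexUnits internals): a dimension whose powers are all set to typemax acts as a one-time wildcard during validation, so freshly initialized entries can adopt a real unit on first use.

```julia
# Toy illustration of an "unknown" dimension sentinel; not FlexUnits internals.
struct ToyDims
    powers::NTuple{3,Int32}  # e.g. (length, mass, time) exponents
end

# All powers set to typemax mark the dimension as "unknown"
const TOY_UNKNOWN = ToyDims(ntuple(_ -> typemax(Int32), 3))
isunknown(d::ToyDims) = d == TOY_UNKNOWN

# Validation grants the sentinel a "free pass": an unknown dimension is
# compatible with anything, so zero-initialized entries (as produced when
# indexing sparse/diagonal arrays or pre-allocating an array) can later be
# assigned any unit without throwing.
compatible(a::ToyDims, b::ToyDims) = a == b || isunknown(a) || isunknown(b)
```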
The secret to fast mixed-unit linear algebra in FlexUnits.jl is the `UnitMap` and `DimsMap` objects. Both work by stating that an object transforms a vector with units `u_in` into a vector with units `u_out`; the difference is that a `DimsMap` is matrix-like and only supports dimensional units (like SI), whereas a `UnitMap` is more general, supporting generic units. A linear algebra operation on a `LinmapQuant` applies the same operation to its values (a raw numerical matrix, for which plenty of optimized methods already exist) and to its `DimsMap`, for which FlexUnits defines shortcut operations. Operations on a `DimsMap` are typically far simpler than on the numerical matrices: a matrix multiplication that is O(n^3) on the raw matrices can be O(n) on the `DimsMap`.
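The following toy sketch (units as plain strings, not FlexUnits' actual types) shows where the O(n) figure comes from: element (i, j) of a linear-map quantity carries units u_out[i]/u_in[j], so a product A*B is dimensionally valid exactly when A's u_in matches B's u_out, a single pass over the shared inner dimension.

```julia
# Toy version of the O(n) multiplication shortcut; not the actual FlexUnits types.
struct ToyUnitMap
    u_out::Vector{String}  # units of the output vector
    u_in::Vector{String}   # units of the input vector
end

# The unit bookkeeping of a matrix product A*B: one O(n) comparison along the
# inner dimension, independent of the O(n^3) work on the raw numerical values.
function compose(A::ToyUnitMap, B::ToyUnitMap)
    A.u_in == B.u_out || throw(DimensionMismatch("inner units do not match"))
    return ToyUnitMap(A.u_out, B.u_in)  # units of the product
end
```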
In addition to being faster at matrix multiplication, `LinmapQuant` supports more mixed-unit operations than existing alternative packages:

`\`, `/`, `inv`, `exp`, `log`, `lu`, `eigen`, `cholesky`
FlexUnits was also tested for the following operations (currently unoptimized):

`mean`, `std`, `cov`
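For instance, `inv` can be applied directly to a `LinmapQuant`. Here is a small example built with the same constructors that appear in the benchmarks below (the unit vector `u` is arbitrary, and the printed form of the result depends on the registry):

```julia
using FlexUnits
using .UnitRegistry

u = [UnitRegistry.u"kg/s", UnitRegistry.u"kW"]
M = rand(2, 2)

# Element (i, j) carries units u_out[i]/u_in[j]; this map takes a vector with
# units u to a vector with units inv.(u), the same construction as Mfq below.
Mq = M * UnitMap(u_out = inv.(u), u_in = u)

# Inverting swaps the roles of u_out and u_in: the result maps inv.(u) back to u.
Mq_inv = inv(Mq)
```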
Benchmarks
All three packages (FlexUnits, DynamicQuantities, and Unitful) can do matrix multiplication, so we can compare them directly on this particular functionality.
```julia
using FlexUnits
using .UnitRegistry
import DynamicQuantities
import Unitful
using BenchmarkTools

# Use unitless matrices as a benchmark
Nr = 200
X = randn(Nr, 4)
M = rand(4, 4)

# Construct Unitful matrices
uu = [Unitful.u"kg/s", Unitful.u"kW", Unitful.u"rad/s", Unitful.u"N/m"]
ut = reshape(uu, 1, :)
Xu = X .* ut
Mu = inv.(uu) .* M .* inv.(ut)

# Construct DynamicQuantities matrices
udq = [DynamicQuantities.u"kg/s", DynamicQuantities.u"kW", DynamicQuantities.u"rad/s", DynamicQuantities.u"N/m"]
udqt = reshape(udq, 1, :)
Xdq = X .* udqt
Mdq = inv.(udq) .* M .* inv.(udqt)

# Construct LinmapQuant matrices
ufq = [UnitRegistry.u"kg/s", UnitRegistry.u"kW", UnitRegistry.u"rad/s", UnitRegistry.u"N/m"]
Xfq = X * UnitMap(u_out = UnitRegistry.u"", u_in = inv.(ufq))
Mfq = M * UnitMap(u_out = inv.(ufq), u_in = ufq)
```
```julia
julia> @btime X*M #No units
  700.000 ns (3 allocations: 6.35 KiB)

julia> @btime Xu*Mu #Unitful, more than 500x slower
  395.700 μs (5603 allocations: 93.83 KiB)

julia> @btime Xdq*Mdq #DynamicQuantities, about 8x slower
  5.700 μs (3 allocations: 31.34 KiB)

julia> @btime Xfq*Mfq #LinmapQuant, almost no overhead
  710.000 ns (4 allocations: 6.41 KiB)
```
The main reason FlexUnits.jl has nearly no overhead is that only the inner product of the units between the matrices is considered: only the first 4-element row of X and the first column of M need to be compared. Unit inference never touches the other 199 rows of X or the other 3 columns of M.
Another example is linear regression.
```julia
Nr = 200
XY = randn(Nr, 6) * rand(6, 6)
X = [XY[:, begin:4] ones(Nr)]
Y = XY[:, 5:end]
Xu = LinmapQuant(X, UnitMap(u_out = UnitRegistry.u"", u_in = inv.([UnitRegistry.u"kg/s", UnitRegistry.u"kW", UnitRegistry.u"rad/s", UnitRegistry.u"N/m", UnitRegistry.u""])))
Yu = LinmapQuant(Y, UnitMap(u_out = UnitRegistry.u"", u_in = inv.([UnitRegistry.u"K", UnitRegistry.u"kPa"])))
```
```julia
julia> @btime (X'X)\(X'Y) #No units
  4.880 μs (12 allocations: 992 bytes)

julia> @btime Bu = (Xu'Xu)\(Xu'Yu) #LinmapQuant, about 1.3x slower
  6.400 μs (19 allocations: 1.66 KiB)
```
This time the overhead from unit inference is noticeable, but still well under 2x. This is because Xu'Xu is a long contraction that compares the 200 columns of Xu' against the 200 rows of Xu, and the same happens again in Xu'Yu. Thankfully, this 200-element comparison is required only once for the 25 entries of Xu'Xu and once for the 10 entries of Xu'Yu. In general, dynamic unit inference is about 6x to 8x slower than a floating-point operation, so paying for two unit inferences per 35 floating-point operations along the inner dimension gives an estimated slowdown factor of (35 + 2*7)/35 = 1.4, almost exactly what the benchmarks show.
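The same estimate as a back-of-envelope computation (the 7x cost of one unit inference is the assumption carried over from above):

```julia
flops_per_inner = 25 + 10  # output entries of Xu'Xu plus Xu'Yu, i.e. flops per inner index
unit_inferences = 2        # one inner-unit comparison per matrix product
inference_cost = 7         # assumed cost of one unit inference, in flop-equivalents
slowdown = (flops_per_inner + unit_inferences * inference_cost) / flops_per_inner
# slowdown ≈ 1.4, in line with the ~1.3x measured above
```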
This is the last major milestone for this project, and the API can now be considered stable. Foreseeable future work mainly consists of bugfixes, optimizations and expanding support for more operations.