RecursiveFactorization allows you to perform any factorization you’d like so long as it is lu
.
If you like LU, DifferentialEquations.jl, or recursion, you’ll love RecursiveFactorization.jl.
DifferentialEquations.jl uses it as the default method for solving matrices with dimension less than 500 x 500. This package will greatly boost the speed of implicit stiff differential equation solvers, as most of the runtime of implicit solvers is spent on the matrix factorization (the only O(n^3) operation in stiff solvers).
For the following plots, the reference is LinearAlgebra.generic_lufact!
(a triple loop LU factorization with partial pivoting), OpenBLAS
is invoked by LinearAlgebra.lu!
, and RF
denotes RecursiveFactorization
. “RF fully iterative” means the base case of the recursion, not to be confused with an iterative solver. “RF fully recursive” refers to the algorithm that uses the recursive method for all sizes greater than 10, but the base case calls the optimized triple loop LU for blocks smaller than 16x16. The threshold
is the cutoff point of using the recursive algorithm. (Sorry for the confusing labels!)
OpenBLAS 0.3.9
CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
OpenBLAS 0.3.9
CPU: Intel(R) Core™ i9-10980XE CPU @ 3.00GHz
The first two examples have optimally tuned meta parameter.
OpenBLAS 0.3.9
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
This example has a badly tuned meta parameter. The default threshold value is 192 here, but one can see the optimal threshold value should be 80. You can override it via
RecursiveFactorization.RECURSION_THRESHOLD[] = 80
These plots were generated via:
using Pkg
Pkg.add(["RecursiveFactorization", "VegaLite", "DataFrames", "BenchmarkTools"])
using RecursiveFactorization
include(joinpath(pkgdir(RecursiveFactorization), "perf", "lu.jl"))
It will save the file in joinpath(homedir(), "Pictures", "lu_float64.png")
. You want the threshold to be where the fully recursive version passes the fully iterative version.
If you run this, please post the results and your CPU info here, as this will help us develop a better default algorithm!
Acknowledgment
This package uses @Elrod’s LoopVectorization.jl
.