Grassmann.jl A\b 3x faster than Julia's StaticArrays.jl

In this algebra, it’s possible to compute on a mesh of arbitrary 5-dimensional conformal geometric algebra simplices, which can be represented by a bundle of nested dyadic tensors.

julia> using Grassmann, StaticArrays; basis"+-+++"
(⟨+-+++⟩, v, v₁, v₂, v₃, v₄, v₅, v₁₂, v₁₃, v₁₄, v₁₅, v₂₃, v₂₄, v₂₅, v₃₄, v₃₅, v₄₅, v₁₂₃, v₁₂₄, v₁₂₅, v₁₃₄, v₁₃₅, v₁₄₅, v₂₃₄, v₂₃₅, v₂₄₅, v₃₄₅, v₁₂₃₄, v₁₂₃₅, v₁₂₄₅, v₁₃₄₅, v₂₃₄₅, v₁₂₃₄₅)

julia> value(rand(Chain{V,1,Chain{V,1}}))
5-element StaticArrays.SArray{Tuple{5},Chain{⟨+-+++⟩,1,𝕂,L} where L where 𝕂,1,5} with indices SOneTo(5):
   -0.33459594357756073v₁ - 0.3920064467082769v₂ - 0.575474920388841v₃ + 0.6150287650825268v₄ - 0.7568209093000915v₅
  -0.7402635950699139v₁ - 0.9303076362461833v₂ + 0.9729806462365271v₃ - 0.8514563480551867v₄ + 0.09906887873006287v₅
  -0.7456570397821101v₁ - 0.6497560949330929v₂ + 0.4756585550844967v₃ - 0.31169948016530347v₄ - 0.9355928499340793v₅
   -0.4555014543082292v₁ + 0.712268225360094v₂ - 0.7500443783398549v₃ - 0.36349628003234713v₄ + 0.5005769037801056v₅
 -0.07402971220645727v₁ + 0.19911765119918146v₂ - 0.4980618129231722v₃ - 0.7728564279829087v₄ + 0.9165735719353756v₅

julia> A = Chain{V,1}(rand(SMatrix{5,5}))
(0.9244801277294266v₁ + 0.029444337884018346v₂ + 0.745495522394158v₃ + 0.6695874677964055v₄ + 0.4998003712198389v₅)v₁ + (0.5423877012973404v₁ + 0.30112324458605655v₂ + 0.9530587650033631v₃ + 0.2706004745612134v₄ + 0.37762612797501616v₅)v₂ + (0.7730171467954035v₁ + 0.019660709510785912v₂ + 0.39119534821037494v₃ + 0.9403026278575068v₄ + 0.07545094732793833v₅)v₃ + (0.7184128110093908v₁ + 0.6295740775044767v₂ + 0.5179035493253021v₃ + 0.039081667453648716v₄ + 0.3719284661613145v₅)v₄ + (0.5033657705978616v₁ + 0.41183905359914386v₂ + 0.7761548051732969v₃ + 0.07635587137916744v₄ + 0.5582934197259402v₅)v₅

Additionally, in Grassmann.jl we prefer the nested usage of pure ChainBundle parametric types for large reusable global cell geometries, from which local dyadics can be selected.

Programming the A\b method is straightforward with some Julia metaprogramming and Grassmann.jl, by first instantiating some Cramer symbols:

Base.@pure function Grassmann.Cramer(N::Int)
    x,y = SVector{N}([Symbol(:x,i) for i ∈ 1:N]),SVector{N}([Symbol(:y,i) for i ∈ 1:N])
    xy = [:(($(x[1+i]),$(y[1+i])) = ($(x[i])∧t[$(1+i)],t[end-$i]∧$(y[i]))) for i ∈ 1:N-1]
    return x,y,xy
end

These are exterior product variants of the Cramer determinant symbols (N! times N-simplex hypervolumes), which can be combined to directly solve a linear system:

@generated function Base.:\(t::Chain{V,1,<:Chain{V,1}},v::Chain{V,1}) where V
    N = ndims(V)-1 # paste this into the REPL for faster eval
    x,y,xy = Grassmann.Cramer(N)
    mid = [:($(x[i])∧v∧$(y[end-i])) for i ∈ 1:N-1]
    out = Expr(:call,:SVector,:(v∧$(y[end])),mid...,:($(x[end])∧v))
    return Expr(:block,:((x1,y1)=(t[1],t[end])),xy...,
        :(Chain{V,1}(getindex.($(Expr(:call,:./,out,:(t[1]∧$(y[end])))),1))))
end

This results in the following highly efficient @generated code for solving the linear system:

(x1, y1) = (t[1], t[end])
(x2, y2) = (x1 ∧ t[2], t[end - 1] ∧ y1)
(x3, y3) = (x2 ∧ t[3], t[end - 2] ∧ y2)
(x4, y4) = (x3 ∧ t[4], t[end - 3] ∧ y3)
Chain{V, 1}(getindex.(SVector(v ∧ y4, (x1 ∧ v) ∧ y3, (x2 ∧ v) ∧ y2, (x3 ∧ v) ∧ y1, x4 ∧ v) ./ (t[1] ∧ y4), 1))
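The parenthetical claim earlier, that these symbols are N! times N-simplex hypervolumes, has a plain LinearAlgebra counterpart. A 2D sketch (a Euclidean analogy for illustration, not Grassmann.jl code):

```julia
using LinearAlgebra

# For edge vectors a, b spanning a triangle (a 2-simplex), the determinant of
# the edge matrix is 2! times the triangle's area; this is the Euclidean
# counterpart of the exterior-product Cramer symbols above.
a = [3.0, 0.0]
b = [0.0, 4.0]
triangle_area = 0.5 * abs(det(hcat(a, b)))   # area = 6.0
abs(det(hcat(a, b))) == factorial(2) * triangle_area   # true
```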

Benchmarks with that algebra indicate 3x faster performance than SMatrix for applying A\b to bundles of dyadic elements.

julia> @btime $(rand(SMatrix{5,5},10000)).\Ref($(SVector(1,2,3,4,5)));
  2.588 ms (29496 allocations: 1.44 MiB)

julia> @btime $(Chain{V,1}.(rand(SMatrix{5,5},10000))).\$(v1+2v2+3v3+4v4+5v5);
  808.631 μs (2 allocations: 390.70 KiB)

julia> @btime $(SMatrix(A))\$(SVector(1,2,3,4,5))
  150.663 ns (0 allocations: 0 bytes)
5-element SArray{Tuple{5},Float64,1,5} with indices SOneTo(5):
 -4.783720495603508
  6.034887114999602
  1.017847212237964
  6.379374861538397
 -4.158116538111051

julia> @btime $A\$(v1+2v2+3v3+4v4+5v5)
  72.405 ns (0 allocations: 0 bytes)
-4.783720495603519v₁ + 6.034887114999605v₂ + 1.017847212237964v₃ + 6.379374861538393v₄ - 4.1581165381110505v₅

Such a solution is not only more efficient than Julia’s StaticArrays.jl method for SMatrix, but is also useful to minimize allocations in Grassmann.jl finite element assembly.

Similarly, the Cramer symbols can also be manipulated to invert the linear system or to determine whether a point is within a simplex.

julia> using Grassmann; basis"3"
(⟨+++⟩, v, v₁, v₂, v₃, v₁₂, v₁₃, v₂₃, v₁₂₃)

julia> T = Chain{V,1}(Chain(v1),v1+v2,v1+v3)
(1v₁ + 0v₂ + 0v₃)v₁ + (1v₁ + 1v₂ + 0v₃)v₂ + (1v₁ + 0v₂ + 1v₃)v₃

julia> barycenter(T) ∈ T, (v1+v2+v3) ∈ T
(true, false)

There are multiple equivalent ways of computing the same results using the ⋅ and : dyadic products.

julia> T\barycenter(T) == inv(T)⋅barycenter(T)
true

julia> sqrt(T:T) == norm(SMatrix(T))
true
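The : contraction in the last identity corresponds to the Frobenius inner product of matrices. A Base-only sketch of the same identity, with no Grassmann.jl types involved:

```julia
using LinearAlgebra

# The double-dot contraction of a matrix with itself is the sum of its squared
# entries, so its square root matches the Frobenius norm computed by norm(A).
A = [1.0 2.0; 3.0 4.0]
frobenius = sqrt(sum(A .* A))   # sqrt(1 + 4 + 9 + 16) = sqrt(30)
frobenius ≈ norm(A)             # true
```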

[Figure: a Makie.jl streamplot generated with the Grassmann.Cramer method, interpolated from Nedelec edges of a Maxwell finite element solution.]

More info about these examples is at https://grassmann.crucialflow.com/dev/tutorials/dyadic-tensors

Hermann Grassmann was the inventor of linear algebra as we know it today.


That’s not type-stable. x = [@SMatrix randn(5,5) for i in 1:10000] is a type-stable way to compute an array of random SMatrices; it nearly doubles the speed of the computation and cuts the allocations down to 2, i.e. from:

2.549 ms (29496 allocations: 1.44 MiB)

to

1.469 ms (2 allocations: 390.70 KiB)

for me. Still a nice performance for Grassmann.jl, but seems to be <2x.


That’s a beautiful plot you’ve got there!

Something problematic is going on here with those allocations. Digging into it a bit, it’s actually nothing to do with \, but rather a surprising type instability in the use of rand with the abstract type SMatrix{5,5}.

julia> typeof(rand(SMatrix{5,5},10000))
Array{SArray{Tuple{5,5},T,2,L} where L where T,1}

This is not a good user experience, so I opened an issue: https://github.com/JuliaArrays/StaticArrays.jl/issues/800
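The same pitfall can be reproduced without StaticArrays at all: any container whose element type is left abstract boxes its elements, and code that consumes it is type-unstable. A minimal Base-only sketch (the variable names are illustrative):

```julia
# Vector{Real} has an abstract element type: each entry is heap-boxed and the
# compiler cannot specialize code that iterates it.
xs_abstract = Real[rand() for _ in 1:10]

# Letting the comprehension infer a concrete element type avoids the problem,
# analogous to passing rand a fully concrete static-matrix type rather than
# the abstract SMatrix{5,5}.
xs_concrete = [rand() for _ in 1:10]

isconcretetype(eltype(xs_abstract))  # false
isconcretetype(eltype(xs_concrete))  # true
```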


We previously used Cramer’s rule for inversion of tiny matrices (4x4 IIRC) but calculating the determinants naively can be extremely numerically unstable. So for a realistic comparison, you should also compare numerical accuracy in addition to speed.

For 5x5 solve we now use a static LU decomposition, which I think is a good default. Having said that, it appears there are ways to make Cramer’s rule numerically well-behaved and also asymptotically \mathcal{O}(N^3) (see https://www.sciencedirect.com/science/article/pii/S1570866711000736#br0120). This is surprising and interesting, as I thought Cramer’s rule had a reputation of just being a really bad idea in practice; even for tiny matrices, calculating determinants accurately is fraught with cancellation error if not done with care.
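As a Base-only sketch of the comparison being suggested, here is naive Cramer via determinants next to the LU-backed \ operator; cramer_solve is a hypothetical helper for illustration, not an API of either package:

```julia
using LinearAlgebra

# Naive Cramer's rule: x[i] = det(A with column i replaced by b) / det(A).
function cramer_solve(A::AbstractMatrix, b::AbstractVector)
    d = det(A)
    n = length(b)
    [det([j == i ? b[k] : A[k, j] for k in 1:n, j in 1:n]) / d for i in 1:n]
end

# On a well-conditioned system the two methods agree to machine precision.
A = [4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0]
b = [1.0, 2.0, 3.0]
maximum(abs.(cramer_solve(A, b) .- (A \ b)))   # tiny

# On an ill-conditioned Hilbert matrix, compare both against the known
# solution x = ones(8) to see how cancellation error accumulates.
H = [1 / (i + j - 1) for i in 1:8, j in 1:8]
bh = H * ones(8)
err_cramer = norm(cramer_solve(H, bh) .- 1)
err_lu = norm((H \ bh) .- 1)
```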


I’m also a little wary of constant folding (part of) the computation, unless solving with a constant right hand side is a realistic workload. In the single-matrix benchmark, the whole computation could be constant-folded if the compiler were smart enough. Not trying to shoot you down, just trying to be thorough.

What version of Grassmann does this require? On 0.5.12 (latest released version), Julia 1.4.2, I get allocations:

julia> using Grassmann, StaticArrays, BenchmarkTools; basis"+-+++";

julia> A = Chain{V,1}(rand(SMatrix{5,5}));

julia> @btime $A\$(v1+2v2+3v3+4v4+5v5);
  166.940 ns (4 allocations: 160 bytes)

It’s definitely an interesting paper, but I don’t see the advantage it shows over LU. It appears to be 2x slower and not more stable. Am I missing something?

Except that it appears to be faster for small matrices.

What you’re missing (if anything) is only that I’m not claiming it’s better or worse in terms of performance for small matrices 🙂, just that it’s interesting that Cramer’s rule is not a total loss numerically.

The actual speed of small matrix kernels depends so much on the microarchitectural detail of your CPU (ability to use SIMD, detail of data dependencies and processor pipelining etc etc) that I’d not hazard a guess one way or another about performance without looking rather closely and benchmarking.


Base.@pure function Grassmann.Cramer(N::Int)

I hate to always be that guy, but the xy return value allocation is definitely not pure, and that can introduce issues into your program. Try to avoid ever marking functions as @pure: that can introduce serious issues if you’re wrong, and truly pure functions should generally get marked automatically if you’re correct. (Notably, we also now immediately reject any bug reports that include Grassmann.jl in the dependency list due to this abuse.)


So, there is a strange thing happening: if you only run using Grassmann, the Base.:\ method is slower than if you additionally paste the Base.:\ code into the REPL after loading the package.

# only `using Grassmann` without pasting `Base.:\` code into REPL
julia> @btime $A\$(v1+2v2+3v3+4v4+5v5)
  181.196 ns (4 allocations: 160 bytes)

# after pasting the `Base.:\` code into the REPL the performance is faster
julia> @btime $A\$(v1+2v2+3v3+4v4+5v5)
  72.708 ns (0 allocations: 0 bytes)

This is the reason why you get a slightly different result in your post.

It’s not clear to me why that might be the case, does anyone know anything about that?

Note that this algorithm is NOT Cramer’s rule. The symbols are analogous to Cramer determinants, but they are in the language of affine Grassmann algebra and it’s the original Grassmann algorithm. So it’s not exactly the same computation as Cramer’s rule, although analogous in terms of algebra.

Can you elaborate what the potential issues are? If you can’t provide any examples or justification, then I am not necessarily inclined to believe your claim. The main issue I have been told about is that the internal definition of @pure might have breaking changes in the future, but that’s not an issue yet.

I am not aware of any bug reports for Grassmann.jl outside of my repositories (any references?).

This would be one of them as an example: https://github.com/JuliaLang/julia/issues/31185


If you don’t understand what the annotation means, then you should probably not be using it at all. I don’t really care what you believe—I implemented @pure, so I’m generally rather familiar with how it can go wrong. I created it as a crutch for the compiler internally, and as we’ve improved the compiler, we’ve been attempting to remove its usage, as generally it can only cause problems even when applied correctly.


Yes, I’m aware you implemented it, but you’ve not yet convinced me to not use it here. This Grassmann program is my playground for experiments with mathematics and programming. It works fine for me, and I am always willing to consider changing its application in any use of Base.@pure if it’s a mistake, but am not currently convinced it is a demonstrable issue here.

willing to consider changing its application in any use of Base.@pure if it’s a mistake

You’re clearly not actually willing to, since it’s been pointed out before that the bug report mentioned above (https://github.com/JuliaLang/julia/issues/31185) demonstrates that you are using it incorrectly and that it causes issues.


That bug was fixed already, and the fix didn’t involve changing anything with @pure as it was a bug with the type system in a conversion method.

I’m only guessing, but I think that the reason why "return value allocation is definitely not pure" is that it depends on where heap memory is available, and that is global state.

Example:

f() = []
f() === f() # returns false

The function f is not pure, since calling it with the same (empty) set of immutable arguments does not return the same result.


This doesn’t actually matter for my use case though. I don’t need the output to be ===. It only generates some symbols which get pasted into a function.

The @pure annotation tells the compiler that you guarantee the output to be ===. If you don’t have or need that guarantee, then you should let compiler know by removing the annotation.

Alright then, I will remove that annotation in this particular case, but there are many valid use cases of it elsewhere in my package.


Julia has gotten significantly better at constant propagation since you first wrote this, so I would be surprised if many of them actually give you a performance benefit anymore in any case.
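A minimal sketch of that point, with illustrative helper names: an ordinary function applied to compile-time constants now folds away on its own, which is what @pure used to be reached for.

```julia
# An ordinary function, with no @pure annotation.
halfdim(N::Int) = N - 1

# Constant propagation lets the compiler reduce the call to a literal;
# @code_typed solve_size() typically shows a bare `return 4`.
solve_size() = halfdim(5)

solve_size()  # 4
```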


At the risk of sounding like a broken record, this is probably not true, mostly just because there isn’t really a way to use @pure correctly, ever. It doesn’t have well-defined semantics. At some point we need a version of it that’s well-defined. The point people are trying to make here is that it’s best to avoid @pure because the compiler makes zero guarantees about what it means, so if you don’t 100% need it, you’re better off without it. Even if nothing breaks right now, future versions of Julia can and will break and crash your code. You might not care, but that’s the reason people try to get you to stop using it.
