Singletons only compare well to enums if you know what the singletons will be at compile time. In the case of your running product, you’re generating them at runtime, so you end up hitting a dynamic dispatch for each multiplication, killing performance.
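A toy sketch of that distinction (the two singleton types and the tiny group here are made up for illustration, not from anyone's package):

# Two singleton "group elements" forming Z/2: Id is the identity, Gen the generator.
struct Id end; struct Gen end
Base.:*(::Id, ::Id)  = Id();  Base.:*(::Id, ::Gen) = Gen()
Base.:*(::Gen, ::Id) = Gen(); Base.:*(::Gen, ::Gen) = Id()

xs = (Id(), Gen(), Gen())       # element types known at compile time: static dispatch
ys = Any[Id(), Gen(), Gen()]    # elements produced at runtime: each * dispatches dynamically
reduce(*, xs), reduce(*, ys)    # both give Id(), but the second pays a method lookup per step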
For a 16-element group I guess nothing will beat a cached multiplication table (about 2 KB).
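As a minimal sketch of what that looks like (the group here is just ℤ/16 and all names are made up):

# Hypothetical sketch: elements identified by an index 0:15, products answered by a
# precomputed 16×16 table. With Int64 entries that is 16*16*8 bytes ≈ 2 KB.
const N = 16
const MULT_TABLE = [mod(i + j, N) for i in 0:N-1, j in 0:N-1]

groupmul(i, j) = MULT_TABLE[i + 1, j + 1]   # O(1) lookup per multiplication

groupmul(3, 14)  # 1, since 3 + 14 ≡ 1 (mod 16)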
For computations with finite groups the usual way is to provide some kind of normal form (to uniquely identify each group element), but since the groups are usually very large, nobody stores the list of group elements, let alone the multiplication table.
I did store those in GroupRings.jl, since I needed to do some algebraic geometry in a finite subspace of the group ring of an infinite group. So this subspace needs a basis, and the algebra structure needs multiplication. But I'd consider this solution somewhat atypical.
Yes, you definitely don't want to unroll a very large loop. I should have been clearer that this approach has exactly the same benefits and drawbacks as StaticArrays.jl. For high-dimensional Clifford algebras, your sparse approach is the only reasonable way to go (though I'd still say you want to use @generated functions instead of Base.@pure, just not for loop unrolling).
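For concreteness, here is the kind of @generated loop unrolling meant here, as a toy sketch over plain tuples rather than the actual multivector types in either package:

# Toy sketch: the loop runs while the method body is being generated, so the
# compiled method for a given N is straight-line code with no runtime loop.
@generated function unrolled_dot(a::NTuple{N,T}, b::NTuple{N,T}) where {N,T}
    ex = :(a[1] * b[1])
    for i in 2:N
        ex = :($ex + a[$i] * b[$i])
    end
    return ex
end

unrolled_dot((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))  # 32.0

The same trick applied to a geometric product unrolls the double loop over basis blades, which is exactly what stops scaling once the number of blades, 2^D, gets large.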
Here's an example with a small-dimensional algebra (the only domain I've tried to target) using my very janky WIP repo, MasonProtter/GeometricAlgebras.jl:
using GeometricAlgebras
s, γ1, γ2, γ3 = basis_vectors(3)
A = s + γ1 + 2γ2 - γ3
B = γ1*γ2 - 3s
#--------------------------------
using Grassmann
@basis V"+++"
C = 1.0 + v1 + 2.0v2 - v3
D = 1.0v12 - 3.0
#--------------------------------
f(A, B) = A*(A*A + B*B)*B
julia> @btime f($A, $B)
98.256 ns (0 allocations: 0 bytes)
-75.0 + -115.0γ₁ + -55.0γ₂ + 45.0γ₃ + 45.0γ₁γ₂ + -35.0γ₁γ₂γ₃
julia> @btime f($C, $D)
4.356 μs (30 allocations: 2.20 KiB)
-75.0 - 115.0v₁ - 55.0v₂ + 45.0v₃ + 45.0v₁₂ - 35.0v₁₂₃
Here you can see that the loop unrolling gives over an order of magnitude speedup. The results are practically the same if we don't allow the function to see the inputs as constants:
julia> @btime f($(Ref(A))[], $(Ref(B))[])
100.242 ns (0 allocations: 0 bytes)
-75.0 + -115.0γ₁ + -55.0γ₂ + 45.0γ₃ + 45.0γ₁γ₂ + -35.0γ₁γ₂γ₃
julia> @btime f($(Ref(C))[], $(Ref(D))[])
4.294 μs (30 allocations: 2.20 KiB)
-75.0 - 115.0v₁ - 55.0v₂ + 45.0v₃ + 45.0v₁₂ - 35.0v₁₂₃
I think you should be able to somewhat easily get the best of both worlds here by defining a multiplication method that unrolls the loop for dimensions D below some cutoff, and then using your current implementation when you get up to higher dimensions.
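A rough sketch of what that hybrid could look like (all names here are placeholders, not the actual types in either package):

# Hypothetical sketch: pick the implementation from the type-level dimension D.
struct Multivector{D,T}
    coeffs::Vector{T}            # 2^D coefficients, one per basis blade
end

const UNROLL_CUTOFF = 5

unrolled_mul(a, b) = error("stand-in for an @generated, fully unrolled product")
looped_mul(a, b)   = error("stand-in for the existing sparse for-loop product")

function Base.:*(a::Multivector{D,T}, b::Multivector{D,T}) where {D,T}
    # D is a compile-time constant, so the compiler can fold this branch away
    D <= UNROLL_CUTOFF ? unrolled_mul(a, b) : looped_mul(a, b)
end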
Another thing to keep in mind is that my implementation of multiplication there is incredibly inefficient at compile time and definitely needs to be optimized if I want to be able to do operations at dimensions comparable to the upper size limit of StaticArrays.jl.
I will consider making a hybrid implementation for the lower dimensions… it occurred to me before, but it isn't a high-priority goal yet. My primary goal at first is to get the functionality I want and to optimize later. There are still countless optimizations I haven't done yet.
The way to make precompilation faster is by not using generated functions, but that’s a different topic.
I'm not telling you to do that, I'm just saying that not unrolling the loop absolutely leaves performance on the table in low dimensions.
It's not precompilation that's the problem for me. It's compilation, and the fact that my current algorithm for finding the multiplication table at compile time is incredibly wasteful and inefficient. The generated function has little to do with it. If I abused Base.@pure, the code would still have to run at some point; the actual mechanism for hoisting the calculation should give a negligible contribution here (if there even is a difference).
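As a toy illustration of that point (these functions are invented, not from Grassmann), the table computation is the same under both mechanisms; only where it gets hoisted differs:

# The expensive part is identical either way and has to run once per dimension D.
expensive_table(D::Int) = [mod(i * j, 2^D) for i in 1:2^D, j in 1:2^D]

# Base.@pure asserts purity so inference may constant-fold the call for a known D...
Base.@pure pure_table(D::Int) = expensive_table(D)

# ...while a generated function runs it while emitting the method body for each D.
@generated function generated_table(::Val{D}) where D
    tab = expensive_table(D)
    return :($tab)
end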
@Mason I have run another benchmark of your code for 6-dimensional algebras:
using GeometricAlgebras
s, γ1, γ2, γ3 = basis_vectors(6)
A = s + γ1 + 2γ2 - γ3
B = γ1*γ2 - 3s
using Grassmann; basis"6"
C = 1.0 + v1 + 2.0v2 - v3
D = 1.0v12 - 3.0
f(A, B) = A*(A*A + B*B)*B
with the results
julia> @btime f($A, $B)
401.801 μs (16640 allocations: 380.00 KiB)
-75.0 + -115.0γ₁ + -55.0γ₂ + 45.0γ₃ + 45.0γ₁γ₂ + -35.0γ₁γ₂γ₃
julia> @btime f($C, $D)
5.591 μs (30 allocations: 4.92 KiB)
-75.0 - 115.0v₁ - 55.0v₂ + 45.0v₃ + 45.0v₁₂ - 35.0v₁₂₃
As you can see, the performance gain is only valid for 5-dimensional algebras or less. Since 5D algebras are fairly common, I will create a hybrid @generated arithmetic for the Grassmann package.
However, the @generated usage will not replace the usage of @pure in my package; it will be specific to the algebraic operations for 5D or less, not to the utility support code. This will require a fairly significant overhaul of the algebra code.
As you can see, although my for-loop technique is slower in the 5-or-fewer-dimensional case, it has an extreme performance gain compared to your @generated technique for 6 dimensions or higher.
Alright, now I am much closer to the timings of @Mason after making some changes:
julia> @btime C*D
46.674 ns (1 allocation: 80 bytes)
-3.0 - 5.0v₁ - 5.0v₂ + 3.0v₃ + 1.0v₁₂ - 1.0v₁₂₃
julia> @btime f($C,$D)
192.717 ns (5 allocations: 304 bytes)
-75.0 - 115.0v₁ - 55.0v₂ + 45.0v₃ + 45.0v₁₂ - 35.0v₁₂₃
Thanks a lot to @Mason for the suggestions; this feature will be added to the other products as well.