Encoding group elements and operations

jlapeyre · October 22, 2019, 9:28pm

Any ideas on efficient/convenient ways to encode group elements and operations ? In particular, I am interested in reducing strings of elements in a finite group by the group operation. But, I want to use them in other ways, eg get a matrix representation of an element.

For example, Cliffords.jl includes the N-qubit pauli groups, which are the groups generated by {X, Y, Z}^⊗N, where X, Y, Z are Pauli operators. It seems to a be an efficient encoding.

The most performant encoding I have found is an n x n Matrix{Int} (or UInt8 etc.) representing the multiplication table. On the other hand, encoding elements as singleton types increases the time for string reduction by a factor of 10. (Note that these don’t appear to be practical for the N-qubit Pauli groups for arbitrary N. I can’t think of a better encoding than that used in Cliffords.jl.)

Here is an example of these two encodings for the cyclic group of order 3:

# Investigate efficiency of reduction of strings of group elements by the group operation.
# We compare two implementations of the cyclic group of order three:
# 1. group elements are encoded as singleton types. Multiplication table is
#    encoded as methods.
# 2. elements are encoded as integers. Multiplication table is
#    encoded as a Matrix of integers.
#
# The result: integer encoding is roughly 10x faster.

using BenchmarkTools
using StaticArrays

# Singleton types encoding cyclic group of order 3
struct At end
struct Bt end
struct Idt end
const A = At()
const B = Bt()
const Id = Idt()

const group_elements = (A, B, Id)
get_group_element(i::Integer) = group_elements[i]

groupop(::At, ::At) = B
groupop(::At, ::Bt) = Id
groupop(::Bt, ::Bt) = A
groupop(::Bt, ::At) = Id
groupop(::Idt, ::Idt) = Id

groupop(::Bt, ::Idt) = B
groupop(::Idt, ::Bt) = B
groupop(::At, ::Idt) = A
groupop(::Idt, ::At) = A

# Encode cyclic group of order 3 as integers
# Using UInt8, etc. changes speed by about 10%
const _mult_table = zeros(Int, 3, 3)
const (a, b, id) = (1, 2, 3)

for (x, y, z) in ((a, a, b), (a, b, id), (b, b, a),
                  (b, a, id), (id, id, id), (b, id, b),
                  (id, b, b), (a, id, a), (id, a, a))
    _mult_table[x, y] = z
end

const mult_table = SMatrix{3,3}(_mult_table)

groupop(x::Integer, y::Integer) = mult_table[x, y]

# Compare reduction of string of elements
function reduction_timing(num_elements::Integer)
    a_int = rand(1:3, num_elements)
    a_type = get_group_element.(a_int)
    res_int = @btime reduce(groupop, $a_int)
    res_type = @btime reduce(groupop, $a_type)
    get_group_element(res_int) == res_type || error("reductions do not agree.")
    return nothing
end

Here is a typical result:

julia> reduction_timing(10)
  15.313 ns (0 allocations: 0 bytes)
  213.471 ns (0 allocations: 0 bytes)

abulak · October 22, 2019, 10:17pm

in GroupRings.jl I went for multiplication/division table, for the very purpose that I compute sum of squares, so the same group operations would be repeated multiple times. (A more recent version you can find in NCRing_interface branch).

However the actual answer depends on the actual group you’re computing with: how large will it be in practice? If You’re interested only in specific Pauli groups, then probably a custom solution will be better (though I don’t understand what is {X, Y, Z}^⊗N? direct sum of N-copies of Pauli group C₄×D₄?).

Btw. in one of my experiments dealing with small matrix groups a custom type was the fastest since it allows multiplication in a few (one ;)) simd operations, e.g. https://github.com/kalmarek/RamanujanGraphs.jl/blob/master/src/gl2.jl#L77

jlapeyre · October 22, 2019, 11:17pm

For the moment I am interested in just the Pauli group C₄×D₄. This is only 16 elements, so a variety of solutions should be feasible. In a matrix representation, the Pauli group G_N is generated by all N-factor kronecker products with factors chosen from \{X, Y, Z\}. For N=2, this is \{X\otimes X, X\otimes Y, \ldots\}.

for the very purpose that I compute sum of squares, so the same group operations would be repeated multiple times.

It may be that there is no obvious best implementation of a finite group; ie it depends on what you want to do with it, how you want to label the elements, efficiency vs. flexiblity, etc.

I did not see your packages. I did notice that AbstractAlgebra.jl and Nemo. jl don’t have much support for groups out of the box.

chakravala · October 22, 2019, 11:39pm

The goal of my package is to make a more efficient representation of Clifford algebras without matrices.

it supports all kinds of generalized tensor algebra and group operations…

abulak · October 22, 2019, 11:48pm

ok, so tensor product does not happen in the category of groups but I guess the tensor just represent the “canonical form”.

If you just want to compute with matrices, I’d go for directly representing X, Y, Z as tuples (SMatrices?) and then an element of n-fold product as a struct with N-tuple of such things + maybe phase (depending how expensive it is to compute it on the fly);

If you want to interact with group elements (words in generators), then, well You need to keep them in such form and use multiplication table (for the single factor).

To say anything more, we’d need to see actual use-case (and make sure it’s the actual bottleneck

I did not see your packages.

yes, they aren’t published. For group theory you should probably use GAP.jl or gapjm.jl, but ymmv

chakravala · October 23, 2019, 12:02am

there are plenty of bivector groups and other groups that are subsets the TensorAlgebra I am using… including the ones being talked about here, except I don’t use matrices… I am using multivectors.

jlapeyre · October 23, 2019, 12:26am

Cliffords.jl represents the Pauli group and the Clifford group without using matrices.

chakravala · October 23, 2019, 12:32am

What is the representation in Cliffords.jl based on then?

Is there a performance comparison with Grassmann.jl?

jlapeyre · October 23, 2019, 12:38am

Pauli
https://github.com/BBN-Q/Cliffords.jl/blob/621efc7b4462f5d47ca5a80fc539be584218cd42/src/Paulis.jl#L19-L23

Clifford
https://github.com/BBN-Q/Cliffords.jl/blob/621efc7b4462f5d47ca5a80fc539be584218cd42/src/Cliffords.jl#L17-L20
(The struct could be immutable with no other changes to the code)

jlapeyre · October 23, 2019, 12:38am

I don’t think so. But, that would be interesting.

abulak · October 23, 2019, 7:41am

@jlapeyre for such small union vectors: v::SVector{N,UInt8} # 0 = I, 1 = X, 2 = Z, 3 = Y I went for separate types:
https://github.com/kalmarek/Groups.jl/blob/572e7e046f3ff38527e997e4c26fffffbd9a89c5/src/AutGroup.jl#L27

However, I had no chance of making my computations simdable (and that is not the bottleneck anyway).
EDIT: maybe the most important point: elements of the above group are distinguished by their action on a very specific graph, so encoding the action in type was very handy!

In Cliffords,jl they do what I had in mind Is it too slow for your purposes? This
https://github.com/BBN-Q/Cliffords.jl/blob/621efc7b4462f5d47ca5a80fc539be584218cd42/src/Paulis.jl#L129
looks pretty well optimized for me

@chakravala I meant that in the category of groups there is no tensor product operation (or is there one?) Looking at the formula above it seems that the group is C₄ ⋊ ⊕_n D₄ (a semidirect product of direct product of D₄s by C₄)

chakravala · October 23, 2019, 8:13am

Of course you can still do direct products, which are not quite the same thing as tensor products… groups are groups… I thought of making an extra feature for it in the past, but had a million other ideas I cared about first. Certainly you can do all that if you wanted to…

this might help you clarify things

abulak · October 23, 2019, 8:58am

@chakravala thanks for the explanation, but only one of these operations lives in the category of groups A standard tensor product requires an R-modules (commutative ring R). Anyway, that’s really off-topic.

chakravala · October 23, 2019, 9:17am

Not sure what your point is… the group elements can be constructed with tensor products of generators… after you construct your group, then you can still take direct products of those groups. You seem to be confusing and mixing together a lot of things… nowhere did I ever say or imply that tensor products are in the category of groups and it seems like you are imagining this.

abulak · October 23, 2019, 1:44pm

I was confused what “tensor product of generators” is supposed to mean (e.g. how do you tensor two generators of a finitely presented group? - this is a rhetorical question :), as there is no tensor product (see above)… [compare with group generated by direct products of generators, which is a well defined subgroup in the direct sum of parent groups)]. Anyway, that’s really off-topic ;-), the group is a semi-direct product and hence has a (efficient to compute) normal form

chakravala · October 23, 2019, 4:14pm

You seem to be very confused here. As I said from the start, taking a direct product is something different from the tensor product. You can take direct products of any groups… anyway this is getting boring.

Mason · October 23, 2019, 5:24pm

Hi @jlapeyre, I think the problem with your singleton implementation is that you’re hitting dynamic dispatch, which is very expensive. If you’re only interested in groups that only have multiplication, not addition then your look-up table approach is probably best, (though it’ll be faster if you can enforce bounds checking. You also might want to replace your integers with Enums for stylistic reasons.

If you’re interested in algebraic structures that also have addition, such Clifford algebras, things get a bit more complicated. What we really want, is a sort of hybrid approach where we hoist as much of the computation to compile time as possible without hitting dynamic dispatch. @chakravala achieves this sort of hoisting in his Grassmann.jl package by abusing Base.@pure, but I’ve found it’s possible to get great performance using generated functions as well.

For example, if you imagine a real clifford algebra in 2D, there are 4 elements, id, γ1, γ2, γ1γ2. If you represent some general multivector a*id + b*γ1 + c*γ2 + d*γ1γ2 as a tuple (a, b, c, d) then you can generate an implicit multiplication table inside a @generated function such that the multiplication

(a*id + b*γ1 + c*γ2 + d*γ1γ2) * (e*id + f*γ1 + g*γ2 + h*γ1γ2)

lowers to just

((a*e + b*f + c*g - d*h), 
 (a*f + b*e - c*h + d*g),
 (a*g + e*c + b*h - d*f),
 (a*h + d*e + b*g - c*f))

That is, you want to unroll the loop over the multiplication table at compile time, so that the only thing that happens at runtime is the multiplication of coefficients, with no looping. If you’d like, I can give an example of doing this in a generated function.

jlapeyre · October 23, 2019, 9:12pm

Yes, I think it is well optimized. But, I was wondering if one could be more efficient for P_1, ie just 16 elements. And more generally, how have people implemented operations with finite groups. I have a specific use case, but there may be others. So, what’s good enough now might not be later. For quantum information tasks, Pauli operators are re-implemented again and again. I wonder if its reasonable to ask for an implementation that can be used in several contexts.

jlapeyre · October 23, 2019, 9:27pm

I am actually using enums. But, they can’t be subtypes of an abstract type, which would be very useful for me. But, they are better than plain integers.

Dynamic dispatch is probably happening. I’m not sure if its part of the semantics. But unless optimized away somehow, dynamic dispatch will be used. Also, the running product is not type stable by definition.

It’s interesting that singleton types are used as an example of zero cost abstraction. In particular that they may compare well to enums. But, not in this case, apparently.

chakravala · October 23, 2019, 10:23pm

I disagree on this approach. This might work for very low dimensions, but when you are working with a high dimensional multivector, it’s gonna generate a huge piece of compiled code. In Grassmann.jl, it handles sparse multivector computations in high dimensions much faster because the loop and code size is much smaller than having a huge algebraic expression with thousands or millions of operations.

The performance with the loop doesn’t seem to be an issue at all for me.

Sure, give us some examples, I’m always down for that.

Topic		Replies	Views
What is the most efficient binary representation of unsigned integers in Julia? New to Julia question , bitstring	17	3148	March 22, 2021
ANN: BitBasis.jl: Types and operations for basis marked by bits in Linear Algebra Package Announcements announcement	4	795	March 19, 2019
Removing string operations on vectors (mentioned in #19449) Internals & Design	2	658	December 24, 2016
Xor performance Performance question	3	2477	October 18, 2019
Permutation of elements in array + composition of functions resulting into an extremely inefficient runtime? Performance question	6	592	November 28, 2020

Encoding group elements and operations

Related topics