[NEW] NautyGraphs.jl

NautyGraphs.jl

I am excited to announce my first package NautyGraphs.jl, which is now available through the general registry.

NautyGraphs is a Julia interface to the famous graph isomorphism library nauty by Brendan McKay. NautyGraphs can be used to perform efficient isomorphism checking, canonical labeling, and hashing of vertex-labeled graphs.

Moreover, NautyGraphs is fully compatible with the Graphs.jl API. This makes it easy to create and modify NautyGraphs through familiar syntax, convert between the graph formats, or use NautyGraphs with a large library of existing graph algorithms. NautyGraphs.jl has been partly inspired by the (now dormant) Nauty.jl package.

How it Works

NautyGraphs exports two main types: NautyGraph and NautyDiGraph. These graph types can be created, accessed, and modified through the Graphs.jl API, but they represent their underlying graphs in bit array form, the format required by nauty. Therefore, a NautyGraph can be passed directly to nauty, without any conversion overhead. Through nauty’s canonical labeling procedure, NautyGraphs can be hashed, which makes it possible to efficiently filter large collections of graphs for isomorphism.

There are two main methods exported by NautyGraphs:

  • is_isomorphic(g1, g2) checks two NautyGraphs for isomorphism
  • ghash(g) hashes the canonical version of a graph, so that (up to hash collisions) ghash(g1) == ghash(g2) is true if and only if is_isomorphic(g1, g2).

It is also possible to use nauty(g) directly to get more information about a graph’s automorphism group.

For more information, please check out NautyGraphs on GitHub.

Rough Edges

NautyGraphs’ biggest open issue is that it is currently not available on Windows. This should hopefully be fixed someday by making Windows binaries for nauty available through nauty_jll.

There are a number of other minor issues or missing features, from detailed documentation, supporting nauty’s sparse graphs format, handling of automorphism group information, or simply improving the performance of basic graph methods. I welcome any raised issues or pull requests!

Alternatives

NautyGraphs is designed to be a lightweight interface to nauty that works with Graphs.jl. There are other ways to access nauty, or perform graph isomorphism checks from within Julia. Some examples I know of:

  • Nauty.jl: An older wrapper for nauty. Has baked-in options for nauty to increase performance. Can convert LightGraphs to nauty format.
  • VF2 algorithm in Graphs.Experimental. Can perform graph isomorphism, subgraph matching and more. No graph hashing (as far as I know). Performs better than nauty on some graphs, but worse on others. NautyGraphs.jl comes with a benchmark file to compare against VF2. NautyGraphs usually performs much better on larger graphs, while VF2 is faster on small and sparse graphs. Additional optimizations and support for nauty’s sparse graph format should lead to even better performance in the future.

Since this is my first Julia package, I’d be very grateful for any comments or feedback!

21 Likes

On the alternatives side there’s also a recently registered GitHub - laurentbartholdi/NautyTraces.jl: A simple interface to the nauty/trace code to test graph isomorphism.

1 Like

Thanks for bringing this up, I did not know about this. I’ll include a link to NautyTraces.jl on github!

I may have to go with this option in projects, since it looks like the only one supporting graphs with both vertex colors and edge colors.

1 Like

There is a way to map edge-colored graphs onto vertex-colored graphs such that isomorphism relations are preserved, as explained on p.62 of the nauty user guide. If there is demand, I could implement this mapping in NautyGraphs!

1 Like

Nice!

I think you should do better, or at least warn accordingly. The issue is: your hash uses the julia array-hash. Quoting from julia> @less hash([1,2,3,4], UInt(0)):

    # Goal: Hash approximately log(N) entries with a higher density of hashed elements
    # weighted towards the end and special consideration for repeated values. Colliding
    # hashes will often subsequently be compared by equality -- and equality between arrays
    # works elementwise forwards and is short-circuiting. This means that a collision
    # between arrays that differ by elements at the beginning is cheaper than one where the
    # difference is towards the end. Furthermore, choosing `log(N)` arbitrary entries from a
    # sparse array will likely only choose the same element repeatedly (zero in this case).

So, naively one would read your docs as “absent adversarial constructions, you get a 1 in 2^64 chance of spurious collisions”. This is not true! For largish digraphs, you afaiu have a good chance that changing a single edge will leave the hash unchanged.

Since nauty is expensive and hashing is cheap, I’d recommend to instead compute a crypto-hash of the relevant datastructures and return the leading UInt128. SHA is part of stdlib, so should be available everywhere.

Or, if strong hashes are too big of a deal, at least make sure to incorporate the entire info into the hash (i.e. keep the julia hash functions, but use a custom construction that iterates over the array).

1 Like

Thanks a lot for pointing this out. I knew that using a stronger hash would probably be good, but I didn’t realize hash collisions were this likely on large graphs. This will be fixed soon.