[ANN] KangarooTwelve.jl — The fastest cryptographic hash in Julia

For a while now I’ve been dissatisfied with the hashing situation in Julia, with the need to compromise between:

  • Fast and laughably non-cryptographic (CRC32c)
  • Cryptographic, but slow (SHA2/SHA3)

This has led me to go “screw it, I’ll implement a fast cryptographic hash myself”. It’s been a bit of an odyssey tracking down some of the trickier performance pitfalls currently in Julia (e.g. the big effect function inlining can have on heap allocations), and we’re still lagging C/Rust performance by a factor of 2-3x, however, I’ve finally reached the point that I’m happy enough with the implementation to start using it!

This beats out everything but CRC32c by some margin, across all input sizes. :tada:

The single-threaded version is entirely stack-allocated, and the multithreaded currently performs a fixed number of heap allocations (around 90).

The code base is also relatively small (~600 lines excluding docstrings), and I like to think fairly readable. The only dependencies are Mmap (stdlib), SIMD, and PrecompileTools.

KangarooTwelve also has some neat design aspects, such as support for domain separation via customisation strings, also being an XOF (eXtendible output function) via the use of a cryptographic sponge (I’m thinking it might be fun to make it subtype IO), and a binary-tree “vine” format to allow for SIMD and threading parallelisation (though I haven’t been able to get SIMD working well yet).

As of yesterday, KangarooTwelve is registered in General, and using it is as simple as:

julia> using KangarooTwelve

julia> k12("keccak") # or a Vector{<:Unsigned}, or IO
0x712cb0b00c8d635d8e4750b9678c8f31

Enjoy! :smiley:

21 Likes

Hash benchmarks site is missing k12 but otherwise informative

https://rurban.github.io/smhasher/doc/table.html

Er? Why is it called KangarooTwelve?
That is a strange name for a package.

1 Like

Because that’s the hashing algorithm: Keccak Team - KangarooTwelve

It’s twelve rounds of Keccak hopping along the data.

2 Likes

It’s a cool name. An icon almost draws itself:

25 Likes

Why not just call an existing Keccak library directly? For hashing, where you are operating on byte arrays, what is the advantage of re-implementing in Julia, given that it seems that dozens of implementations already exist, and it sounds like some of the C libraries are quite optimized?

OpenSSL and libgcrypt both have Keccak implementations, and both already have JLL binary packages. (Unfortunately, Keccak will only be released in OpenSSL 3.2, which is supposed to come out “soon”; our JLL currently packages OpenSSL 3.0.12. Libgcrypt has had Keccak since 2015, so I assume it is included in our JLL.) Or you could wrap another optimized C library.

2 Likes

One of the nice things about Keccak is that it’s rather simple to implement: indeed other than defining some constants it’s only ~11 lines of Julia, and I have been lead to the impression that when it comes to basic numerical-type operations Julia performs around C/Rust levels anyway (I am now much more aware of the difficulties in tracking down allocations and reinterpret overhead).

So, I thought given the simplicity of the scheme it might be nice to have a pure-Julia implementation. There are also a few other reasons, for example the “cryptographic sponge” concept is actually quite versatile (it can be used as a deterministic RNG for instance), and so I think it’s nice to have more than just a single C “hash this” entrypoint.


As an aside, the OpenSSL function you link doesn’t seem to be KangarooTwelve or the Keccak permutation/SHAKE, just SHA3-256 with a different delimiter byte.

4 Likes

Yes, if you compare it to a 15-line C implementation, a 15-line Julia implementation should be generally be comparable in performance (modulo the usual Julia performance tips about type stability, avoiding allocations, etcetera). But if you compare to a highly optimized C implementation that uses SIMD etcetera, then you will need to do similar work on the Julia side. e.g. libgcrypt’s keccak(?) implementation is well over a 1000 lines of code; you aren’t likely to rival something like that with a 15-line Julia implementation.

On the other hand, if you have a simple, compact Julia implementation that has reasonable performance, I agree that this is valuable to have for its own sake.

5 Likes

That ~11 line Julia implementation (I feel like I can fairly not count the comments), also supports SIMD (and does x4 pretty well on AVX512 machines) :grin:. Someone with AVX512 (I only have SSE3) was able to see TurboSHAKE128 go from ~1.5 GiB/s to ~6 GiB/s with SIMDx4. My SIMD headaches mainly lie in other parts of the code and how SIMD works with reinterpreted arrays (badly).

4 Likes

how does XxHash-3 compete with what you’ve shown here? Recently a library / fileformat opitmized for speed has switched from CRC32c to XxHash-3 and I wonder if this is why

XXHash should be faster, but for my usage KangarooTwelve should be more than fast enough (I want to hash files, and ~12-15 GB/s is already comfortably more than disk IO speeds, if I can get SIMD working this will be higher too). So, I figure might as well go for something cryptographic.

I did start work on an XXHash3 attempt, but it wasn’t going so well, and once I discovered how simple KangarooTwelve was I thought I’d try that.

1 Like

The Markdown in your packages, including in the Readme of this package, is broken. It displays fine on Github, but not on JuliaHub. I think maybe you’re using some Github-specific slang?

Apart from that, IMO registering this wasn’t yet entirely appropriate considering you knew your implementation was still flawed, regarding the reinterpret usage. But whatever, it’s done now.

That would be because it’s not Markdown :upside_down_face:

The implementation is not “flawed”, it’s been rigorously checked for correctness. Deficiencies in the implementation of reinterpret simply limit the SIMD performance, but there’s not much I can do about that.

1 Like

Oh, OK, sorry then.

Currently using reinterpret like you do is simply something you shouldn’t do if you care for performance, because the Julia implementation doesn’t know how to handle it well. Considering that the package was motivated by performance, registering definitely seems premature.

Just do a few shifts and masks instead of reinterpreting.

1 Like

no worries, I think GitHub - hros/XXhash.jl: Julia wrapper for xxHash C library should be okay

I’ve found that to have equivalent performance with tuples, and I can’t see a nice way to do this with arrays. If you can, I’d love a PR :slightly_smiling_face:.

4 Likes