[ANN] KangarooTwelve.jl — The fastest cryptographic hash in Julia

tecosaur · November 11, 2023, 5:37am

For a while now I’ve been dissatisfied with the hashing situation in Julia, with the need to compromise between:

Fast and laughably non-cryptographic (CRC32c)
Cryptographic, but slow (SHA2/SHA3)
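
For context, both ends of that trade-off are available from Julia's standard libraries. The snippet below is only an illustrative sketch: it calls `crc32c` from the `CRC32c` stdlib and `sha256` / `sha3_256` from the `SHA` stdlib on a small input (the input string is arbitrary), and it makes no claim about relative speed.

```julia
using CRC32c   # stdlib: fast, non-cryptographic checksum
using SHA      # stdlib: SHA-2 and SHA-3 families

data = "keccak"   # arbitrary example input

checksum = crc32c(data)      # a single UInt32
digest2  = sha256(data)      # Vector{UInt8} of 32 bytes
digest3  = sha3_256(data)    # Vector{UInt8} of 32 bytes

println("crc32c   : ", string(checksum, base = 16, pad = 8))
println("sha256   : ", bytes2hex(digest2))
println("sha3_256 : ", bytes2hex(digest3))
```

The 32-bit checksum is fine for catching accidental corruption, but it is far too short to resist deliberate collisions, which is the gap a fast cryptographic hash is meant to fill.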

This has led me to go “screw it, I’ll implement a fast cryptographic hash myself”. It’s been a bit of an odyssey tracking down some of the trickier performance pitfalls currently in Julia (e.g. the big effect function inlining can have on heap allocations), and we’re still lagging C/Rust performance by a factor of 2-3x. However, I’ve finally reached the point that I’m happy enough with the implementation to start using it!

This beats out everything but CRC32c by some margin, across all input sizes.

The single-threaded version is entirely stack-allocated, and the multithreaded currently performs a fixed number of heap allocations (around 90).

The code base is also relatively small (~600 lines excluding docstrings), and I like to think fairly readable. The only dependencies are Mmap (stdlib), SIMD, and PrecompileTools.

KangarooTwelve also has some neat design aspects: support for domain separation via customisation strings; being an XOF (eXtendable-Output Function) via the use of a cryptographic sponge (I’m thinking it might be fun to make it subtype IO); and a binary-tree “vine” format to allow for SIMD and threading parallelisation (though I haven’t been able to get SIMD working well yet).

As of yesterday, KangarooTwelve is registered in General, and using it is as simple as:

julia> using KangarooTwelve

julia> k12("keccak") # or a Vector{<:Unsigned}, or IO
0x712cb0b00c8d635d8e4750b9678c8f31

Enjoy!

jar1 · November 11, 2023, 6:19am

Hash benchmarks site is missing k12 but otherwise informative

https://rurban.github.io/smhasher/doc/table.html

StevenSiew · November 11, 2023, 9:35am

Er? Why is it called KangarooTwelve?
That is a strange name for a package.

tecosaur · November 11, 2023, 9:36am

Because that’s the hashing algorithm: Keccak Team - KangarooTwelve

It’s twelve rounds of Keccak hopping along the data.

cormullion · November 11, 2023, 9:52am

It’s a cool name. An icon almost draws itself:

stevengj · November 11, 2023, 3:19pm

Why not just call an existing Keccak library directly? For hashing, where you are operating on byte arrays, what is the advantage of re-implementing in Julia, given that it seems that dozens of implementations already exist, and it sounds like some of the C libraries are quite optimized?

OpenSSL and libgcrypt both have Keccak implementations, and both already have JLL binary packages. (Unfortunately, Keccak will only be released in OpenSSL 3.2, which is supposed to come out “soon”; our JLL currently packages OpenSSL 3.0.12. Libgcrypt has had Keccak since 2015, so I assume it is included in our JLL.) Or you could wrap another optimized C library.

tecosaur · November 11, 2023, 3:39pm

One of the nice things about Keccak is that it’s rather simple to implement: indeed, other than defining some constants, it’s only ~11 lines of Julia, and I have been led to the impression that when it comes to basic numerical-type operations Julia performs around C/Rust levels anyway (I am now much more aware of the difficulties in tracking down allocations and reinterpret overhead).
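
To give a sense of how small the core is, here is a minimal sketch of the Keccak-p[1600] permutation with a configurable round count (KangarooTwelve uses the last 12 of the 24 Keccak-f[1600] rounds). This is not the package's code: the names `keccak_p1600!`, `ROUND_CONSTANTS`, `ROTATIONS` and `PI_LANES` are hypothetical, the state is a plain `Vector{UInt64}` rather than whatever the package uses, and it is written for readability, not speed.

```julia
# Iota round constants for all 24 rounds of Keccak-f[1600].
const ROUND_CONSTANTS = UInt64[
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008]

# Rho rotation amounts and pi destination lanes (1-based), in the order
# visited when following the pi cycle starting from lane 2.
const ROTATIONS = (1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
                   27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44)
const PI_LANES = (11, 8, 12, 18, 19, 4, 6, 17, 9, 22, 25, 5,
                  16, 24, 20, 14, 13, 3, 21, 15, 23, 10, 7, 2)

# Apply the last `nrounds` rounds of Keccak-f[1600] to a 25-lane state, in place.
function keccak_p1600!(state::Vector{UInt64}, nrounds::Int = 12)
    length(state) == 25 || throw(ArgumentError("state must hold 25 lanes"))
    1 <= nrounds <= 24 || throw(ArgumentError("nrounds must be in 1:24"))
    column = zeros(UInt64, 5)  # scratch space; a tuned version would avoid this allocation
    for round in (24 - nrounds + 1):24
        # theta: mix each lane with the parities of two neighbouring columns
        for x in 1:5
            column[x] = state[x] ⊻ state[x+5] ⊻ state[x+10] ⊻ state[x+15] ⊻ state[x+20]
        end
        for x in 1:5
            d = column[mod1(x + 4, 5)] ⊻ bitrotate(column[mod1(x + 1, 5)], 1)
            for y in 0:5:20
                state[x+y] ⊻= d
            end
        end
        # rho and pi: rotate each lane and move it to its new position
        carry = state[2]
        for (dest, rot) in zip(PI_LANES, ROTATIONS)
            carry, state[dest] = state[dest], bitrotate(carry, rot)
        end
        # chi: the only non-linear step, applied row by row
        for y in 0:5:20
            for x in 1:5
                column[x] = state[x+y]
            end
            for x in 1:5
                state[x+y] ⊻= (~column[mod1(x + 1, 5)]) & column[mod1(x + 2, 5)]
            end
        end
        # iota: break symmetry with the round constant
        state[1] ⊻= ROUND_CONSTANTS[round]
    end
    return state
end

state = zeros(UInt64, 25)
keccak_p1600!(state)        # 12 rounds, as used by KangarooTwelve
```

Passing `24` as the second argument gives the full Keccak-f[1600] used by SHA-3. Everything a hash built on this needs beyond the permutation (padding, absorbing, squeezing, tree hashing) is left out of the sketch.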

So, I thought given the simplicity of the scheme it might be nice to have a pure-Julia implementation. There are also a few other reasons, for example the “cryptographic sponge” concept is actually quite versatile (it can be used as a deterministic RNG for instance), and so I think it’s nice to have more than just a single C “hash this” entrypoint.

As an aside, the OpenSSL function you link doesn’t seem to be KangarooTwelve or the Keccak permutation/SHAKE, just SHA3-256 with a different delimiter byte.

stevengj · November 11, 2023, 4:00pm

Yes, if you compare it to a 15-line C implementation, a 15-line Julia implementation should generally be comparable in performance (modulo the usual Julia performance tips about type stability, avoiding allocations, etcetera). But if you compare to a highly optimized C implementation that uses SIMD etcetera, then you will need to do similar work on the Julia side. e.g. libgcrypt’s keccak(?) implementation is well over 1000 lines of code; you aren’t likely to rival something like that with a 15-line Julia implementation.

On the other hand, if you have a simple, compact Julia implementation that has reasonable performance, I agree that this is valuable to have for its own sake.

tecosaur · November 11, 2023, 4:06pm

That ~11 line Julia implementation (I feel like I can fairly not count the comments) also supports SIMD (and does x4 pretty well on AVX512 machines). Someone with AVX512 (I only have SSE3) was able to see TurboSHAKE128 go from ~1.5 GiB/s to ~6 GiB/s with SIMDx4. My SIMD headaches mainly lie in other parts of the code and how SIMD works with reinterpreted arrays (badly).

jling · November 11, 2023, 4:27pm

how does XxHash-3 compete with what you’ve shown here? Recently a library / file format optimized for speed has switched from CRC32c to XxHash-3 and I wonder if this is why

tecosaur · November 11, 2023, 4:46pm

XXHash should be faster, but for my usage KangarooTwelve should be more than fast enough (I want to hash files, and ~12-15 GB/s is already comfortably more than disk IO speeds; if I can get SIMD working, this will be higher too). So, I figure I might as well go for something cryptographic.

I did start work on an XXHash3 attempt, but it wasn’t going so well, and once I discovered how simple KangarooTwelve was I thought I’d try that.

nsajko · November 11, 2023, 5:03pm

The Markdown in your packages, including in the Readme of this package, is broken. It displays fine on GitHub, but not on JuliaHub. I think maybe you’re using some GitHub-specific slang?

Apart from that, IMO registering this wasn’t yet entirely appropriate considering you knew your implementation was still flawed, regarding the reinterpret usage. But whatever, it’s done now.

tecosaur · November 11, 2023, 5:05pm

That would be because it’s not Markdown.

The implementation is not “flawed”, it’s been rigorously checked for correctness. Deficiencies in the implementation of reinterpret simply limit the SIMD performance, but there’s not much I can do about that.

nsajko · November 11, 2023, 5:11pm

Oh, OK, sorry then.

Currently using reinterpret like you do is simply something you shouldn’t do if you care for performance, because the Julia implementation doesn’t know how to handle it well. Considering that the package was motivated by performance, registering definitely seems premature.

Just do a few shifts and masks instead of reinterpreting.
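
As a rough illustration of that suggestion, the sketch below assembles 64-bit little-endian lanes from a byte vector with shifts and ORs, and checks that the result matches what `reinterpret` yields. It assumes the reinterpretation in question is a bytes-to-`UInt64` load, which the thread does not spell out; `load_lane` is a hypothetical helper and not part of KangarooTwelve.jl.

```julia
# Build the `index`-th (1-based) little-endian UInt64 lane from raw bytes
# using shifts and ORs, without creating a reinterpreted array.
function load_lane(bytes::AbstractVector{UInt8}, index::Integer)
    offset = 8 * (index - 1)
    lane = zero(UInt64)
    for i in 0:7
        lane |= UInt64(bytes[offset + i + 1]) << (8i)
    end
    return lane
end

bytes = collect(0x01:0x10)   # 16 example bytes, i.e. two lanes

lanes_shift = [load_lane(bytes, k) for k in 1:2]
# `ltoh` makes the comparison hold on big-endian hosts as well.
lanes_reint = ltoh.(reinterpret(UInt64, bytes))

println(lanes_shift == lanes_reint)
```

Whether this is actually faster than `reinterpret` in a given hot loop is something to benchmark case by case; the sketch only shows that the two approaches are interchangeable in terms of results.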

jling · November 11, 2023, 5:12pm

no worries, I think hros/XXhash.jl (a Julia wrapper for the xxHash C library, on GitHub) should be okay

tecosaur · November 11, 2023, 5:18pm

I’ve found that to have equivalent performance with tuples, and I can’t see a nice way to do this with arrays. If you can, I’d love a PR.
