Improving the fastest pure-Julia cryptographic hash!

Hi All!

For another project of mine, I wanted a good fast hash that wasn’t trivial to mock. After an excessive amount of investigation, I’ve settled on KangarooTwelve. To quote the project readme, I think it strikes a really nice balance between:

  • Simplicity (Keccak + sponge + hopping)
  • Security (128-bit, sharing the same cryptographic base as SHA3)
  • Speed (up to ~2bytes/cycle, with AVX512)

With a little help from @Lilith, @Sukera, and @Oscar_Smith (big thanks for the help so far :star_struck:) in #performance-helpdesk we’ve currently got what seems to be the fastest pure-julia cryptographic hash (by some margin).

julia> KangarooTwelve.throughput(:crc32c)
 crc32c throughput: ~30546 MiB/s

julia> KangarooTwelve.throughput(:md5)
 md5 throughput: ~1037 MiB/s

julia> KangarooTwelve.throughput(:sha1)
 sha1 throughput: ~653 MiB/s

julia> KangarooTwelve.throughput(:sha256)
 sha256 throughput: ~431 MiB/s

julia> KangarooTwelve.throughput(:sha3_256)
 sha3_256 throughput: ~210 MiB/s

julia> KangarooTwelve.throughput(:k12)
 KangarooTwelve (singlethreaded) throughput: ~1630 MiB/s

That said, it seems that we’re quite some way from where we could be. By my estimation, this is currently around ~4 cycles per byte, and according to the keccak team 1.4 cycles/byte is achievable with AVX2 and 0.5 cycles/byte with AVX512.

So, that’s a ~3-8x performance improvement to be had, and that’s not even considering threading (yes, this can be threaded).

I’m pretty sure I’m nearing the limit of my high-performance/micro-optimisation abilities though, and I’d love it if I might be able to interest some of the more performance/optimisation-minded folks around here to take a look and see how far we can push this!

1 Like

This is :fire:, but FTR AFAIK (not an expert) K12 may be kind of obsoleted (from performance and flexibility perspectives) by newer designs. One of the most awesome things about sponge-based/permutation-based crypto is how it enables implementing the entirety of symmetric crypto with most of the code shared between the “primitives”, something that K12 is not designed for as far as I remember (it’s just a hash I think, at least “officially”).

For example, Ascon is the winner of the Lightweight Cryptography challenge (by the US NIST):

Other finalists:

Furthermore, the Keccak Team themselves (after some iteration, including third-party work) introduced several new concepts that make sponges obsolete AFAIK, such as deck functions and Farfalle. These are not just new algorithms, but seem like they will change the way symmetric crypto is both implemented and used (new modes) for the better: Keccak Team

One interesting thing about implementing crypto primitives is that, AFAIK, safety from some side-channel attacks requires using assembly (or llvmcall, in the case of Julia, I guess). For example, ensuring that some operations are constant-time (to protect against timing attacks) as necessary sometimes requires protection from an overzealous compiler optimizer. I wonder whether there are any existing Julia projects that tackle such issues?

AFAIK the current “fastest” established general-purpose cryptographic hash is Blake3. However, at longer input lengths (what I care abount more), K12 is actually extremely competitive. To take an excerpt from the Blake3 paper:

K12 is a sponge + permutation :upside_down_face:.

I have seen :slight_smile: however, those are all authenticated encryption schemes, and AFAIK no established desk/farfalle based hash function currently exists.

My understanding is that this is much more relevant when looking at crypto that involves secrets (e.g. encryption). When hashing, an attacker knows everything to start with … which is a large part of why translating a cipher to a hash is non-trivial (the secret is no longer secret).

1 Like