Compiling to branch table

You seem to be using the 64-bit packed arrays in that last benchmark. They are 8 times smaller than the original arrays, which would explain why your code is ~8 times faster.

By the way, a lot of interesting discussion spawned from this question, including this topic, which suggests that we should have been using larger vectors in these benchmarks. For vectors of 100k elements, I get these timings:

julia> using BenchmarkTools

julia> src = rand(1:16, 100_000);

julia> dst = similar(src);

julia> @btime broadcast!($findcode, $dst, $src);
  510.262 μs (0 allocations: 0 bytes)

julia> @btime broadcast!($findcode3, $dst, $src);
  637.381 μs (0 allocations: 0 bytes)

julia> @btime broadcast!($findcode_lookuptable, $dst, $src);
  398.854 μs (0 allocations: 0 bytes)

So indeed the original solution seems to be faster than the accepted solution (though not as fast as a lookup table), and much more readable IMO.
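For anyone who wants to try the branch-versus-table comparison on their own machine, here is a minimal sketch. The definitions of `findcode` and `findcode_lookuptable` are not reproduced in this post, so `code_branches`, `code_table` and `CODE_TABLE` below are hypothetical stand-ins with an arbitrary mapping (inputs 1:16 to buckets 1:4). They only illustrate the two styles and will not reproduce the timings above.

```julia
# Hypothetical stand-ins: neither function is the thread's findcode.
# Both map an input in 1:16 to a bucket number in 1:4.

# Chain of comparisons; the compiler may or may not lower this to a branch table.
code_branches(x::Int) = x <= 4 ? 1 : x <= 8 ? 2 : x <= 12 ? 3 : 4

# Precomputed table: entry i holds the bucket for input i.
const CODE_TABLE = ntuple(i -> (i - 1) ÷ 4 + 1, 16)
code_table(x::Int) = CODE_TABLE[x]

src = rand(1:16, 100_000)
dst_branches = similar(src)
dst_table = similar(src)

broadcast!(code_branches, dst_branches, src)
broadcast!(code_table, dst_table, src)

# The two versions must agree before it makes sense to time them against each other.
@assert dst_branches == dst_table
```

Once the outputs agree, each version can be timed with BenchmarkTools, e.g. `@btime broadcast!($code_table, $dst_table, $src)`.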
