[ANN] BitPermutations.jl: efficient routines for repeated bit permutations

I am happy to announce BitPermutations.jl, a package to efficiently permute the bits in some bitstring, given an arbitrary permutation.

Given a permutation over n bits, we can actually reshuffle bitstring in O(log n) time, by judiciously choosing operations which reshuffle the bits in parallel.
This is described more in detail here as well as in the package README.

In practice, this gives a speedup if one has to apply the same permutation to many different bitstrings.
In typical situations, you can expect a speedup of around 10x to 50x, compared to applying permute! on a BitVector.

The largest speedups are achieved by using intrinsics available on modern x86 CPUs.
If such operations are available, the permutation will be performed not with a traditional Beneš network, but by using reshuffling stages known as GRP, described originally in Donald Knuth’s The Art of Computer Programming.

Here is a usage example taken from the README.
We first create a given permutation p.

using BitPermutations

v = [2,6,5,8,4,7,1,3]

p = BitPermutation{UInt8}(v)

we can now apply it to a UInt8 bitstring

x = 0x08          # 0 0 0 1 0 0 0 0 (LSB -> MSB)

bitpermute(x, p)  # 0 0 0 0 1 0 0 0, or 0x10

p(x)              # idem

Contributions, feature request and general suggestions are very welcome.