Hi I am trying to make the fastest code to UN-interleaving every 4 bits of a UInt64
.
So if I have the binary representation of my Int
as :
ABCD/EFGH/IJKL/MNOP/QRST/UVWX
(letters are 0 and 1)
I am looking for something like this AEIMQU/BFJNRV/CGKOSW/DHLPTX
It is a sort of binary matrix (4xN) transpose. Is there some build in function for this?
Looking around I made this function for UInt32
:
function packeveryfour(x::UInt32)#each step swaps the second and third quartiles of successively bigger pieces, result is odd bits go right and even bits go left
t = (x ⊻ (x >> 1)) & 0x22222222#mask 00100010001000100010001000100010
x = x ⊻ t ⊻ (t<<1)
t = (x ⊻ (x >> 2)) & 0x0C0C0C0C#mask 00001100000011000000110000001100
x = x ⊻ t ⊻ (t<<2)
t = (x ⊻ (x >> 4)) & 0x00F000F0#mask 00000000111100000000000011110000
x = x ⊻ t ⊻ (t<<4)
t = (x ⊻ (x >> 8)) & 0x0000FF00#mask 00000000000000001111111100000000
x = x ⊻ t ⊻ (t<<8)
t = (x ⊻ (x >> 1)) & 0x22222222#repeat the process
x = x ⊻ t ⊻ (t<<1)
t = (x ⊻ (x >> 2)) & 0x0C0C0C0C
x = x ⊻ t ⊻ (t<<2)
t = (x ⊻ (x >> 4)) & 0x00F000F0
x = x ⊻ t ⊻ (t<<4)
t = (x ⊻ (x >> 8)) & 0x0000FF00
x = x ⊻ t ⊻ (t<<8)
return x
end
This is quite fast,
But there is some problems, first I repeat two time the same thing and second there are some zeros in the middle of my answer when the input is smaller than a full UInt32
since this function is gonna use the empty bits left part of the UInt32
.
In practice I will work on the 52 first bits of a UInt64
so I will need to transpose a (13x4) Matrix.
To summarize:
- Is there a build in (or package) function to transpose binary matrix (store in a
Int
type)? - Is there a build in function to interleave bits (even a llvm/processor instruction)?
- Have you any others ideas how to make this very efficiently?
Thanks