Here’s the real solution, an actual package:
https://github.com/JaneliaSciComp/UInt12Arrays.jl
Let’s talk about the underlying data and the packing first. I’ve seen many ways to pack 12-bit integers together, but the most natural and common way is to pack them consecutively. Let’s say we have six bytes of data, or 48 bits. This represents four 12-bit integers. In the six bytes below, I’ve numbered the first six nibbles (4 bits each) using `1` through `6` and the last six nibbles using `a` through `f`. If we reinterpret them as 24-bit integers via BitIntegers.jl, we can see that the order holds.
julia> data = UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe]
6-element Array{UInt8,1}:
0x21
0x43
0x65
0xba
0xdc
0xfe
julia> using UInt12Arrays
julia> A24 = reinterpret(UInt24,data)
2-element reinterpret(UInt24, ::Array{UInt8,1}):
0x654321
0xfedcba
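To make the byte order explicit, here is a minimal sketch that builds the same two values by hand with shifts. `pack24` is a hypothetical helper, not part of BitIntegers.jl or UInt12Arrays.jl, and it stores the result in a `UInt32` because Base has no 24-bit type. It assumes the little-endian layout that the `reinterpret` output above reflects.

```julia
# Hypothetical helper for illustration only (not from any package).
# Combine three consecutive bytes into one little-endian 24-bit value,
# held in a UInt32 since Base Julia has no UInt24.
function pack24(b1::UInt8, b2::UInt8, b3::UInt8)
    return UInt32(b1) | (UInt32(b2) << 8) | (UInt32(b3) << 16)
end

data = UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe]

w1 = pack24(data[1], data[2], data[3])  # 0x00654321
w2 = pack24(data[4], data[5], data[6])  # 0x00fedcba
```

The first byte supplies the lowest two nibbles, which is why `0x21` ends up at the right-hand end of `0x654321`.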
We want 12-bit integers though, not 24-bit integers. It is currently not possible to define primitive types in Julia whose size is not a whole number of bytes, so we cannot just use `reinterpret` as we did for 24-bit integers.
julia> using BitIntegers
julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
[1] top-level scope
@ ~\.julia\packages\BitIntegers\fcpdN\src\BitIntegers.jl:60
julia> reinterpret(UInt12,data)
ERROR: ArgumentError: cannot reinterpret `UInt8` `UInt12`, type `UInt12` is not a bits type
Stacktrace:
[1] (::Base.var"#throwbits#220")(::Type{UInt8}, ::Type{UInt12}, ::Type{UInt12}) at .\reinterpretarray.jl:16
[2] reinterpret(::Type{UInt12}, ::Array{UInt8,1}) at .\reinterpretarray.jl:33
[3] top-level scope at REPL[155]:1
What we essentially want to do is break each 24-bit integer into its two 12-bit halves:
julia> A24[1]
0x654321
julia> first(A24[1])
0x0321
julia> last(A24[1])
0x0654
julia> first(A24[2])
0x0cba
julia> last(A24[2])
0x0fed
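In plain Base Julia, the same split can be sketched with a mask and a shift. The names `lower12` and `upper12` are hypothetical and are not the package's API, and the 24-bit values are held in `UInt32` so the snippet needs no packages.

```julia
# Hypothetical helpers for illustration only (not the package's API).
# Lower half: keep the 12 least significant bits.
lower12(x::Unsigned) = UInt16(x & 0x0fff)
# Upper half: shift the next 12 bits down, then mask.
upper12(x::Unsigned) = UInt16((x >> 12) & 0x0fff)

w1 = 0x00654321  # first 24-bit group, stored in a UInt32
w2 = 0x00fedcba  # second 24-bit group

lower12(w1)  # 0x0321
upper12(w1)  # 0x0654
lower12(w2)  # 0x0cba
upper12(w2)  # 0x0fed
```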
This is what `UInt12Array` and `UInt12Vector` do. The default form uses `UInt16` as the element type. However, you can also use `UInt12` as the element type.
julia> A16 = UInt12Vector(data)
4-element UInt12Array{UInt16,Array{UInt8,1},1}:
0x0321
0x0654
0x0cba
0x0fed
julia> A12 = UInt12Vector{UInt12}(data)
4-element UInt12Array{UInt12,Array{UInt8,1},1}:
0x321
0x654
0xcba
0xfed
`UInt12` is actually just a boxed `UInt16` in the current implementation, so you might as well use `UInt16` directly in most cases. The main advantages of `UInt12` are that overflow wraps at 12 bits rather than 16 and that values display as three hex digits:
julia> A16[1] + 0xd00
0x1021
julia> A12[1] + 0xd00
0x021
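The 12-bit result is the 16-bit sum reduced modulo 2^12 = 4096. A minimal sketch of that arithmetic with plain `UInt16` values follows. It only illustrates the wrap-around and is not a claim about how `UInt12` implements it internally.

```julia
a = 0x0321
b = 0x0d00

full    = a + b            # 0x1021: UInt16 arithmetic keeps the 13th bit
wrapped = full & 0x0fff    # 0x0021: masking to 12 bits drops it
```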
However, even with a 16-bit element type, assignment into the `UInt12Array` properly discards the highest nibble:
julia> A16[1]
0x0321
julia> A16[1] += 0xd00; A16[1]
0x0021
Note that `UInt12Array` is basically just a 12-bit unsigned integer view of the original bytes. By changing the first 12-bit integer, we replaced the `3` nibble with `0` in all the other views of the same underlying data:
julia> A24
2-element reinterpret(UInt24, ::Array{UInt8,1}):
0x654021
0xfedcba
julia> A12
4-element UInt12Array{UInt12,Array{UInt8,1},1}:
0x021
0x654
0xcba
0xfed
julia> A16
4-element UInt12Array{UInt16,Array{UInt8,1},1}:
0x0021
0x0654
0x0cba
0x0fed
julia> A16[1]
0x0021
julia> data[2] = 0x43
0x43
julia> A12
4-element UInt12Array{UInt12,Array{UInt8,1},1}:
0x321
0x654
0xcba
0xfed
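The write-through behaviour can be sketched directly on the bytes. `set12!` below is a hypothetical helper, not the package's `setindex!`. It assumes the consecutive packing described earlier, with two values per 3-byte group, and 1-based indexing of the 12-bit values.

```julia
# Hypothetical helper for illustration only (not the package's setindex!).
# Store the low 12 bits of v as the i-th packed value (1-based) in bytes.
function set12!(bytes::Vector{UInt8}, i::Integer, v::Integer)
    v12  = UInt16(v & 0x0fff)        # discard everything above 12 bits
    base = 3 * ((i - 1) ÷ 2) + 1     # first byte of the 3-byte group
    if isodd(i)
        # first value of the group: all of byte 1, low nibble of byte 2
        bytes[base]     = UInt8(v12 & 0x00ff)
        bytes[base + 1] = (bytes[base + 1] & 0xf0) | UInt8(v12 >> 8)
    else
        # second value of the group: high nibble of byte 2, all of byte 3
        bytes[base + 1] = (bytes[base + 1] & 0x0f) | UInt8((v12 & 0x000f) << 4)
        bytes[base + 2] = UInt8(v12 >> 4)
    end
    return bytes
end

data = UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe]
set12!(data, 1, 0x0321 + 0x0d00)   # stores 0x021; data[2] becomes 0x40
```

Only the low nibble of the second byte changes, which is why every other view of the same bytes sees the `3` disappear, and why assigning `0x43` back to `data[2]` restores it.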
Because `UInt12Array` is a view on the underlying data, it is very inexpensive to create. Converting the entire array to a true `UInt16` array, which may be faster to work with in other applications, can take a while:
julia> data = rand(UInt8, 1024*1024*1024*2+1)
2147483649-element Array{UInt8,1}:
0x84
0x7d
0xde
0x63
0x99
0x39
0xcb
⋮
0x8a
0x42
0x88
0x30
0x88
0x57
julia> @time A16 = UInt12Vector(data)
0.000013 seconds (6 allocations: 336 bytes)
1431655766-element UInt12Array{UInt16,Array{UInt8,1},1}:
0x0d84
0x0de7
0x0963
0x0399
0x01cb
0x0c2d
0x0795
⋮
0x0d52
0x0605
0x028a
0x0884
0x0830
0x0578
julia> @time copy(A16)
5.329485 seconds (2 allocations: 2.667 GiB, 3.17% gc time)
1431655766-element Array{UInt16,1}:
0x0d84
0x0de7
0x0963
0x0399
0x01cb
0x0c2d
0x0795
0x0e67
0x0dd7
0x05f4
0x0f4b
0x006b
0x0386
0x09f1
0x07bc
0x0b50
⋮
0x0538
0x02a2
0x0d40
0x092e
0x00ad
0x0412
0x0dbf
0x0496
0x079b
0x0d52
0x0605
0x028a
0x0884
0x0830
0x0578
For converting the entire array to a native `UInt16` array, I overrode `Base.convert` and used SIMD.jl to accelerate the conversion:
julia> @time convert(Array{UInt16}, A16)
1.029260 seconds (6 allocations: 2.667 GiB, 8.86% gc time)
1431655766-element Array{UInt16,1}:
0x0d84
0x0de7
0x0963
0x0399
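For reference, here is a plain scalar sketch of what such a conversion computes. `unpack12` is a hypothetical name. It is not the package's SIMD implementation and will not match its speed. It assumes the consecutive packing described earlier and simply ignores any trailing bytes that do not form a complete 3-byte group.

```julia
# Hypothetical scalar reference for illustration only
# (not the package's SIMD-accelerated convert).
function unpack12(bytes::Vector{UInt8})
    ngroups = length(bytes) ÷ 3               # complete 3-byte groups only
    out = Vector{UInt16}(undef, 2 * ngroups)  # two 12-bit values per group
    for g in 0:(ngroups - 1)
        b1 = UInt16(bytes[3g + 1])
        b2 = UInt16(bytes[3g + 2])
        b3 = UInt16(bytes[3g + 3])
        out[2g + 1] = b1 | ((b2 & 0x000f) << 8)  # byte 1 plus low nibble of byte 2
        out[2g + 2] = (b2 >> 4) | (b3 << 4)      # high nibble of byte 2 plus byte 3
    end
    return out
end

unpack12(UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe])
# 4-element Vector{UInt16}: 0x0321, 0x0654, 0x0cba, 0x0fed
```

A loop like this can serve as a correctness check for a faster implementation on small inputs.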
Let me know if this works for you. One possible complication I can see is that your 12-bit integers may be packed differently.