Can I have a UInt12?

Here’s the real solution, an actual package:
https://github.com/JaneliaSciComp/UInt12Arrays.jl

Let’s talk about the underlying data and the packing first. I’ve seen many ways to pack 12-bit integers together, but the most natural and common way is to pack them consecutively. Let’s say we have six bytes of data, or 48 bits. This represents four 12-bit integers. In the six bytes below, I’ve numbered the first six nibbles (4 bits each) 1 through 6 and the last six nibbles a through f. If we reinterpret them as 24-bit integers via BitIntegers.jl, we can see that the order holds.

julia> data = UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe]
6-element Array{UInt8,1}:
 0x21
 0x43
 0x65
 0xba
 0xdc
 0xfe

julia> using UInt12Arrays

julia> A24 = reinterpret(UInt24,data)
2-element reinterpret(UInt24, ::Array{UInt8,1}):
 0x654321
 0xfedcba
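
To make the byte order explicit, here is a minimal sketch in plain Julia that builds the same 24-bit values by hand. The helper name `pack24` is hypothetical (it is not part of BitIntegers.jl or UInt12Arrays.jl), and the result is held in a `UInt32` because Base has no 24-bit integer type. It assumes the little-endian layout that the reinterpret output above reflects.

```julia
# Hypothetical helper: combine three consecutive bytes into one
# little-endian 24-bit value, stored in a UInt32.
pack24(b1::UInt8, b2::UInt8, b3::UInt8) =
    UInt32(b1) | (UInt32(b2) << 8) | (UInt32(b3) << 16)

data = UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe]
w1 = pack24(data[1], data[2], data[3])  # 0x00654321
w2 = pack24(data[4], data[5], data[6])  # 0x00fedcba
```

The first byte ends up in the lowest 8 bits, which is why the nibbles read 6 5 4 3 2 1 from left to right.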

We want 12-bit integers, though, not 24-bit integers. It is currently not possible to define bits types in Julia whose size is not a whole number of bytes, so we cannot just use reinterpret as we did for 24-bit integers.

julia> using BitIntegers

julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
 [1] top-level scope
   @ ~\.julia\packages\BitIntegers\fcpdN\src\BitIntegers.jl:60

julia> reinterpret(UInt12,data)
ERROR: ArgumentError: cannot reinterpret `UInt8` `UInt12`, type `UInt12` is not a bits type
Stacktrace:
 [1] (::Base.var"#throwbits#220")(::Type{UInt8}, ::Type{UInt12}, ::Type{UInt12}) at .\reinterpretarray.jl:16
 [2] reinterpret(::Type{UInt12}, ::Array{UInt8,1}) at .\reinterpretarray.jl:33
 [3] top-level scope at REPL[155]:1

What we essentially want to do is break each 24-bit integer into two 12-bit halves:

julia> A24[1]
0x654321

julia> first(A24[1])
0x0321

julia> last(A24[1])
0x0654

julia> first(A24[2])
0x0cba

julia> last(A24[2])
0x0fed
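
In terms of bit operations, the split is just a mask and a shift. The following is a rough sketch, not the package's implementation; `low12` and `high12` are hypothetical names, and the 24-bit word is assumed to be held in a `UInt32`.

```julia
# Hypothetical helpers: split a 24-bit word (held in a UInt32)
# into its two 12-bit halves, returned as UInt16.
low12(w::UInt32)  = UInt16(w & 0x00000fff)          # lower 12 bits
high12(w::UInt32) = UInt16((w >> 12) & 0x00000fff)  # upper 12 bits

w = 0x00654321
lo = low12(w)   # 0x0321
hi = high12(w)  # 0x0654
```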

This is what UInt12Array and UInt12Vector do. The default form uses UInt16 as the element type. However, you can also use UInt12 as the element type.

julia> A16 = UInt12Vector(data)
4-element UInt12Array{UInt16,Array{UInt8,1},1}:
 0x0321
 0x0654
 0x0cba
 0x0fed

julia> A12 = UInt12Vector{UInt12}(data)
4-element UInt12Array{UInt12,Array{UInt8,1},1}:
 0x321
 0x654
 0xcba
 0xfed

UInt12 is actually just a boxed UInt16 in the current implementation, so you might as well just use UInt16 directly in most cases. The main advantage of using UInt12 is handling overflow correctly as well as display:

julia> A16[1] + 0xd00
0x1021

julia> A12[1] + 0xd00
0x021
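
If you stay with UInt16 elements, you can get the same wrap-around by masking after the addition. This is a small sketch under that assumption; `add12` is a hypothetical helper and not something the package exports.

```julia
# Hypothetical helper: 12-bit wrap-around addition on values stored in UInt16.
add12(a::UInt16, b::UInt16) = (a + b) & 0x0fff

wrapped = add12(0x0321, 0x0d00)  # 0x0021, the carry into the top nibble is dropped
plain   = 0x0321 + 0x0d00        # 0x1021, ordinary UInt16 addition keeps it
```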

However, even with 16-bit element types, assignment into the UInt12Array will properly discard the highest nibble:

julia> A16[1]
0x0321

julia> A16[1] += 0xd00; A16[1]
0x0021

Note that UInt12Array is basically just a 12-bit unsigned integer view of the original bytes. By changing the first 12-bit integer, we zeroed the nibble labeled 3 in every other view of the same underlying data:

julia> A24
2-element reinterpret(UInt24, ::Array{UInt8,1}):
 0x654021
 0xfedcba

julia> A12
4-element UInt12Array{UInt12,Array{UInt8,1},1}:
 0x021
 0x654
 0xcba
 0xfed

julia> A16
4-element UInt12Array{UInt16,Array{UInt8,1},1}:
 0x0021
 0x0654
 0x0cba
 0x0fed

julia> A16[1]
0x0021

julia> data[2] = 0x43
0x43

julia> A12
4-element UInt12Array{UInt12,Array{UInt8,1},1}:
 0x321
 0x654
 0xcba
 0xfed
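
To illustrate what writing through such a view involves, here is a sketch of storing one 12-bit value directly into the packed bytes. `set12!` is a hypothetical function written for this post, assuming the consecutive packing described at the top; the package's own setindex! may be implemented differently.

```julia
# Hypothetical helper: store the 12-bit value `v` as element `i` (1-based)
# of a consecutively packed byte vector.
function set12!(bytes::Vector{UInt8}, i::Integer, v::UInt16)
    v &= 0x0fff                  # discard the highest nibble
    k = 3 * ((i - 1) ÷ 2) + 1    # first byte of the 3-byte group holding element i
    if isodd(i)
        # Low half of the group: all of byte k plus the low nibble of byte k+1.
        bytes[k]     = v % UInt8
        bytes[k + 1] = (bytes[k + 1] & 0xf0) | ((v >> 8) % UInt8)
    else
        # High half of the group: the high nibble of byte k+1 plus all of byte k+2.
        bytes[k + 1] = (bytes[k + 1] & 0x0f) | (((v & 0x000f) << 4) % UInt8)
        bytes[k + 2] = (v >> 4) % UInt8
    end
    return bytes
end

data = UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe]
set12!(data, 1, 0x1021)  # 0x1021 is 0x0321 + 0x0d00
data[2]                  # 0x40: the neighbouring element's nibble 4 is untouched
```

Only the nibbles that belong to the target element are modified, which is why the second byte goes from 0x43 to 0x40 rather than being overwritten entirely.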

Because UInt12Array is a view on the underlying data, it is very cheap to create. Converting the entire array to a true UInt16 array, which may be faster to work with in other applications, can take a while:

julia> data = rand(UInt8, 1024*1024*1024*2+1)
2147483649-element Array{UInt8,1}:
 0x84
 0x7d
 0xde
 0x63
 0x99
 0x39
 0xcb
    ⋮
 0x8a
 0x42
 0x88
 0x30
 0x88
 0x57

julia> @time A16 = UInt12Vector(data)
  0.000013 seconds (6 allocations: 336 bytes)
1431655766-element UInt12Array{UInt16,Array{UInt8,1},1}:
 0x0d84
 0x0de7
 0x0963
 0x0399
 0x01cb
 0x0c2d
 0x0795
      ⋮
 0x0d52
 0x0605
 0x028a
 0x0884
 0x0830
 0x0578

julia> @time copy(A16)
  5.329485 seconds (2 allocations: 2.667 GiB, 3.17% gc time)
1431655766-element Array{UInt16,1}:
 0x0d84
 0x0de7
 0x0963
 0x0399
 0x01cb
 0x0c2d
 0x0795
 0x0e67
 0x0dd7
 0x05f4
 0x0f4b
 0x006b
 0x0386
 0x09f1
 0x07bc
 0x0b50
      ⋮
 0x0538
 0x02a2
 0x0d40
 0x092e
 0x00ad
 0x0412
 0x0dbf
 0x0496
 0x079b
 0x0d52
 0x0605
 0x028a
 0x0884
 0x0830
 0x0578

For converting the entire array to a native UInt16 array, I overrode Base.convert and used SIMD.jl to accelerate conversion:

julia> @time convert(Array{UInt16}, A16)
  1.029260 seconds (6 allocations: 2.667 GiB, 8.86% gc time)
1431655766-element Array{UInt16,1}:
 0x0d84
 0x0de7
 0x0963
 0x0399
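
For reference, a plain scalar version of that conversion could look like the sketch below. This is not the SIMD code from the package, just a straightforward loop under the same packing assumption; `unpack12` is a hypothetical name, and trailing bytes that do not fill a complete 3-byte group are simply ignored here.

```julia
# Hypothetical helper: unpack consecutively packed 12-bit integers
# into a freshly allocated Vector{UInt16}.
function unpack12(bytes::Vector{UInt8})
    ngroups = length(bytes) ÷ 3              # each 3-byte group holds two values
    out = Vector{UInt16}(undef, 2 * ngroups)
    @inbounds for g in 1:ngroups
        b1 = UInt16(bytes[3g - 2])
        b2 = UInt16(bytes[3g - 1])
        b3 = UInt16(bytes[3g])
        out[2g - 1] = b1 | ((b2 & 0x000f) << 8)  # low 12 bits of the group
        out[2g]     = (b2 >> 4) | (b3 << 4)      # high 12 bits of the group
    end
    return out
end

unpacked = unpack12(UInt8[0x21, 0x43, 0x65, 0xba, 0xdc, 0xfe])
# unpacked == UInt16[0x0321, 0x0654, 0x0cba, 0x0fed]
```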

Let me know if this works for you. One possible complication that I can see is if your 12-bit integers are packed differently.
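
As an example of what "packed differently" could mean, some sources store the most significant bits first. The sketch below unpacks that layout in plain Julia. Both the layout and the name `unpack12_msb` are assumptions for illustration; I am not claiming the package supports this layout.

```julia
# Hypothetical helper for an assumed MSB-first layout, where three bytes
# AB CD EF hold the two values 0xABC and 0xDEF.
function unpack12_msb(bytes::Vector{UInt8})
    ngroups = length(bytes) ÷ 3
    out = Vector{UInt16}(undef, 2 * ngroups)
    for g in 1:ngroups
        b1 = UInt16(bytes[3g - 2])
        b2 = UInt16(bytes[3g - 1])
        b3 = UInt16(bytes[3g])
        out[2g - 1] = (b1 << 4) | (b2 >> 4)          # byte 1 plus high nibble of byte 2
        out[2g]     = ((b2 & 0x000f) << 8) | b3      # low nibble of byte 2 plus byte 3
    end
    return out
end

msb = unpack12_msb(UInt8[0x32, 0x16, 0x54])
# msb == UInt16[0x0321, 0x0654]
```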
