Thanks for the shoutout for UInt12Arrays.
My initial suggestion is just to use a plain UInt8 array. Computers like working with bytes and are optimized for that scenario.
If you really want to optimize memory while sacrificing speed, you can try to build on top of BitArray (see the Arrays section of the Julia manual).
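As a rough sketch of what building on BitArray could look like, the helpers below store each value as four consecutive bits in a BitVector. The names `to_bits` and `from_bits` are made up for illustration and are not part of Base or any package, and I have not benchmarked this.

```julia
# Hypothetical helpers: keep 4-bit values in a BitVector, four bits per value,
# least significant bit first.
function to_bits(values::AbstractVector{<:Integer})
    bits = falses(4 * length(values))
    for (i, v) in enumerate(values)
        0 <= v <= 15 || throw(ArgumentError("value $v does not fit in 4 bits"))
        for b in 0:3
            bits[4 * (i - 1) + b + 1] = ((v >> b) & 1) == 1
        end
    end
    return bits
end

function from_bits(bits::BitVector)
    n = div(length(bits), 4)
    out = zeros(UInt8, n)
    for i in 1:n, b in 0:3
        if bits[4 * (i - 1) + b + 1]
            out[i] |= 0x01 << b   # set bit b of the i-th value
        end
    end
    return out
end
```

Every element access goes through bit-level indexing here, which is where the speed is lost.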
You might also want to take a look at how BioSequences is implemented.
https://github.com/BioJulia/BioSequences.jl
That said, you can pack two 4-bit values (call them UInt4s) into a single UInt8:
julia> struct UInt4Pair
           data::UInt8
       end
julia> UInt4Pair(first, last) = UInt4Pair(first | (last << 4))
UInt4Pair
julia> first(p::UInt4Pair) = p.data & 0x0f
first (generic function with 1 method)
julia> last(p::UInt4Pair) = (p.data & 0xf0) >> 4
last (generic function with 1 method)
julia> function Base.show(io::IO, m::MIME{Symbol("text/plain")}, p::UInt4Pair)
           show(io, m, first(p))
           println(io)
           show(io, m, last(p))
       end
julia> p = UInt4Pair(0,9)
0x00
0x09
julia> UInt4Pair(0x38)
0x08
0x03
julia> data = rand(UInt8, 3)
3-element Vector{UInt8}:
0xbc
0x72
0xef
julia> reinterpret(UInt4Pair, data)
3-element reinterpret(UInt4Pair, ::Vector{UInt8}):
UInt4Pair(0xbc)
UInt4Pair(0x72)
UInt4Pair(0xef)
julia> first.(reinterpret(UInt4Pair, data))
3-element Vector{UInt8}:
0x0c
0x02
0x0f
julia> last.(reinterpret(UInt4Pair, data))
3-element Vector{UInt8}:
0x0b
0x07
0x0e
julia> pack(a::Vector) = UInt4Pair.(a[1:2:end], a[2:2:end])
pack (generic function with 2 methods)
julia> function unpack(pairs::Vector{UInt4Pair})
           out = Vector{UInt8}(undef, length(pairs) * 2)
           out[1:2:end] = first.(pairs)
           out[2:2:end] = last.(pairs)
           return out
       end
unpack (generic function with 1 method)
julia> base10_data = rand(0x0:0x9, 32)
32-element Vector{UInt8}:
0x05
0x03
0x02
0x07
0x08
0x01
0x04
0x08
0x06
0x00
0x05
0x09
0x01
⋮
0x01
0x01
0x02
0x04
0x08
0x00
0x01
0x02
0x09
0x08
0x07
0x01
julia> pack(base10_data)
16-element Vector{UInt4Pair}:
UInt4Pair(0x35)
UInt4Pair(0x72)
UInt4Pair(0x18)
UInt4Pair(0x84)
UInt4Pair(0x06)
UInt4Pair(0x95)
UInt4Pair(0x91)
UInt4Pair(0x00)
UInt4Pair(0x57)
UInt4Pair(0x65)
UInt4Pair(0x11)
UInt4Pair(0x42)
UInt4Pair(0x08)
UInt4Pair(0x21)
UInt4Pair(0x89)
UInt4Pair(0x17)
julia> unpack(pack(base10_data))
32-element Vector{UInt8}:
0x05
0x03
0x02
0x07
0x08
0x01
0x04
0x08
0x06
0x00
0x05
0x09
0x01
0x09
0x00
0x00
0x07
0x05
0x05
0x06
0x01
0x01
0x02
0x04
0x08
0x00
0x01
0x02
0x09
0x08
0x07
0x01
julia> unpack(pack(base10_data)) == base10_data
true
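Note that `pack` above assumes an even number of elements and values that fit in 4 bits. A minimal sketch of a checked variant follows. It works on raw bytes with the same layout as UInt4Pair (low nibble first), so the result can be passed to `reinterpret(UInt4Pair, ...)`. The names `pack_checked` and `unpack_checked` and the `pad` keyword are my own inventions for illustration.

```julia
# Hypothetical checked packing: validates the input and pads an odd-length
# vector with `pad` in the final high nibble.
function pack_checked(a::AbstractVector{<:Integer}; pad = 0x00)
    all(x -> 0 <= x <= 15, a) || throw(ArgumentError("all values must fit in 4 bits"))
    n = cld(length(a), 2)                  # number of bytes needed
    out = Vector{UInt8}(undef, n)
    for i in 1:n
        lo = UInt8(a[2i - 1])
        hi = 2i <= length(a) ? UInt8(a[2i]) : UInt8(pad)
        out[i] = lo | (hi << 4)
    end
    return out
end

# `n` is the original element count, needed to drop the padding nibble.
function unpack_checked(bytes::AbstractVector{UInt8}, n::Integer = 2 * length(bytes))
    0 <= n <= 2 * length(bytes) || throw(ArgumentError("n exceeds the packed capacity"))
    out = Vector{UInt8}(undef, 2 * length(bytes))
    for (i, b) in enumerate(bytes)
        out[2i - 1] = b & 0x0f
        out[2i] = b >> 4
    end
    return resize!(out, n)
end
```

The original length has to be stored somewhere alongside the packed bytes, since the padding nibble is otherwise indistinguishable from data.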
If you want to accelerate this further, you can use SIMD.jl.
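Before reaching for a package, it may be worth trying plain Base: the mask and shift can be broadcast over the raw bytes, and an in-place loop can be annotated with `@simd`. This is only a sketch. The name `unpack!` is hypothetical, and whether the compiler actually vectorizes the loop is an assumption you would need to confirm (for example with `@code_llvm`).

```julia
# Broadcasting over the raw bytes, using the same example data as above.
data = UInt8[0xbc, 0x72, 0xef]
lows = data .& 0x0f    # low nibbles
highs = data .>> 4     # high nibbles

# Hypothetical in-place unpack that avoids the temporary arrays.
function unpack!(out::Vector{UInt8}, bytes::Vector{UInt8})
    length(out) == 2 * length(bytes) ||
        throw(DimensionMismatch("out must be twice as long as bytes"))
    @inbounds @simd for i in eachindex(bytes)
        b = bytes[i]
        out[2i - 1] = b & 0x0f
        out[2i] = b >> 4
    end
    return out
end
```

Calling `unpack!(Vector{UInt8}(undef, 2 * length(data)), data)` interleaves the low and high nibbles in the same order as `unpack` above.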