Substitution for nonexistent 4bit-arrays

As I mentioned in my previous posts I am new to Julia and am planning to dive into it.
For the following question I have been looking into the Julia 1.7 Documentation, but have not found an answer or explanation - if I have overlooked something then please point me towards it.

I would like to store an array whose elements consist of numbers only in the range 0…9, that means a 4bit-array with elements of 4bit size filled with UInt4 numbers would be enough, but neither such arrays nor Uint4 do exist. Storing numbers in the range 0…9 in an UInt8 array would be certainly possible, but the latter seems to be a waste of RAM as the planned array is of huge size anyway. These numbers are only results, they would not be used in further calculations.

Now my question is:
To store such single numbers in the range 0…9, what is not only the EASIEST way but also the FASTEST way in Julia at program execution time to store them in an array?

Michael

Maybe you can take inspiration from https://github.com/JaneliaSciComp/UInt12Arrays.jl (see also the discussion here).

Thanks for the shoutout for UInt12Arrays.

My initial suggestion is just to use the UInt8 array. Computers like working with bytes and are optimized for that scenario.

If you really want to optimize memory while sacrificing speed, you can try to build on top of BitArray. Arrays · The Julia Language

You might also want to take a look at how BioSequences is implemented.
https://github.com/BioJulia/BioSequences.jl

That said you can take advantage of the idea that you can pack two UInt4s into a single UInt8:

julia> struct UInt4Pair
           data::UInt8
       end
       UInt4Pair(first, last) = UInt4Pair(first | (last << 4))
UInt4Pair

julia> first(p::UInt4Pair) = p.data & 0x0f
first (generic function with 1 method)

julia> last(p::UInt4Pair) = (p.data & 0xf0) >> 4
last (generic function with 1 method)

julia> function Base.show(io::IO, m::MIME{Symbol("text/plain")}, p::UInt4Pair)
           show(io, m, first(p))
           println(io)
           show(io, m, last(p)))

julia> p = UInt4Pair(0,9)
0x00
0x09

julia> UInt4Pair(0x38)
0x08
0x03

julia> data = rand(UInt8, 3)
3-element Vector{UInt8}:
 0xbc
 0x72
 0xef

julia> reinterpret(UInt4Pair, data)
3-element reinterpret(UInt4Pair, ::Vector{UInt8}):
 UInt4Pair(0xbc)
 UInt4Pair(0x72)
 UInt4Pair(0xef)

julia> first.(reinterpret(UInt4Pair, data))
3-element Vector{UInt8}:
 0x0c
 0x02
 0x0f

julia> last.(reinterpret(UInt4Pair, data))
3-element Vector{UInt8}:
 0x0b
 0x07
 0x0e

julia> pack(a::Vector) = UInt4Pair.(a[1:2:end], a[2:2:end])
pack (generic function with 2 methods)

julia> function unpack(pairs::Vector{UInt4Pair})
           out = Vector{UInt8}(undef, length(pairs) * 2)
           out[1:2:end] = first.(pairs)
           out[2:2:end] = last.(pairs)
           return out
       end
unpack (generic function with 1 method)

julia> base10_data = rand(0x0:0x9, 32)
32-element Vector{UInt8}:
 0x05
 0x03
 0x02
 0x07
 0x08
 0x01
 0x04
 0x08
 0x06
 0x00
 0x05
 0x09
 0x01
    ⋮
 0x01
 0x01
 0x02
 0x04
 0x08
 0x00
 0x01
 0x02
 0x09
 0x08
 0x07
 0x01

julia> pack(base10_data)
16-element Vector{UInt4Pair}:
 UInt4Pair(0x35)
 UInt4Pair(0x72)
 UInt4Pair(0x18)
 UInt4Pair(0x84)
 UInt4Pair(0x06)
 UInt4Pair(0x95)
 UInt4Pair(0x91)
 UInt4Pair(0x00)
 UInt4Pair(0x57)
 UInt4Pair(0x65)
 UInt4Pair(0x11)
 UInt4Pair(0x42)
 UInt4Pair(0x08)
 UInt4Pair(0x21)
 UInt4Pair(0x89)
 UInt4Pair(0x17)

julia> unpack(pack(base10_data))
32-element Vector{UInt8}:
 0x05
 0x03
 0x02
 0x07
 0x08
 0x01
 0x04
 0x08
 0x06
 0x00
 0x05
 0x09
 0x01
 0x09
 0x00
 0x00
 0x07
 0x05
 0x05
 0x06
 0x01
 0x01
 0x02
 0x04
 0x08
 0x00
 0x01
 0x02
 0x09
 0x08
 0x07
 0x01

julia> unpack(pack(base10_data)) == base10_data
true

If you want to accelerate this further, you can use SIMD.jl

7 Likes

Thanks for both hints.

@mkitti provided that Julia coding example for my topic, thank you very much for that idea :+1: !
Mark, you have written:

Keeping your suggestion in mind, would it be a very fast way? Or even perhaps the fastest way?
And can it be swapped out to a GPU?

Yes, we’re mostly just performing elementwise byte operations of bit shifting, >> or << and or &. This can be done in parallel. CPU vector instructions via SIMD are quite fast at this, but GPUs are also fast. The question is if the GPU is fast enough to overcome the time needed to transfer memory to the GPU and back, assuming that your other workflows use the CPU. You might be able to use GPUDirect technology to transfer data directly from GPU memory to and from disk.

1 Like

Mark, thanks for your patience, I appreciate it!