N-bit Integers in Julia

After getting some updated pointers from Rust dev Jubilee Young and encouragement from C standard project editor JeanHeyd Meneide on the arbitrary bitwidth integers (also known n-bit integers from the C23 draft standard), I started looking into what may be blocking this within Julia.

My current conclusion is that we need to find three bits from somewhere to represent the number of unused bits (0 - 7). My guess is that we could take those from the padding field of jl_datatype_layout_t.

2 Likes

I am confused, what is the advantage of implementing it in Julia instead of BitIntegers.jl? Introducing arbitrary primitive type to the language is certainly misleading if they are not fully-supported.

The only advantage seems to be saving a few llvmcalls, which does not seem to be a large amount of work comparing to shaking the type system of Julia.

1 Like

BitIntegers.jl and julia are currently bound to defining primitive types that are a multiple of a byte. BitIntegers.jl just creates a new primitive. Trying to create a primitive that is not a mulitple of a byte results in an error.

julia> primitive type UInt4 <: Unsigned 4 end
ERROR: invalid number of bits in primitive type UInt4
Stacktrace:
 [1] top-level scope
   @ REPL[129]:1

julia> primitive type UInt12 <: Unsigned 12 end
ERROR: invalid number of bits in primitive type UInt12
Stacktrace:
 [1] top-level scope
   @ REPL[130]:1

julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
 [1] top-level scope
   @ C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:60

julia> @macroexpand BitIntegers.@define_integers 12
quote
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:60 =#
    primitive type Int12 <: BitIntegers.AbstractBitSigned 12 end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:61 =#
    primitive type UInt12 <: BitIntegers.AbstractBitUnsigned 12 end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:63 =#
    (BitIntegers.Base).Signed(var"#208#x"::UInt12) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:63 =#
            Int12(var"#208#x")
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:64 =#
    (BitIntegers.Base).Unsigned(var"#209#x"::Int12) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:64 =#
            UInt12(var"#209#x")
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:65 =#
    (BitIntegers.Base).uinttype(::BitIntegers.Type{Int12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:65 =#
            UInt12
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:66 =#
    (BitIntegers.Base).uinttype(::BitIntegers.Type{UInt12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:66 =#
            UInt12
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:68 =#
    (BitIntegers.Base).widen(::BitIntegers.Type{Int12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:68 =#
            Int24
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:69 =#
    (BitIntegers.Base).widen(::BitIntegers.Type{UInt12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:69 =#
            UInt24
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:71 =#
    macro int12_str(var"#214#s")
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:71 =#
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:72 =#
        return BitIntegers.parse(Int12, var"#214#s")
    end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:74 =#
    macro uint12_str(var"#215#s")
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:74 =#
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:75 =#
        return BitIntegers.parse(UInt12, var"#215#s")
    end
end

LLVM supports arbitrary N-bit signed and unsigned integers. The question then is why doesnā€™t Julia support these types?

2 Likes

LLVM supports arbitrary N-bit signed and unsigned integers. The question then is why doesnā€™t Julia support these types?

Only things that have to be in the language are added to the language. The easiness of that is not a reason to add it, for example supporting a piping syntax to Julia is easy enough, but it becomes very, very contentious when one wants it in Julia itself.

From #35526, primitive type is considered leaky implementation detail, thus discouraged to be used by language users. Its existence is only justified by the limitation of current Julia implementation. Arbitrary integer types, on the other hand, does not have any problem residing in a package. Supporting it in Julia will incur a lot of discussions: what will be its representation, what about its alignment, will it save space when stored in an array, how do we infer arbitrary but limited precision integers?

1 Like

How would one create a 12-bit integer in a package? As I demonstrated above, BitIntegers.jl does not allow one to define a 12-bit integer because Julia does not allow one to define a 12-bit primitive. I made an atttempt, but defining UInt12 was difficult enough that I just defaulted the element type to be UInt16.

Are u1 and i1 really still buggy in LLVM? How about u12 and i12?

My guess is that the situation has improved as N-bit integers are now part of the draft C23 standard. I think we can reference that to answer some of your questions.

Hereā€™s a readable overview:

How would one create a 12-bit integer in a package? As I demonstrated above, BitIntegers.jl does not allow one to define a 12-bit integer because Julia does not allow one to define a 12-bit primitive. I made an atttempt, but defining UInt12 was difficult enough that I just defaulted the element type to be UInt16.

I still canā€™t see the difference between the 12-bit integer type you mean and UInt16 with a wrapper that truncates or extends the number. In N2763, they define arbitrary precision integers be represented by the smallest possible power-of-2 digit integer, so UInt16 is the right choice.

This is trivial to debunk.

1 Like

You can pack two 12-bit integers into 3 bytes (since 3*8 = 24). But it takes 4 bytes if you represent them as UInt16s with 4 unused bits each.

I donā€™t want to speak for @mkitti because Iā€™m not sure of the intended application, but I wouldnā€™t be surprised if this involves data from scientific cameras or data acquisition cards. Itā€™s a nontrivial issue because modern instrumentation can produce data at rates exceeding 1GB/s (ā‰ˆ 4TB/hr), and there are pipelines that may really notice a ā€œuselessā€ 33% increase in data volume.

At the same time, you want this to work seamlessly enough to make it trivial to exploit such arrays without big costs elsewhere. Otherwise, you may be better off just accepting the 33% increase. Thatā€™s what Iā€™ve typically chosen to do, but I would be grateful to see a nice solution for this issue.

10 Likes

The issue with this is alignment, because how do you index into this array. Itā€™s never going to be efficient or normal hardware.

1 Like

Random reads and writes might be a problem. But most modern compression algorithms can process a compressed stream at nearly memcpy speed and they have to deal with variable arbitrary lengths.

There are several issues involved here.

The primary goal here is to describe bit-precise integers in a way that is not dependent on knowing the implementation details of the underlying processor architecture.

A secondary goal that has been mentioned is packing these bit-precise integers into arrays and perhaps unpacking them.

In UInt12Arrays.jl/UInt12s.jl at main Ā· JaneliaSciComp/UInt12Arrays.jl Ā· GitHub I created a UInt12 type that can use either a UInt16 or UInt24 backend. This ends up making a lot of assumptions about what the underlying memory representation looks like.

Meanwhile, LLVM knows perfetly well what a u12 is and can compile efficient code to use that type. If this is also being used by C23, then I suspect these compilation paths will be well tested in the future. Instead of fighting our compiler, letā€™s use it.

LLVM denominates things in terms of bits. However, we always provide with multiples of 8.

Iā€™m not sure exactly how this will work out, but I think we should consider doing the experiment in 2023. If we ask LLVM to work with u1, u4, or u12, what code would it generate?

5 Likes