N-bit Integers in Julia

mkitti · February 19, 2023, 3:39am

After getting some updated pointers from Rust dev Jubilee Young and encouragement from C standard project editor JeanHeyd Meneide on arbitrary-bitwidth integers (also known as N-bit integers in the C23 draft standard), I started looking into what may be blocking them in Julia.

github.com/JuliaLang/julia issue: "Support arbitrary bitwidth integers" (opened by mkitti, 09:30 PM UTC, 27 May 2022)

LLVM currently has support for arbitrary bitwidth integers:
https://llvm.org/do…cs/LangRef.html#integer-type
https://reviews.llvm.org/rG5f0903e9bec97e67bf34d887bcbe9d05790de934
https://reviews.llvm.org/rG6c75ab5f66b4
https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329?u=programmerjake

Rust:
https://internals.rust-lang.org/t/pre-rfc-arbitrary-bit-width-integers/15603
https://github.com/rust-lang/rfcs/pull/2581

Zig: https://ziglang.org/documentation/master/#Primitive-Types

C23: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf

My current conclusion is that we need to find three bits somewhere to record the number of unused bits (0 to 7). My guess is that we could take them from the padding field of jl_datatype_layout_t.

github.com/JuliaLang/julia/blob/f64463d5936e60d45498d8ad1dac8b2deb77e7bf/src/julia.h#L512-L531

          typedef struct {
              uint32_t size;
              uint32_t nfields;
              uint32_t npointers; // number of pointers embedded inside
              int32_t first_ptr; // index of the first pointer (or -1)
              uint16_t alignment; // strictest alignment over all fields
              uint16_t haspadding : 1; // has internal undefined bytes
              uint16_t fielddesc_type : 2; // 0 -> 8, 1 -> 16, 2 -> 32, 3 -> foreign type
              uint16_t padding : 13;
              // union {
              //     jl_fielddesc8_t field8[nfields];
              //     jl_fielddesc16_t field16[nfields];
              //     jl_fielddesc32_t field32[nfields];
              // };
              // union { // offsets relative to data start in words
              //     uint8_t ptr8[npointers];
              //     uint16_t ptr16[npointers];
              //     uint32_t ptr32[npointers];
              // };
          } jl_datatype_layout_t;
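A minimal C sketch of the idea, assuming the 3 bits are carved out of the 13-bit padding field. The struct names layout_current and layout_proposed, the field unused_bits, and the helpers layout_set_nbits and layout_nbits are hypothetical and not part of julia.h; only the fixed-size header is reproduced. The point it illustrates is that the header size would not change, because the bit-fields still fit in one 16-bit unit (uint16_t bit-fields are implementation-defined in C, but GCC and Clang accept them and julia.h already uses them).

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-size header of jl_datatype_layout_t as quoted above
   (the trailing variable-length arrays are omitted). */
typedef struct {
    uint32_t size;
    uint32_t nfields;
    uint32_t npointers;
    int32_t first_ptr;
    uint16_t alignment;
    uint16_t haspadding : 1;
    uint16_t fielddesc_type : 2;
    uint16_t padding : 13;
} layout_current;

/* Hypothetical variant: 3 of the 13 padding bits become unused_bits (0-7). */
typedef struct {
    uint32_t size;
    uint32_t nfields;
    uint32_t npointers;
    int32_t first_ptr;
    uint16_t alignment;
    uint16_t haspadding : 1;
    uint16_t fielddesc_type : 2;
    uint16_t unused_bits : 3; /* hypothetical field: 8*size - nbits */
    uint16_t padding : 10;
} layout_proposed;

/* Hypothetical setter: round the bit count up to whole bytes and
   record how many high bits of the last byte are unused. */
void layout_set_nbits(layout_proposed *l, uint32_t nbits) {
    assert(nbits > 0);
    l->size = (nbits + 7) / 8;
    l->unused_bits = (uint16_t)(l->size * 8 - nbits); /* always 0..7 */
}

/* Hypothetical getter: number of meaningful bits. */
uint32_t layout_nbits(const layout_proposed *l) {
    return l->size * 8u - l->unused_bits;
}
```

For a 12-bit type, layout_set_nbits gives size == 2 and unused_bits == 4, and layout_nbits recovers 12; for a 64-bit type unused_bits stays 0, so existing types would be unaffected.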

melonedo · February 19, 2023, 4:58am

I am confused: what is the advantage of implementing this in Julia itself instead of in BitIntegers.jl? Introducing arbitrary primitive types to the language is certainly misleading if they are not fully supported.

The only advantage seems to be saving a few llvmcalls, which does not seem like much work compared to shaking up the type system of Julia.

mkitti · February 19, 2023, 5:05am

Both BitIntegers.jl and Julia itself are currently restricted to primitive types whose size is a multiple of a byte. BitIntegers.jl just creates a new primitive type, and trying to create a primitive that is not a multiple of a byte results in an error:

julia> primitive type UInt4 <: Unsigned 4 end
ERROR: invalid number of bits in primitive type UInt4
Stacktrace:
 [1] top-level scope
   @ REPL[129]:1

julia> primitive type UInt12 <: Unsigned 12 end
ERROR: invalid number of bits in primitive type UInt12
Stacktrace:
 [1] top-level scope
   @ REPL[130]:1

julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
 [1] top-level scope
   @ C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:60

julia> @macroexpand BitIntegers.@define_integers 12
quote
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:60 =#
    primitive type Int12 <: BitIntegers.AbstractBitSigned 12 end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:61 =#
    primitive type UInt12 <: BitIntegers.AbstractBitUnsigned 12 end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:63 =#
    (BitIntegers.Base).Signed(var"#208#x"::UInt12) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:63 =#
            Int12(var"#208#x")
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:64 =#
    (BitIntegers.Base).Unsigned(var"#209#x"::Int12) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:64 =#
            UInt12(var"#209#x")
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:65 =#
    (BitIntegers.Base).uinttype(::BitIntegers.Type{Int12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:65 =#
            UInt12
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:66 =#
    (BitIntegers.Base).uinttype(::BitIntegers.Type{UInt12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:66 =#
            UInt12
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:68 =#
    (BitIntegers.Base).widen(::BitIntegers.Type{Int12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:68 =#
            Int24
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:69 =#
    (BitIntegers.Base).widen(::BitIntegers.Type{UInt12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:69 =#
            UInt24
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:71 =#
    macro int12_str(var"#214#s")
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:71 =#
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:72 =#
        return BitIntegers.parse(Int12, var"#214#s")
    end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:74 =#
    macro uint12_str(var"#215#s")
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:74 =#
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:75 =#
        return BitIntegers.parse(UInt12, var"#215#s")
    end
end

LLVM supports arbitrary N-bit signed and unsigned integers. The question then is why doesn’t Julia support these types?

melonedo · February 19, 2023, 5:53am

> LLVM supports arbitrary N-bit signed and unsigned integers. The question then is why doesn’t Julia support these types?

Only things that have to be in the language are added to the language. Ease of implementation is not a reason to add something. For example, supporting a piping syntax in Julia is easy enough, but it becomes very, very contentious once someone wants it in Julia itself.

According to #35526, primitive type is considered a leaky implementation detail, so language users are discouraged from using it. Its existence is justified only by limitations of the current Julia implementation. Arbitrary integer types, on the other hand, have no problem residing in a package. Supporting them in Julia will prompt a lot of discussion: what will their representation be, what about their alignment, will they save space when stored in an array, and how do we infer arbitrary- but limited-precision integers?

mkitti · February 19, 2023, 6:49am

How would one create a 12-bit integer in a package? As I demonstrated above, BitIntegers.jl cannot define a 12-bit integer because Julia does not allow one to define a 12-bit primitive type. I made an attempt, but defining UInt12 was difficult enough that I just defaulted the element type to UInt16.

Are u1 and i1 really still buggy in LLVM? How about u12 and i12?

My guess is that the situation has improved, since N-bit integers are now part of the draft C23 standard. I think we can refer to it to answer some of your questions.

Here’s a readable overview:

melonedo · February 19, 2023, 8:34am

> How would one create a 12-bit integer in a package? As I demonstrated above, BitIntegers.jl does not allow one to define a 12-bit integer because Julia does not allow one to define a 12-bit primitive. I made an attempt, but defining UInt12 was difficult enough that I just defaulted the element type to be UInt16.

I still can’t see the difference between the 12-bit integer type you mean and a UInt16 with a wrapper that truncates or extends the number. In N2763, arbitrary-precision integers are defined to be represented by the smallest power-of-two-sized integer that can hold them, so UInt16 is the right choice.
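The "UInt16 with a wrapper" alternative can be sketched in C as a struct holding a 16-bit container that is truncated on every construction. The names u12, i12, u12_make, i12_make, and u12_add are hypothetical; this illustrates the truncate/sign-extend approach only, not a representation mandated by any standard or package.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 12-bit types stored in 16-bit containers. Every constructor
   truncates, so the upper 4 bits never hold stray data. */
typedef struct { uint16_t bits; } u12;
typedef struct { int16_t bits; } i12;

#define U12_MASK 0x0FFFu

/* Truncate: keep only the low 12 bits (wraps modulo 4096). */
u12 u12_make(uint32_t x) {
    u12 r = { (uint16_t)(x & U12_MASK) };
    return r;
}

/* Truncate to 12 bits, then sign-extend bit 11 into the upper 4 bits. */
i12 i12_make(int32_t x) {
    uint16_t low = (uint16_t)((uint32_t)x & U12_MASK);
    assert(low <= U12_MASK);
    int16_t v = (low & 0x0800u) ? (int16_t)(low - 0x1000) : (int16_t)low;
    i12 r = { v };
    return r;
}

/* Wrapping addition: compute in a wider type, then truncate back. */
u12 u12_add(u12 a, u12 b) {
    return u12_make((uint32_t)a.bits + b.bits);
}
```

For example, u12_add(u12_make(4000), u12_make(200)) wraps around to 104, and i12_make(2048) yields -2048. Each value still occupies 2 bytes.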

uniment · February 19, 2023, 11:05am

This is trivial to debunk.

tim.holy · February 19, 2023, 12:59pm

You can pack two 12-bit integers into 3 bytes (since 3*8 = 24). But it takes 4 bytes if you represent them as UInt16s with 4 unused bits each.
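A C sketch of that packing, assuming one possible little-endian nibble layout (real camera and acquisition formats may order the nibbles differently); pack12_pair and unpack12_pair are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 12-bit values into 3 bytes (24 bits). Assumed layout:
     out[0] = low 8 bits of a
     out[1] = high 4 bits of a (low nibble) | low 4 bits of b (high nibble)
     out[2] = high 8 bits of b                                            */
void pack12_pair(uint16_t a, uint16_t b, uint8_t out[3]) {
    assert(a <= 0x0FFF && b <= 0x0FFF);
    out[0] = (uint8_t)(a & 0xFF);
    out[1] = (uint8_t)((a >> 8) | ((b & 0x0F) << 4));
    out[2] = (uint8_t)(b >> 4);
}

/* Inverse of pack12_pair. */
void unpack12_pair(const uint8_t in[3], uint16_t *a, uint16_t *b) {
    *a = (uint16_t)(in[0] | ((in[1] & 0x0F) << 8));
    *b = (uint16_t)((in[1] >> 4) | (in[2] << 4));
}
```

Packing 0xABC and 0x123 gives the bytes 0xBC, 0x3A, 0x12: 3 bytes instead of the 4 needed for two uint16_t values.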

I don’t want to speak for @mkitti because I’m not sure of the intended application, but I wouldn’t be surprised if this involves data from scientific cameras or data acquisition cards. It’s a nontrivial issue because modern instrumentation can produce data at rates exceeding 1GB/s (≈ 4TB/hr), and there are pipelines that may really notice a “useless” 33% increase in data volume.
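The arithmetic behind those figures, assuming decimal units (1 TB = 1000 GB):

```latex
\begin{aligned}
\text{overhead} &= \frac{2 \times 16\,\text{bits}}{2 \times 12\,\text{bits}} - 1
  = \frac{4\,\text{bytes}}{3\,\text{bytes}} - 1 = \frac{1}{3} \approx 33\% \\
1\,\text{GB/s} \times 3600\,\text{s/h} &= 3600\,\text{GB/h} = 3.6\,\text{TB/h} \approx 4\,\text{TB/h}
\end{aligned}
```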

At the same time, you want this to work seamlessly enough to make it trivial to exploit such arrays without big costs elsewhere. Otherwise, you may be better off just accepting the 33% increase. That’s what I’ve typically chosen to do, but I would be grateful to see a nice solution for this issue.

gbaraldi · February 19, 2023, 1:47pm

The issue with this is alignment: how do you index into such an array? It’s never going to be efficient on normal hardware.
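To make the indexing cost concrete, here is a C sketch of random access into a packed 12-bit array. It assumes two values per 3 bytes, with even elements in the low 12 bits and odd elements in the high 12 bits of a little-endian 16-bit word; this layout and the names get12/set12 are assumptions for illustration. Each access needs an offset computation of 3i/2, an unaligned two-byte read, and a branch on the parity of i. Each store is a read-modify-write of a byte shared with a neighboring element, so concurrent writes to adjacent elements would race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read element i from a packed 12-bit array. */
uint16_t get12(const uint8_t *data, size_t i) {
    size_t off = (i * 12) / 8;                                    /* = 3*i/2, rounded down */
    uint16_t word = (uint16_t)(data[off] | (data[off + 1] << 8)); /* little-endian 16-bit read */
    return (i & 1) ? (uint16_t)(word >> 4) : (uint16_t)(word & 0x0FFF);
}

/* Write element i, preserving the nibble that belongs to the neighbor. */
void set12(uint8_t *data, size_t i, uint16_t v) {
    assert(v <= 0x0FFF);
    size_t off = (i * 12) / 8;
    if (i & 1) {
        data[off] = (uint8_t)((data[off] & 0x0F) | ((v & 0x0F) << 4));
        data[off + 1] = (uint8_t)(v >> 4);
    } else {
        data[off] = (uint8_t)(v & 0xFF);
        data[off + 1] = (uint8_t)((data[off + 1] & 0xF0) | (v >> 8));
    }
}
```

A buffer of (n*12 + 7)/8 bytes holds n values, and the two-byte read never runs past it.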

ndinsmore · February 19, 2023, 2:11pm

Random reads and writes might be a problem, but most modern compression algorithms can process a compressed stream at nearly memcpy speed, and they have to deal with variable, arbitrary lengths.
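Sequential decoding avoids most of the per-element overhead of random access. A C sketch, assuming a layout with two values per 3 bytes (the first value is byte 0 plus the low nibble of byte 1, the second is the high nibble of byte 1 plus byte 2); unpack12_stream is a hypothetical name. Each loop iteration consumes 3 bytes and emits 2 values, with no division and no parity branch inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n packed 12-bit values into a uint16_t buffer.
   The input must hold (n*12 + 7)/8 bytes. */
void unpack12_stream(const uint8_t *in, uint16_t *out, size_t n) {
    assert(n == 0 || (in != NULL && out != NULL));
    size_t i = 0;
    for (; i + 1 < n; i += 2, in += 3) {
        out[i]     = (uint16_t)(in[0] | ((in[1] & 0x0F) << 8));
        out[i + 1] = (uint16_t)((in[1] >> 4) | (in[2] << 4));
    }
    if (i < n) { /* odd count: the last value occupies 1.5 bytes */
        out[i] = (uint16_t)(in[0] | ((in[1] & 0x0F) << 8));
    }
}
```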

mkitti · February 19, 2023, 6:03pm

There are several issues involved here.

The primary goal here is to describe bit-precise integers in a way that is not dependent on knowing the implementation details of the underlying processor architecture.

A secondary goal that has been mentioned is packing these bit-precise integers into arrays and perhaps unpacking them.

In UInt12s.jl (main branch of JaneliaSciComp/UInt12Arrays.jl on GitHub), I created a UInt12 type that can use either a UInt16 or a UInt24 backend. This ends up making a lot of assumptions about what the underlying memory representation looks like.

Meanwhile, LLVM knows perfectly well what a u12 is and can compile efficient code for that type. If it is also being used for C23, then I suspect these compilation paths will be well tested in the future. Instead of fighting our compiler, let’s use it.

LLVM denominates things in terms of bits. However, we always provide it with multiples of 8:

github.com/JuliaLang/julia/blob/c82aeb71f08703ec7c5929be5d94bec40c90f678/src/julia.h#L1107

              return jl_svecref(jl_get_fieldtypes(st), i);
          }
          STATIC_INLINE jl_value_t *jl_field_type_concrete(jl_datatype_t *st JL_PROPAGATES_ROOT, size_t i) JL_NOTSAFEPOINT
          {
              assert(st->types);
              return jl_svecref(st->types, i);
          }

          #define jl_datatype_size(t)    (((jl_datatype_t*)t)->layout->size)
          #define jl_datatype_align(t)   (((jl_datatype_t*)t)->layout->alignment)
          #define jl_datatype_nbits(t)   ((((jl_datatype_t*)t)->layout->size)*8)
          #define jl_datatype_nfields(t) (((jl_datatype_t*)(t))->layout->nfields)

          JL_DLLEXPORT void *jl_symbol_name(jl_sym_t *s);
          // inline version with strong type check to detect typos in a `->name` chain
          STATIC_INLINE char *jl_symbol_name_(jl_sym_t *s) JL_NOTSAFEPOINT
          {
              return (char*)s + LLT_ALIGN(sizeof(jl_sym_t), sizeof(void*));
          }
          #define jl_symbol_name(s) jl_symbol_name_(s)
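To illustrate why this macro matters, here is a C sketch assuming (as an approximation of Julia's code generator, not a quote of it) that the LLVM integer type for a primitive is derived from its byte size times 8. Under that assumption a 2-byte type can only become i16; a hypothetical variant that subtracts the unused high bits would yield i12. The struct mini_layout and the functions nbits_current, nbits_proposed, and llvm_int_type_name are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-in for the layout fields used here (hypothetical). */
typedef struct {
    uint32_t size;       /* bytes, as in jl_datatype_layout_t */
    uint8_t unused_bits; /* hypothetical 3-bit quantity, 0-7 */
} mini_layout;

/* Current behavior: bit width is always size * 8 (cf. jl_datatype_nbits). */
uint32_t nbits_current(const mini_layout *l) {
    return l->size * 8;
}

/* Hypothetical behavior: subtract the unused high bits. */
uint32_t nbits_proposed(const mini_layout *l) {
    assert(l->unused_bits < 8);
    return l->size * 8 - l->unused_bits;
}

/* Spell the LLVM IR integer type for a given width, e.g. "i12".
   Returns the number of characters written (snprintf semantics). */
int llvm_int_type_name(uint32_t nbits, char *buf, size_t len) {
    return snprintf(buf, len, "i%u", (unsigned)nbits);
}
```

With size == 2 and unused_bits == 4, nbits_current gives 16 (i16) while nbits_proposed gives 12 (i12).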

I’m not sure exactly how this will work out, but I think we should consider doing the experiment in 2023. If we ask LLVM to work with u1, u4, or u12, what code would it generate?
