My current conclusion is that we need to find three bits somewhere to represent the number of unused bits (0–7). My guess is that we could take those from the padding field of jl_datatype_layout_t.

I am confused: what is the advantage of implementing this in Julia instead of in BitIntegers.jl? Introducing arbitrary primitive types into the language is certainly misleading if they are not fully supported.

The only advantage seems to be saving a few llvmcalls, which does not seem like much work saved compared to shaking up the type system of Julia.

BitIntegers.jl and Julia are currently limited to defining primitive types whose size is a multiple of a byte. BitIntegers.jl just creates a new primitive type, and trying to create a primitive type that is not a multiple of a byte results in an error.

julia> primitive type UInt4 <: Unsigned 4 end
ERROR: invalid number of bits in primitive type UInt4
Stacktrace:
 [1] top-level scope
   @ REPL[129]:1

julia> primitive type UInt12 <: Unsigned 12 end
ERROR: invalid number of bits in primitive type UInt12
Stacktrace:
 [1] top-level scope
   @ REPL[130]:1

julia> BitIntegers.@define_integers 12
ERROR: invalid number of bits in primitive type Int12
Stacktrace:
 [1] top-level scope
   @ C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:60

julia> @macroexpand BitIntegers.@define_integers 12
quote
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:60 =#
    primitive type Int12 <: BitIntegers.AbstractBitSigned 12 end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:61 =#
    primitive type UInt12 <: BitIntegers.AbstractBitUnsigned 12 end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:63 =#
    (BitIntegers.Base).Signed(var"#208#x"::UInt12) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:63 =#
            Int12(var"#208#x")
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:64 =#
    (BitIntegers.Base).Unsigned(var"#209#x"::Int12) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:64 =#
            UInt12(var"#209#x")
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:65 =#
    (BitIntegers.Base).uinttype(::BitIntegers.Type{Int12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:65 =#
            UInt12
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:66 =#
    (BitIntegers.Base).uinttype(::BitIntegers.Type{UInt12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:66 =#
            UInt12
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:68 =#
    (BitIntegers.Base).widen(::BitIntegers.Type{Int12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:68 =#
            Int24
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:69 =#
    (BitIntegers.Base).widen(::BitIntegers.Type{UInt12}) = begin
            #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:69 =#
            UInt24
        end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:71 =#
    macro int12_str(var"#214#s")
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:71 =#
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:72 =#
        return BitIntegers.parse(Int12, var"#214#s")
    end
    #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:74 =#
    macro uint12_str(var"#215#s")
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:74 =#
        #= C:\Users\kittisopikulm\.julia\packages\BitIntegers\6M5fx\src\BitIntegers.jl:75 =#
        return BitIntegers.parse(UInt12, var"#215#s")
    end
end

Only things that have to be in the language should be added to the language. Ease of implementation is not a reason to add something: for example, supporting a piping syntax in Julia is easy enough, but it becomes very, very contentious when one wants it in Julia itself.

From #35526, primitive type is considered a leaky implementation detail, and language users are discouraged from using it; its existence is only justified by limitations of the current Julia implementation. Arbitrary integer types, on the other hand, have no problem residing in a package. Supporting them in Julia itself will incur a lot of discussion: what will their representation be, what about their alignment, will they save space when stored in an array, how do we infer arbitrary-but-limited-precision integers?

How would one create a 12-bit integer in a package? As I demonstrated above, BitIntegers.jl does not allow one to define a 12-bit integer because Julia does not allow one to define a 12-bit primitive type. I made an attempt, but defining UInt12 was difficult enough that I just defaulted the element type to UInt16.

Are u1 and i1 really still buggy in LLVM? How about u12 and i12?

My guess is that the situation has improved now that N-bit integers (_BitInt(N)) are part of the draft C23 standard. I think we can reference that to answer some of your questions.

How would one create a 12-bit integer in a package? As I demonstrated above, BitIntegers.jl does not allow one to define a 12-bit integer because Julia does not allow one to define a 12-bit primitive type. I made an attempt, but defining UInt12 was difficult enough that I just defaulted the element type to UInt16.

I still can’t see the difference between the 12-bit integer type you mean and a UInt16 with a wrapper that truncates or extends the number. In N2763, arbitrary-precision integers are defined to be represented by the smallest possible power-of-two-sized integer, so UInt16 is the right choice.
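For concreteness, here is a minimal sketch of such a wrapper (the name UInt12Wrap and the operators defined are my own illustration, not an existing API): the value lives in a UInt16, and a mask on construction keeps it to 12 bits, but each element still occupies 2 bytes in memory.

```julia
# Hypothetical sketch: a 12-bit unsigned value stored in a UInt16.
# The mask on construction makes arithmetic wrap modulo 2^12,
# but the storage is still 16 bits (2 bytes per array element).
struct UInt12Wrap
    x::UInt16
    UInt12Wrap(v::Integer) = new((v % UInt16) & 0x0fff)
end

# Addition wraps at 12 bits because the constructor re-masks the sum.
Base.:+(a::UInt12Wrap, b::UInt12Wrap) = UInt12Wrap(a.x + b.x)
```

This captures the value semantics, but not the storage question the thread is about: a Vector{UInt12Wrap} is laid out exactly like a Vector{UInt16}.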

You can pack two 12-bit integers into 3 bytes (since 2 × 12 = 24 = 3 × 8). But it takes 4 bytes if you represent them as two UInt16s with 4 unused bits each.
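A sketch of that packing (pack12/unpack12 are hypothetical helper names, and the nibble layout chosen here — low byte of the first value first — is one arbitrary convention):

```julia
# Pack two 12-bit values (carried in UInt16s) into 3 bytes.
function pack12(a::UInt16, b::UInt16)
    a &= 0x0fff; b &= 0x0fff
    (UInt8(a & 0xff),                           # low 8 bits of a
     UInt8((a >> 8) | ((b & 0x0f) << 4)),       # high 4 of a | low 4 of b
     UInt8(b >> 4))                             # high 8 bits of b
end

# Recover the two 12-bit values from the 3 packed bytes.
function unpack12(b1::UInt8, b2::UInt8, b3::UInt8)
    a = UInt16(b1) | ((UInt16(b2) & 0x000f) << 8)
    b = (UInt16(b2) >> 4) | (UInt16(b3) << 4)
    (a, b)
end
```

The cost of the space saving is visible here: every load or store now touches two bytes and does shift/mask work, which is exactly the trade-off discussed below.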

I don’t want to speak for @mkitti because I’m not sure of the intended application, but I wouldn’t be surprised if this involves data from scientific cameras or data acquisition cards. It’s a nontrivial issue because modern instrumentation can produce data at rates exceeding 1GB/s (≈ 4TB/hr), and there are pipelines that may really notice a “useless” 33% increase in data volume.

At the same time, you want this to work seamlessly enough to make it trivial to exploit such arrays without big costs elsewhere. Otherwise, you may be better off just accepting the 33% increase. That’s what I’ve typically chosen to do, but I would be grateful to see a nice solution for this issue.

Random reads and writes might be a problem. But most modern compression algorithms can process a compressed stream at nearly memcpy speed, and they have to deal with arbitrary variable lengths.

The primary goal here is to describe bit-precise integers in a way that is not dependent on knowing the implementation details of the underlying processor architecture.

A secondary goal that has been mentioned is packing these bit-precise integers into arrays and perhaps unpacking them.
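As a sketch of what random access into such a packed array might look like (get12 is a hypothetical helper; it assumes consecutive 12-bit elements packed low-bits-first into a byte buffer):

```julia
# Read the i-th 12-bit element from a packed byte buffer.
# Element i starts at bit offset (i - 1) * 12, so it is either
# byte-aligned or offset by 4 bits — two cases to handle.
function get12(buf::Vector{UInt8}, i::Integer)
    bit  = (i - 1) * 12
    byte = bit >> 3                       # 0-based index of the first byte
    if bit % 8 == 0
        # byte-aligned: whole first byte, plus low nibble of the next
        UInt16(buf[byte + 1]) | ((UInt16(buf[byte + 2]) & 0x0f) << 8)
    else
        # nibble-aligned: high nibble of first byte, plus whole next byte
        (UInt16(buf[byte + 1]) >> 4) | (UInt16(buf[byte + 2]) << 4)
    end
end
```

Even this simple getindex-style access shows why packed arrays are nontrivial: every element read branches on alignment and touches two bytes, and a setindex! would additionally need read-modify-write on the shared middle byte.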

Meanwhile, LLVM knows perfectly well what a u12 is and can compile efficient code to use that type. If this is also being used by C23, then I suspect these compilation paths will be well tested in the future. Instead of fighting our compiler, let’s use it.

LLVM denominates things in terms of bits; however, we always provide it with multiples of 8.

I’m not sure exactly how this will work out, but I think we should consider doing the experiment in 2023. If we ask LLVM to work with u1, u4, or u12, what code would it generate?
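One way to run that experiment today is llvmcall, handing LLVM i12 arithmetic directly (a sketch; add12 is a hypothetical name, and the IR simply truncates to i12, adds with 12-bit wraparound, and zero-extends back):

```julia
# Ask LLVM to do the addition in i12: truncate both UInt16 arguments
# to 12 bits, add (wrapping at 2^12), then zero-extend back to 16 bits.
add12(x::UInt16, y::UInt16) = Core.Intrinsics.llvmcall("""
    %xt = trunc i16 %0 to i12
    %yt = trunc i16 %1 to i12
    %s = add i12 %xt, %yt
    %r = zext i12 %s to i16
    ret i16 %r
    """, UInt16, Tuple{UInt16,UInt16}, x, y)
```

Inspecting @code_llvm and @code_native on add12 would then show exactly what LLVM generates for i12 on the host architecture, which is the experiment proposed above.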