How to define an efficient `bswap_int` for user defined primitive types?

xiaodai · December 28, 2017, 11:05pm

I am trying to define my own primitive type of bit-length 8*n for some n, e.g. below I have n=3.
The functions lshr_int and shl_int work directly on the newly defined type, but bswap_int does not. I tried to check its definition using @which bswap_int(UInt(888)) and saw that it’s an intrinsic function, so I can’t look at its implementation for UInt and try to adapt it.

I can build my own bswap_int using lshr_int, shl_int, and |, but is there a more efficient way?

primitive type UInt24 <: Unsigned 24 end
x = unsafe_load(Ptr{UInt24}(pointer("abc")))

# bitshifts work fine
Base.lshr_int(x, 8)
Base.shl_int(x, 8)

# this will crash 
Base.bswap_int(x)

The background is that I am trying to build a more efficient string radixsort so being able to load the underlying bits of various length efficiently is key.

jameson · January 4, 2018, 5:07pm

The fastest way to do it is using shifts:
((x & 0xff) << 16) | (x & 0xff00) | ((x & 0xff0000) >> 16)

gdkrmr · January 4, 2018, 8:24pm

I get the following error:

julia> x & 0xff
ERROR: no promotion exists for UInt24 and UInt8
Stacktrace:
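
The error above arises because no promotion rule exists between the user-defined UInt24 and the built-in integer types. One way to sidestep it, sketched here under the assumption that the 24-bit value is kept in the low three bytes of a UInt32 rather than in a custom primitive type, is to apply the same shift-and-mask expression to a UInt32. The name `bswap24` is hypothetical and not part of Base.

```julia
# Reverse the three low-order bytes of a 24-bit value stored in a UInt32.
# `bswap24` is a hypothetical helper name used for illustration only.
function bswap24(x::UInt32)
    # Move byte 0 up to byte 2, keep byte 1 in place, move byte 2 down to byte 0.
    return ((x & 0x000000ff) << 16) | (x & 0x0000ff00) | ((x & 0x00ff0000) >> 16)
end

# The same result via the built-in 32-bit byte swap: swap all four bytes,
# then shift out the (zero) top byte that ended up in the lowest position.
bswap24_via_bswap(x::UInt32) = bswap(x & 0x00ffffff) >> 8

x = 0x00616263               # the bytes of "abc" as 0x61, 0x62, 0x63
bswap24(x) == 0x00636261     # true
```

Both variants operate only on standard integer types, so no promotion rules for UInt24 are needed. Which one is faster would have to be measured.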

xiaodai · January 4, 2018, 8:31pm

Actually, if I define a type whose length is a multiple of the word size, then bswap_int works fine. E.g.

primitive type Bits192 192 end
tmp = "abcdefedghiklmnopqrstuvwz"
a = unsafe_load(Ptr{Bits192}(pointer(tmp)))
Base.bswap_int(a)

StefanKarpinski · January 5, 2018, 6:34am

I’m kind of unclear on what the benefit of a 24-bit integer type is. It doesn’t save space in registers and doesn’t save storage on disk unless you sacrifice decent alignment altogether which doesn’t seem worth it.

xiaodai · January 5, 2018, 6:51am

It was just an example. I was trying to make a type that can load 3 bytes at once from a string using unsafe_load. Also, SAS has a 24-bit numeric type, so this might become helpful for reading SAS data.

The other examples I tried were 192-bit types.

StefanKarpinski · January 5, 2018, 6:53am

It’s certainly doable but it would probably be easier to load bytes individually and put them into a standard integer type like you would in C.
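
A minimal sketch of that byte-by-byte approach, assuming the three bytes are packed into a UInt32 with the first byte most significant (so that integer comparison matches byte-wise string order and no byte swap is needed afterwards). The name `load3` is hypothetical.

```julia
# Load three consecutive bytes of a string into a UInt32, first byte most
# significant. `load3` is a hypothetical helper name, not part of Base.
function load3(s::String, i::Integer = 1)
    b = codeunits(s)  # bounds-checked view of the string's UTF-8 bytes
    return (UInt32(b[i]) << 16) | (UInt32(b[i + 1]) << 8) | UInt32(b[i + 2])
end

load3("abc") == 0x00616263   # true
load3("abc") < load3("abd")  # true: integer order follows byte order
```

Because indexing is bounds-checked, this throws a BoundsError instead of reading past the end of a short string.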

ScottPJones · January 5, 2018, 7:02am

A number of people have brought up their use cases for 24-bit numbers (such as representing RGB colors).
Using 33% more space (in memory or on disk) can end up affecting performance more than alignment issues (which generally aren’t even issues on most processors these days).

Better to actually get real evidence rather than stating opinions without the data to back them up.

ScottPJones · January 5, 2018, 7:05am

If you can guarantee that reading one byte past the end is safe (which isn’t that hard to do, by simply allocating a buffer one byte larger than needed), then you can do an unaligned 32-bit read and a mask, which is faster than loading 3 bytes individually.
Even when you can’t, I’ve seen that LLVM optimizes a 24-bit load or store into two operations, a 16-bit one and an 8-bit one.
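
The read-and-mask idea can be sketched as follows, assuming the data lives in a Vector{UInt8} that the caller has padded so that four bytes are readable at the requested position. The names `load3_masked` and `load3_key` are hypothetical.

```julia
# Read 4 bytes at index i with one (possibly unaligned) load, then keep 3.
# ASSUMPTION: the caller guarantees i + 3 <= length(buf), e.g. by padding
# the buffer with one extra byte. Both helper names are hypothetical.
function load3_masked(buf::Vector{UInt8}, i::Integer = 1)
    w = GC.@preserve buf unsafe_load(Ptr{UInt32}(pointer(buf, i)))
    # ltoh makes the result independent of host endianness:
    # first byte ends up in the lowest 8 bits, fourth byte is masked off.
    return ltoh(w) & 0x00ffffff
end

# Variant that puts the first byte in the most significant position of the
# 24-bit result, which is the order a radix sort on strings would want.
function load3_key(buf::Vector{UInt8}, i::Integer = 1)
    w = GC.@preserve buf unsafe_load(Ptr{UInt32}(pointer(buf, i)))
    return ntoh(w) >> 8  # big-endian interpretation, drop the fourth byte
end

buf = UInt8[0x61, 0x62, 0x63, 0x00]   # "abc" plus one padding byte
load3_masked(buf) == 0x00636261       # true
load3_key(buf) == 0x00616263          # true
```

The second variant folds the byte swap into the load, so no separate 24-bit swap is required. Whether either version beats three single-byte loads on a given machine is something to benchmark.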
