Convert UInt16 to two UInt8


#1

Could anybody help me figure out this seemingly simple problem:

# UInt16 number
a = 0xabcd
# imaginary function split()
b, c = split(a)
# b = 0xab
# c = 0xcd

I’ve tried to use reinterpret with bits shifting, but got

bitcast: argument size does not match size of target type

Doing something like

UInt8(a)

yields an InexactError. How should I write this function?

split(a::UInt16)::Tuple{UInt8, UInt8} = ...

#2
UInt8(a >> 8)
UInt8(a & 0xff)
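Wrapped into the signature asked for in #1, a minimal sketch (the name split16 is mine, to avoid shadowing Base.split):

```julia
# High byte via shift, low byte via mask; both results fit in
# 8 bits, so the UInt8 constructors cannot throw InexactError.
split16(a::UInt16)::Tuple{UInt8,UInt8} = (UInt8(a >> 8), UInt8(a & 0xff))
```

For example, split16(0xabcd) returns (0xab, 0xcd).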

#3

Excellent, so as long as the remaining bits are 0 there is no InexactError. Is it possible to do the same using some low-level function, to avoid the overhead of checking for 0?


#4

On 0.7 this works:

julia> function test(x::UInt16)
       unsafe_load(Ptr{NTuple{2, UInt8}}(Base.unsafe_convert(Ptr{UInt16}, Ref(x))))
       end
test (generic function with 3 methods)

julia> test(UInt16(2))
(0x02, 0x00)

julia> @btime test(UInt16(2))
  1.263 ns (0 allocations: 0 bytes)
(0x02, 0x00)

A bit ugly for my taste, since reinterpret doesn’t work on Ref… Maybe I’m missing some function that works on Ref which would make this nicer.


#5

I think that the following may work better:

(a>>>8)%UInt8
a % UInt8
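The difference to the constructor: % UInt8 simply keeps the low 8 bits instead of range-checking, so it never throws. A quick check:

```julia
a = 0xabcd
hi = (a >>> 8) % UInt8   # truncates, never range-checks
lo = a % UInt8
# UInt8(a) by contrast throws InexactError, since 0xabcd > 0xff
```

Here hi === 0xab and lo === 0xcd.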

#6

Explanation: in Julia code you should only take a pointer from a Ref, but on 0.6 this allocates the Ref. Since the Ref doesn’t escape the function and escape analysis got a lot better on 0.7, this generates pretty optimal code on 0.7, though :slight_smile:
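On current Julia the same trick can be written with GC.@preserve to make the lifetime of the Ref explicit; a sketch (the byte order is the machine’s, i.e. low byte first on little-endian):

```julia
function bytes(x::UInt16)
    r = Ref(x)
    # keep r rooted while we read through the raw pointer
    GC.@preserve r begin
        p = Base.unsafe_convert(Ptr{UInt16}, r)
        unsafe_load(Ptr{NTuple{2, UInt8}}(p))
    end
end
```

On a little-endian machine, bytes(UInt16(2)) gives (0x02, 0x00), matching the output in #4.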


#7

Not sure what checking you are talking about:

julia> f(a) = UInt8(0xff & a)
f (generic function with 1 method)

julia> @code_llvm f(0xabcd)

define i8 @julia_f_63099(i16) #0 !dbg !5 {
pass:
  %1 = trunc i16 %0 to i8
  ret i8 %1
}
julia> g(a) = UInt8(a >> 8)
g (generic function with 1 method)

julia> @code_llvm g(0xabcd)

define i8 @julia_g_63104(i16) #0 !dbg !5 {
pass:
  %1 = lshr i16 %0, 8
  %2 = trunc i16 %1 to i8
  ret i8 %2
}

#8

This code is not well defined.


#9

Do I understand this right: the compiler matches the & mask pattern, looks at the value of the constant 0x00ff (I tried it with a UInt16 mask), infers at compile time that the leading bits are 0s, and reduces the whole thing to a single trunc LLVM instruction? This looks like magic, but it is actually true (?). For a 0x01ff mask it produces completely different code. Wow, thanks for showing me this snippet.


#10

Yes, inferring value ranges/bit patterns is a pretty standard low-level optimization, the kind of optimization that LLVM is really good at.


#11

Also, if you don’t want those checks to start with, use v % UInt8 instead.


#12

Alternate solution:

split(a) = (reinterpret(UInt8, [a])...,)

c, b = split(0xabcd)
# Yields (0xcd, 0xab)

Note that c (the low byte) comes out first; you could swap them if you like.

I’m not sure why we need an array for the reinterpret call to work. There are a bunch of arguments in some thread somewhere, but by the end, there doesn’t seem to be any consensus about whether reinterpret should work on a scalar.
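A runnable sketch of this array-based version (the name splitbytes is mine), showing the little-endian byte order:

```julia
# reinterpret views the array's underlying bytes; a little-endian
# machine stores the low byte first, hence (0xcd, 0xab) for 0xabcd
splitbytes(a::UInt16) = (reinterpret(UInt8, [a])...,)
```

Note that this allocates the temporary array, unlike the shift/truncate version.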


#13

Note: you might want to change the “solution” to using %UInt8, as I suggested above; it makes a rather large difference in the generated code (tested on master built today):

julia> splitit(x) = (x>>>8)%UInt8, x%UInt8
splitit (generic function with 1 method)

julia> splitit(0x1234)
(0x12, 0x34)

julia> @code_native splitit(0x1234)
.section __TEXT,__text,regular,pure_instructions
; Function splitit {
; Location: REPL[1]:1
movl %edi, %eax
shrl $8, %eax
movl %edi, %edx
retq
nopl (%rax,%rax)
;}

julia> split2(x) = UInt8(x>>8), UInt8(x & 0xff)
split2 (generic function with 1 method)

julia> @code_native split2(0x1234)
.section __TEXT,__text,regular,pure_instructions
; Function split2 {
; Location: REPL[4]:1
pushq %rbx
movl %edi, %ebx
movabsq $jl_get_ptls_states_fast, %rax
callq *%rax
movabsq $jl_gc_pool_alloc, %rcx
movl $1376, %esi ## imm = 0x560
movl $16, %edx
movq %rax, %rdi
callq *%rcx
movabsq $jl_system_image_data, %rcx
movq %rcx, -8(%rax)
movbew %bx, (%rax)
popq %rbx
retq
nopl (%rax)
;}

I am rather concerned by the huge amount of code produced for a trivial function, simply by using UInt8(x) instead of x%UInt8. I wonder if this is a regression on master?

Update: this is definitely a serious regression on master: on v0.6.1, both functions generate identical LLVM IR and (not as optimized) native code.

julia> @code_llvm split2(b)

define [2 x i8] @julia_split2_62803(i16) #0 !dbg !5 {
pass2:
  %1 = lshr i16 %0, 8
  %2 = trunc i16 %1 to i8
  %3 = trunc i16 %0 to i8
  %4 = insertvalue [2 x i8] undef, i8 %2, 0
  %5 = insertvalue [2 x i8] %4, i8 %3, 1
  ret [2 x i8] %5
}

julia> @code_native split2(b)
.section __TEXT,__text,regular,pure_instructions
Filename: REPL[1]
pushq %rbp
movq %rsp, %rbp
Source line: 1
movl %edi, %eax
shrl $8, %eax
movl %edi, %edx
popq %rbp
retq
nopl (%rax)


#14

IIUC that behavior is a bug in @code_llvm and should be fixed by https://github.com/JuliaLang/julia/pull/24642/files


#15

Thanks very much! I’ll retest as soon as that’s merged, then. This is the sort of low-level bit twiddling I do in a lot of my code, so I was rather nervous! I’m also very happy to see that the push %ebp ; mov %esp, %ebp ; ... ; pop %ebp prologue has finally been eliminated on master! :slight_smile:


#16

It now looks fine on master, with that bug fix merged, thanks for the pointer to what was going on!