Note: you might want to change the “solution” to use %UInt8, as I’d suggested below; it makes a rather large difference in the generated code (tested on master, built today):
julia> splitit(x) = (x>>>8)%UInt8, x%UInt8
splitit (generic function with 1 method)
julia> splitit(0x1234)
(0x12, 0x34)
julia> @code_native splitit(0x1234)
.section __TEXT,__text,regular,pure_instructions
; Function splitit {
; Location: REPL[1]:1
movl %edi, %eax
shrl $8, %eax
movl %edi, %edx
retq
nopl (%rax,%rax)
;}
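For reference, a quick sanity check that the truncating version round-trips correctly (a sketch; joinit is a hypothetical helper, not from the session above):

```julia
# Truncating split: high byte and low byte of a UInt16.
splitit(x) = (x >>> 8) % UInt8, x % UInt8

# Hypothetical inverse: recombine the two bytes into a UInt16.
joinit(hi, lo) = (UInt16(hi) << 8) | lo

@assert splitit(0x1234) == (0x12, 0x34)
@assert joinit(splitit(0xbeef)...) == 0xbeef
```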
julia> split2(x) = UInt8(x>>8), UInt8(x & 0xff)
split2 (generic function with 1 method)
julia> @code_native split2(0x1234)
.section __TEXT,__text,regular,pure_instructions
; Function split2 {
; Location: REPL[4]:1
pushq %rbx
movl %edi, %ebx
movabsq $jl_get_ptls_states_fast, %rax
callq *%rax
movabsq $jl_gc_pool_alloc, %rcx
movl $1376, %esi ## imm = 0x560
movl $16, %edx
movq %rax, %rdi
callq *%rcx
movabsq $jl_system_image_data, %rcx
movq %rcx, -8(%rax)
movbew %bx, (%rax)
popq %rbx
retq
nopl (%rax)
;}
I am rather concerned by the huge amount of code produced for such a trivial function, simply by using UInt8(x) instead of x%UInt8. I wonder if this is a regression on master?
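Part of the difference is semantic: UInt8(x) is a checked conversion that throws InexactError when the value doesn’t fit, while x % UInt8 simply keeps the low 8 bits. A minimal sketch of the distinction (values chosen for illustration, not from the session above):

```julia
# Checked conversion: fine when the value fits in 8 bits...
UInt8(0x12)        # 0x12

# ...but throws when it doesn't:
try
    UInt8(0x1234)
catch e
    @assert e isa InexactError
end

# rem-style conversion never throws; it just truncates:
0x1234 % UInt8     # 0x34

# Note that in split2, x >> 8 of a UInt16 always fits in a UInt8,
# so the checked conversion can never actually throw there.
```

Since the check can be proven dead in split2, the allocation calls in the native code above look like a missed optimization rather than a necessary consequence of the checked conversion.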
Update: this is definitely a serious regression on master: on v0.6.1, both functions generate identical LLVM IR and (not as optimized) native code.
julia> @code_llvm split2(b)
define [2 x i8] @julia_split2_62803(i16) #0 !dbg !5 {
pass2:
%1 = lshr i16 %0, 8
%2 = trunc i16 %1 to i8
%3 = trunc i16 %0 to i8
%4 = insertvalue [2 x i8] undef, i8 %2, 0
%5 = insertvalue [2 x i8] %4, i8 %3, 1
ret [2 x i8] %5
}
julia> @code_native split2(b)
.section __TEXT,__text,regular,pure_instructions
Filename: REPL[1]
pushq %rbp
movq %rsp, %rbp
Source line: 1
movl %edi, %eax
shrl $8, %eax
movl %edi, %edx
popq %rbp
retq
nopl (%rax)