Edit: this post started as a question; once I found an answer, it turned into a suggestion to update the documentation.
I have to multiply two Int64 values and split the 128-bit result into its high and low 64-bit halves.
This is the functionality provided by the _mulx_u64 intrinsic function.
I could probably take the LLVM code produced by a C program that calls the intrinsic and embed it in an llvmcall, but that seems excessive for this task.
If I look at the LLVM code generated by the C program, I see a simple instruction to split the 128-bit integer:
%21 = trunc i128 %20 to i64
Taking the high bits could be accomplished with UInt64(v >> 64), but the generated LLVM code contains excessive checking code, because the constructor verifies that the value fits in the target type.
Taking the low bits could be done with UInt64(v & 0xffffffffffffffff), but that too produces many checking instructions.
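To illustrate where the checks come from, here is a minimal sketch. It assumes unsigned UInt64 inputs and uses widemul to get the 128-bit product; the variable names are only for illustration. The UInt64(...) constructor is a checked conversion: it throws an InexactError when the value does not fit, so the compiler has to emit the test unless it can prove it is unnecessary.

```julia
# Assumption: unsigned inputs; widemul(::UInt64, ::UInt64) returns a UInt128.
v = widemul(typemax(UInt64), typemax(UInt64))

# Converting the full 128-bit value fails because it does not fit in 64 bits.
threw = try
    UInt64(v)
    false
catch err
    err isa InexactError
end

# Shifting or masking first makes the value fit, so these succeed,
# but the conversion itself is still the checked one.
hi_checked = UInt64(v >> 64)
lo_checked = UInt64(v & 0xffffffffffffffff)
```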
The answer turned up in a book: Julia High Performance, in a section labeled unchecked conversions for unsigned integers.
Instead of converting with T(v), use v % T, which doesn't add any checking and simply truncates the value to T:
v_l, v_h = v % UInt64, (v >> 64) % UInt64
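Putting the pieces together, the whole operation might look like the sketch below. The function name mul_lo_hi is hypothetical, and I am assuming UInt64 inputs (matching the unsigned _mulx_u64 intrinsic) with widemul producing the 128-bit product.

```julia
# Hypothetical helper: full 64x64 -> 128-bit multiply, returned as (low, high).
function mul_lo_hi(a::UInt64, b::UInt64)
    v = widemul(a, b)           # UInt128 product, cannot overflow
    lo = v % UInt64             # unchecked truncation to the low 64 bits
    hi = (v >> 64) % UInt64     # shift the high half down, then truncate
    return lo, hi
end

lo1, hi1 = mul_lo_hi(typemax(UInt64), typemax(UInt64))
lo2, hi2 = mul_lo_hi(UInt64(2)^63, UInt64(4))   # 2^65: low half 0, high half 2
```

For signed Int64 inputs the same pattern should apply, since % UInt64 keeps the low bits regardless of sign, but I have only written out the unsigned case here.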
Inspecting the LLVM output confirmed that this generates the optimized code, and the run times are reduced by almost 50%.
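For anyone who wants to compare the two variants themselves, something like the following sketch should work. The function names checked_low and unchecked_low are made up for this example, and the exact IR printed will depend on the Julia and LLVM versions, so I am not showing expected output.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical one-liners for comparing the two conversion styles.
checked_low(v::UInt128) = UInt64(v & 0xffffffffffffffff)
unchecked_low(v::UInt128) = v % UInt64

# Capture the LLVM IR as strings so the two can be compared side by side.
ir_checked = sprint(io -> code_llvm(io, checked_low, (UInt128,)))
ir_unchecked = sprint(io -> code_llvm(io, unchecked_low, (UInt128,)))

println(ir_checked)
println(ir_unchecked)
```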
I searched the official docs and could not find this mentioned.
I suggest the docs be updated to add this in the main section on integers and conversions.