Split an Int128 into two Int64 values

Edit: this post started as a question and, as I found an answer, turned into a suggestion to update the documentation.

I have to multiply two Int64 values and split the result into its high and low 64-bit halves.
This is the functionality provided by the _mulx_u64 intrinsic function.
I could probably take the LLVM code produced by a C program that calls the intrinsic and embed it in an llvmcall, but that seems excessive for this task.

If I look at the LLVM code generated by the C program, I see a single instruction that splits the 128-bit integer:

%21 = trunc i128 %20 to i64

Taking the high bits could be accomplished with UInt64(v >> 64), but the generated LLVM code contains excessive checking code, because the constructor verifies that the value fits in the target type.
Taking the low bits could be done with UInt64(v & 0xffffffffffffffff), but that too produces many checking instructions.
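To illustrate where those checks come from, here is a minimal sketch of the checked constructor's behavior. The variable names are illustrative only, and the input is assumed to be a UInt128 product obtained with widemul.

```julia
# UInt128 product 2^65 - 2, too large for 64 bits.
v = widemul(typemax(UInt64), UInt64(2))

# The checked constructor throws when the value does not fit.
threw = try
    UInt64(v)
    false
catch err
    err isa InexactError
end

# After shifting or masking, the value fits and the conversion succeeds,
# but each call still goes through the range-checked constructor.
hi = UInt64(v >> 64)
lo = UInt64(v & 0xffffffffffffffff)
```

The results are correct here; the cost is only in the extra check that the constructor has to perform.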

The answer turned up in a book, Julia High Performance, in a section labeled "unchecked conversions for unsigned integers".
Instead of converting with T(v), use v % T, which does not add any checking:
v_l, v_h = v % UInt64, (v >> 64) % UInt64
Inspecting the LLVM code, this indeed generated the optimized code, and the run times are reduced by almost 50%.
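For reference, the whole multiply-and-split could be sketched as below. The name mul_hi_lo is hypothetical (it is not from the book or from Base), and the sketch assumes unsigned inputs to mirror _mulx_u64; widemul is the Base function that multiplies into the next wider integer type.

```julia
# Hypothetical helper (the name is illustrative, not part of Base).
# Multiply two UInt64 values and return the (high, low) 64-bit halves.
function mul_hi_lo(a::UInt64, b::UInt64)
    p = widemul(a, b)          # full UInt128 product, cannot overflow
    lo = p % UInt64            # wrapping truncation keeps the low 64 bits
    hi = (p >> 64) % UInt64    # shift the high half down, then truncate
    return hi, lo
end

# (2^64 - 1) * 2 = 2^65 - 2, so hi == 1 and lo == 0xfffffffffffffffe
hi, lo = mul_hi_lo(typemax(UInt64), UInt64(2))
```

In the REPL, @code_llvm mul_hi_lo(UInt64(3), UInt64(4)) shows what the compiler emits for the two conversions.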

I searched the official docs and could not find this mentioned.
I suggest the docs be updated to add this in the main section on integers and conversions.

It is a nice trick, but I am not sure the manual is the best place for specific micro-optimizations.

Readers interested in these techniques should just buy the Julia High Performance book.

It is documented: Mathematics · The Julia Language. What is missing is a mention that no range check is performed, and a cross-link to convert and maybe to the constructors. For the online docs, a link from the rem docstring in addition to the mod docstring could also be useful.
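As a small illustration of why rem and mod are the relevant docstrings: the % operator with a type on the right-hand side is the operator form of rem, and mod accepts a type argument as well. The variable names below are illustrative only.

```julia
x = Int128(-1)

a = x % UInt64        # operator form
b = rem(x, UInt64)    # function form of the same operation
c = mod(x, UInt64)    # mod with an integer type argument wraps the same way

# All three wrap -1 into the UInt64 range instead of throwing.
same = a == b == c == typemax(UInt64)
```

By contrast, UInt64(x) throws an InexactError for this input.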
