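Right now you’re benchmarking with the literal constant 56. If the divisor really is a compile-time constant, that will probably always be as fast as or faster than anything else, because LLVM already does a similar transform on my platform: it replaces the division with a multiply-high plus shifts. Look ma, no divs: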
```
julia> f(x) = div(x, 56)
f (generic function with 1 method)

julia> @code_native debuginfo=:none f(123123)
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 15, 0
.globl _julia_f_10054 ; -- Begin function julia_f_10054
.p2align 2
_julia_f_10054: ; @julia_f_10054
; Function Signature: f(Int64)
; %bb.0: ; %pass
;DEBUG_VALUE: f:x <- $x0
;DEBUG_VALUE: f:x <- $x0
mov x8, #18725
movk x8, #9362, lsl #16
movk x8, #37449, lsl #32
movk x8, #18724, lsl #48
smulh x8, x0, x8
asr x9, x8, #4
add x0, x9, x8, lsr #63
ret
; -- End function
.subsections_via_symbols
```
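To see what those instructions actually compute, here is a sketch that replays the same sequence in plain Julia. `div56_magic` and `MAGIC_56` are hypothetical names I made up for illustration; this is not how Base implements `div`, just a way to check the arithmetic. The four `mov`/`movk` immediates (18725, 9362, 37449, 18724) assemble the 64-bit constant `0x4924924924924925`, which is roughly 2^65/7.

```julia
# Hypothetical helper: replays the AArch64 sequence LLVM emitted for div(x, 56).
# The mov/movk instructions build this constant 16 bits at a time.
const MAGIC_56 = Int64(0x4924924924924925)

function div56_magic(x::Int64)
    # smulh: high 64 bits of the signed 128-bit product x * MAGIC_56
    hi = (widemul(x, MAGIC_56) >> 64) % Int64
    # asr #4: arithmetic shift right by 4 (the factor of 8 in 56 = 8 * 7, plus the 2 in 2^65/7)
    q = hi >> 4
    # add ..., lsr #63: add 1 when hi is negative so the result rounds toward zero like div
    return q + (hi >>> 63)
end

div56_magic(123123)  # should agree with div(123123, 56)
```

The point is that a multiply and two shifts replace the `sdiv` instruction, which is why the constant-divisor case is hard to beat.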
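To measure the runtime-divisor case, you’d want to benchmark without constant propagation, e.g. by passing the divisor through a `Ref` so the compiler can't see its value: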
```
julia> @btime div.($un, 56); # constant!
  2.718 μs (3 allocations: 96.06 KiB)

julia> @btime div.($un, $(Ref(56))[]);
  6.267 μs (3 allocations: 96.06 KiB)

julia> @btime div.($un, $(Ref(MultiplicativeInverses.multiplicativeinverse(MultiplicativeInverses.unsigned(56))))[]);
  3.068 μs (3 allocations: 96.06 KiB)
```
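In those numbers, the precomputed inverse (3.068 μs) lands close to the constant case (2.718 μs), while a hidden plain divisor costs 6.267 μs. A minimal sketch of the third variant outside the benchmark, assuming `un` holds `UInt64` values (the random array below is a hypothetical stand-in for it): build the inverse once with `Base.MultiplicativeInverses.multiplicativeinverse`, then reuse it for every element. I spell the divisor as `UInt64(56)`; on a 64-bit machine that is the same as `unsigned(56)`.

```julia
using Base.MultiplicativeInverses: multiplicativeinverse

# Hypothetical stand-in for the `un` array from the benchmark.
un = rand(UInt64, 10_000)

# Precompute once: an UnsignedMultiplicativeInverse{UInt64} holding the
# multiplier and shift amounts for dividing by 56.
d = multiplicativeinverse(UInt64(56))

# Each div(x, d) uses multiply-high and shifts instead of a hardware divide.
# Ref(d) makes broadcasting treat the inverse as a scalar.
q = div.(un, Ref(d))

# Same quotients as dividing by the plain runtime divisor.
@assert q == div.(un, UInt64(56))
```

Computing the inverse has a one-time cost, so it only pays off when the same runtime divisor is reused across many elements, as in this broadcast.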