Base.literal_pow allows x^2 and x^3 to be turned into x*x and x*x*x, which the compiler turns into very efficient machine code. I was wondering whether it could be extended to higher exponents. The method I used for this is:
Base.literal_pow(::typeof(^),x::Number,n::Val{N}) where N = prod(ntuple(Returns(x),Val(N)))
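For reference, a literal integer exponent is lowered to a Base.literal_pow call, which is why this single method definition changes the generated code for every literal power of a Number:

julia> Meta.@lower x^7   # the lowered code contains a call to Base.literal_pow(^, x, Val{7}())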
I found that for exponents up to x^32 the compiled versions are faster, but the compile time gets longer and longer:
Code used to generate the data
using BenchmarkTools, ProgressMeter
#Base.literal_pow(::typeof(^),x::Number,n::Val{N}) where N = prod(ntuple(Returns(x),Val(N)))  # uncomment to benchmark the unrolled version
t1 = Float64[]
t2 = Float64[]
@showprogress for n in 0:50
@eval f(x) = x^$n
a = 2
push!(t1, @elapsed f(2))              # first call: includes compilation time
push!(t2, @belapsed f($(Ref(a))[]))   # steady-state runtime; Ref(a) prevents constant folding
end
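To see how both times grow with the exponent, the two series can be plotted on a log scale (a quick sketch; this assumes Plots.jl is installed):

using Plots   # assumption: Plots.jl is available

plot(0:50, t1; yscale = :log10, xlabel = "exponent n", ylabel = "seconds",
     label = "first call (compile + run)")
plot!(0:50, t2; label = "runtime (@belapsed)")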
My versioninfo()
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Up to x^26, the compiled code is very efficient; LLVM collapses the 26-term product into just six multiplications by repeated squaring:
julia> @code_llvm (x->x^26)(2)
; @ REPL[9]:1 within `#23`
; Function Attrs: uwtable
define i64 @"julia_#23_1634"(i64 signext %0) #0 {
top:
; ┌ @ C:\Users\mittel\juliastuff\literal_pow.jl:3 within `literal_pow`
; │┌ @ tuple.jl:499 within `prod`
; ││┌ @ operators.jl:655 within `*`
; │││┌ @ operators.jl:634 within `afoldl`
; ││││┌ @ int.jl:88 within `*`
%1 = mul i64 %0, %0
%2 = mul i64 %1, %0
%3 = mul i64 %2, %2
%4 = mul i64 %3, %3
%5 = mul i64 %4, %0
%6 = mul i64 %5, %5
; └└└└└
ret i64 %6
}
Between x^27 and x^32 the compiled code is more complicated but apparently still fine; from x^33 on it's catastrophic.
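A rough way to quantify that (just an ad-hoc metric I'd use, not a proper code-size measurement) is counting the lines of native code generated per exponent:

using InteractiveUtils   # for code_native (auto-loaded in the REPL)

native_lines = Int[]
for n in 0:50
    @eval g(x) = x^$n
    # lines of native assembly for g(::Int) as a proxy for generated-code size
    push!(native_lines, count(==('\n'), sprint(code_native, g, (Int,))))
end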
Would it make sense to extend this? (up to 5? 10? 26?)
Also, I don't know how it behaves on other architectures or with other types (Float64, Float32, Int32, …).
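One variant I might try (just a sketch, not benchmarked; _pow_sq is my own helper, not anything in Base): build the unrolled product by repeated squaring, so the method body only contains O(log N) multiplications. Assuming the compiler constant-folds Val(N >> 1), the recursion should unroll completely, which might keep compile times in check for large exponents:

# x^N by repeated squaring; _pow_sq is a hypothetical helper, not part of Base
_pow_sq(x, ::Val{0}) = one(x)
_pow_sq(x, ::Val{1}) = x
function _pow_sq(x, ::Val{N}) where {N}
    y = _pow_sq(x * x, Val(N >> 1))   # square the base, halve the exponent
    return isodd(N) ? x * y : y
end

# hook it into literal lowering for non-negative literal exponents only
Base.literal_pow(::typeof(^), x::Number, ::Val{N}) where {N} =
    N >= 0 ? _pow_sq(x, Val(N)) : x^N   # negative literals keep the ordinary ^ behavior

I haven't measured whether this actually reduces compile time; it only bounds the number of * calls the compiler has to fold.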
edit: corrected benchmark