Power function not inlined optimally

Bernard_GODARD · January 4, 2019, 10:12am

The following function:

julia> f(x)=x^5, x^3
f (generic function with 1 method)

is compiled into:

julia> @code_llvm f(2)

; Function f
; Location: REPL[10]:1
define void @julia_f_35717([2 x i64]* noalias nocapture sret, i64) {
top:
; Function literal_pow; {
; Location: none
; Function macro expansion; {
; Location: none
; Function ^; {
; Location: intfuncs.jl:220
  %2 = call i64 @julia_power_by_squaring_22857(i64 %1, i64 5)
;}}}
; Function literal_pow; {
; Location: intfuncs.jl:244
; Function *; {
; Location: operators.jl:502
; Function *; {
; Location: int.jl:54
  %3 = mul i64 %1, %1
  %4 = mul i64 %3, %1
;}}}
  %.sroa.0.0..sroa_idx = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
  store i64 %2, i64* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
  store i64 %4, i64* %.sroa.2.0..sroa_idx1, align 8
  ret void
}

The x^5 is computed by a call to julia_power_by_squaring while x^3 is computed inline by first computing x^2.
Because x^3 and x^2 are already available, it should be better to get x^5 inline with an additional multiplication rather than doing a call to julia_power_by_squaring.

Of course I can fix it by writing the multiplications explicitly:

julia> function f(x)
           x2=x*x
           x3=x2*x
           x5=x3*x2
           x5,x3
        end
f (generic function with 1 method)

julia> @code_llvm f(2)

; Function f
; Location: REPL[12]:2
define void @julia_f_35721([2 x i64]* noalias nocapture sret, i64) {
top:
; Function *; {
; Location: int.jl:54
  %2 = mul i64 %1, %1
;}
; Location: REPL[12]:3
; Function *; {
; Location: int.jl:54
  %3 = mul i64 %2, %1
;}
; Location: REPL[12]:4
; Function *; {
; Location: int.jl:54
  %4 = mul i64 %3, %2
;}
; Location: REPL[12]:5
  %.sroa.0.0..sroa_idx = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 0
  store i64 %4, i64* %.sroa.0.0..sroa_idx, align 8
  %.sroa.2.0..sroa_idx1 = getelementptr inbounds [2 x i64], [2 x i64]* %0, i64 0, i64 1
  store i64 %3, i64* %.sroa.2.0..sroa_idx1, align 8
  ret void
}

but shouldn’t the compiler take care of this?

stevengj · January 4, 2019, 1:11pm

See the discussion at inline ^ for literal powers of numbers by stevengj · Pull Request #20637 · JuliaLang/julia · GitHub

It’s possible in Julia (actually even in user code) to detect literal integer powers like this and inline them to the optimal number of multiplications, and then the compiler would be able to catch common subexpressions like in your case. We opted not to do this for floating-point values because it is slightly less accurate. There is certainly the option of doing this for exponentiating integer values (for which multiplication is associative), although large literal integer powers are probably less useful in practice.

If you have a set of powers like x^6, x^12, x^4, and so on, computing the whole set with an optimal number of multiplications (sharing as many multiplications as possible) corresponds to finding an optimal “addition chain” for the set of exponents, and is a hard (NP-complete) problem in general. (Finding a solution that is only slightly suboptimal is not so hard, however.)

antoine-levitt · January 4, 2019, 1:30pm

If this was not merged because of accuracy concerns, it would seem reasonable to activate it under @fastmath.

Bernard_GODARD · January 4, 2019, 3:58pm

Thank you for your reply.

Could you elaborate a bit on how it is slightly less accurate? or is the problem only that the result would then depend on what intermediate powers were computed in the code because of the non associativity of floating point multiplication.

I can understand that a large number of multiplications could lead to numerical errors greater than what is achievable with the pow function in math.h but x^5 does not need many multiplications.

julia> function power5(x)
          x2=x*x
          x3=x2*x
          x5=x3*x2
       end
power5 (generic function with 1 method)

julia> f(x)=x^5
f (generic function with 1 method)

julia> x=rand(Float64,1000)*1e6;

julia> maximum(abs.(power5.(x)-f.(x))./f.(x)) #relative error
3.793000313887566e-16

stevengj · January 4, 2019, 4:09pm

Your own test answers your question: the power5 method loses up to about two ulps. (For elementary functions like exponentiation we worry about errors this small.)

@antoine-levitt, regarding @fastmath see Make `@fastmath` aware of `Base.literal_pow` by ajkeller34 · Pull Request #21099 · JuliaLang/julia · GitHub where I made a similar comment.

Bernard_GODARD · January 4, 2019, 4:26pm

Thank you. I should have checked better: I thought this was only one ulps. Using julia eps function, it is easier to see:

julia> maximum(abs.(power5.(x)-f.(x))./eps.(f.(x))) #ulps error
2.0

Topic		Replies	Views
Small, fixed powers Performance	5	505	May 28, 2021
Should `literal_pow` optimize for bigger exponents? Internals & Design	15	796	February 10, 2022
Compiler optimizations on integer exponentation Performance	3	402	July 7, 2021
Speed of power "^" Performance	16	996	May 6, 2022
Power function with Integer ( x^4 much slower than xxx*x ) Performance	12	1467	May 18, 2018

Power function not inlined optimally

Related topics