Exponentiation with literals in both base and exponent not removed by compiler in 1.8

albheim · July 6, 2022, 12:02pm

Between 1.7.3 and 1.8.0-rc1, it seems the compiler stopped constant-folding Float64^Int exponentiation with literal operands (maybe other types too, I haven’t checked) when the exponent is larger than 3. This adds runtime overhead for variables defined that way.

Here is 1.7.3

julia> f() = 2.0^4
f (generic function with 2 methods)

julia> @code_llvm f()
;  @ REPL[8]:1 within `f`
define double @julia_f_873() #0 {
top:
  ret double 1.600000e+01
}

and 1.8.0-rc1

julia> f() = 2.0^3
f (generic function with 2 methods)

julia> @code_llvm f()
;  @ REPL[12]:1 within `f`
define double @julia_f_1201() #0 {
top:
  ret double 8.000000e+00
}

julia> f() = 2.0^4
f (generic function with 2 methods)

julia> @code_llvm f()
;  @ REPL[14]:1 within `f`
define double @julia_f_1203() #0 {
top:
; ┌ @ intfuncs.jl:326 within `literal_pow`
; │┌ @ math.jl:1037 within `^`
    %0 = call double @j_pow_body_1205(double 2.000000e+00, i64 signext 4) #0
; └└
  ret double %0
}

Checking the source for Float64^Int, we see that there have certainly been some changes between the versions (mostly from this PR, it seems).

If the function is pure and the inputs are literals, could the compiler calculate it? Is there any reason it shouldn’t? Why does it fold 2.0^3 but not 2.0^4?

Here is 1.7.3

@inline function ^(x::Float64, y::Integer)
    y == -1 && return inv(x)
    y == 0 && return one(x)
    y == 1 && return x
    y == 2 && return x*x
    y == 3 && return x*x*x
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), x, Float64(y))
end

and 1.8.0-rc1

@constprop :aggressive @inline function ^(x::Float64, n::Integer)
    n == 0 && return one(x)
    return pow_body(x, n)
end
@assume_effects :terminates_locally @noinline function pow_body(x::Float64, n::Integer)
    y = 1.0
    xnlo = ynlo = 0.0
    n == 3 && return x*x*x # keep compatibility with literal_pow
    if n < 0
        rx = inv(x)
        n==-2 && return rx*rx #keep compatability with literal_pow
        isfinite(x) && (xnlo = -fma(x, rx, -1.) * rx)
        x = rx
        n = -n
    end
    while n > 1
        if n&1 > 0
            err = muladd(y, xnlo, x*ynlo)
            y, ynlo = two_mul(x,y)
            ynlo += err
        end
        err = x*2*xnlo
        x, xnlo = two_mul(x, x)
        xnlo += err
        n >>>= 1
    end
    !isfinite(x) && return x*y
    return muladd(x, y, muladd(y, xnlo, x*ynlo))
end

Liozou · July 6, 2022, 1:20pm

This comes from the @noinline annotation in the definition of pow_body: replace it with @inline and it works as you would expect.
The decision to mark this as @noinline seems motivated by reducing latency when compiling code that uses ^: see PR #42966, which I think is relevant, and PR #43920, which introduced the annotation here. If you notice a significant performance regression in your code because of this, you should probably open an issue to discuss the decision further.
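To check on a given Julia version whether the call was folded, without reading the IR by eye, something like the following rough sketch could be used. The names `g`, `ir` and `folded` are only illustrative, and searching the IR text for `pow_body` is an assumption based on the output shown above, not a robust test.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

g() = 2.0^4

# Capture the optimized LLVM IR as a string instead of printing it.
ir = sprint(io -> code_llvm(io, g, Tuple{}))

# Heuristic: if the power was folded, no call to pow_body should remain.
folded = !occursin("pow_body", ir)
println(folded ? "constant folded" : "pow_body is still called at runtime")
```

The result depends on the Julia version, which is the point of the check.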

albheim · July 6, 2022, 3:15pm

Hmm, it only affects me because I wanted to define some constant variables in a function for later use, and it is nicer to declare them as 2.0^34 than 1.7179869184e10 for readability.

The case that made me notice this was a few such definitions in a fairly small function, and there it was certainly noticeable.

Here is a very simple example where it is 5 times slower in 1.8. I’m not sure this makes it worth reverting the change: I can easily precalculate the values, so speeding up the dynamic cases, where that can’t be done, should probably take priority. But I feel like more people than me might run into this, and it feels like something that should be possible to optimize.

1.7.3

julia> function f(x)
       A = 2.0^34
       A * x
       end
f (generic function with 1 method)

julia> @code_llvm f(2.0)
;  @ REPL[1]:1 within `f`
define double @julia_f_400(double %0) #0 {
top:
;  @ REPL[1]:3 within `f`
; ┌ @ float.jl:405 within `*`
   %1 = fmul double %0, 0x4210000000000000
; └
  ret double %1
}

julia> using BenchmarkTools

julia> @benchmark f(2.0)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.637 ns … 15.510 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.816 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.848 ns ±  0.309 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █
  █▄▂▁▁▁▆▆▃▂▁▁▁▇▅▂▂▁▁▁▄▆▃▂▂▁▁▁▆▅▃▂▁▁▁▁▃▄▃▂▁▁▁▁▁▂▆▄▃▂▂▂▁▁▁▃▇▄ ▃
  1.64 ns        Histogram: frequency by time        2.14 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

and 1.8.0-rc1

julia> function f(x)
       A = 2.0^34
       A * x
       end
f (generic function with 1 method)

julia> @code_llvm f(2.0)
;  @ REPL[1]:1 within `f`
define double @julia_f_674(double %0) #0 {
top:
;  @ REPL[1]:2 within `f`
; ┌ @ intfuncs.jl:326 within `literal_pow`
; │┌ @ math.jl:1037 within `^`
    %1 = call double @j_pow_body_676(double 2.000000e+00, i64 signext 34) #0
; └└
;  @ REPL[1]:3 within `f`
; ┌ @ float.jl:385 within `*`
   %2 = fmul double %1, %0
; └
  ret double %2
}

julia> using BenchmarkTools

julia> @benchmark f(2.0)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  8.574 ns … 25.060 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.159 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.619 ns ±  1.237 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇       █
  █▄▃▂█▃▃▂█▅▃▂▅█▂▃▂▃▄▂▂▁▃▃▂▂▂▂▃▂▂▂▂▂▃▂▂▂▂▂▃▂▂▂▂▂▂▃▂▂▂▂▂▂▅▅▂▃ ▃
  8.57 ns        Histogram: frequency by time        12.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Liozou · July 6, 2022, 3:24pm

For these use cases you should probably move the constant declaration out of the function altogether and mark it as const:

const A = 2.0^34
f(x) = A * x

so that it’s not computed at each call of f.

albheim · July 6, 2022, 3:31pm

I feel like if I have a constant only used in this function it is strange to keep it global and clutter that namespace?

Liozou · July 6, 2022, 3:35pm

In that case, you can declare a new module that acts as a namespace for your constants:

module MyConstants
   const A = 2.0^34
   const B = ...
end

f(x) = MyConstants.A * x

I agree that it would be nice to force the compiler to compute a constant local variable to bypass all this, but I don’t think there is a mechanism for that yet.
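One workaround that keeps the expression readable and local is to evaluate it at macro-expansion time and splice the result into the function as a literal. This is only a sketch: `@consteval` is a hypothetical helper written here, not something provided by Base, and it only works for expressions built from literals and globals that already exist when the macro is expanded (it cannot see local variables).

```julia
# Hypothetical helper: evaluate `ex` in the caller's module while the macro
# is being expanded, and return the resulting value so that it is embedded
# in the code as a constant.
macro consteval(ex)
    Core.eval(__module__, ex)
end

function f(x)
    A = @consteval 2.0^34   # replaced by the Float64 value before compilation
    A * x
end
```

With this, `f(2.0)` multiplies by a constant that is already present in the lowered code, so it does not rely on the compiler folding `^`.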

mikmoore · July 6, 2022, 3:38pm

This seems like the sort of thing the new effects modeling machinery is supposed to fix? Maybe??? But you show that v1.8 (with effects modeling) does in fact fail here.

I’d suggest you open an issue. If your use case is performance-sensitive to this then try hardcoding it to the literal value or setting it to a constant, as others have suggested. If it does not affect your performance meaningfully, I’d write it your preferred way and hope that an eventual fix allows constant propagation to handle it properly.

albheim · July 6, 2022, 3:40pm

Yeah, my solution for now was just to do

function f(x)
    A = 1.7179869184e10 # 2.0^34
    A * x
end

which is acceptable to me, though it would have been nice if 2.0^34 had kept working, especially since it did in 1.7.

Oscar_Smith · July 6, 2022, 3:45pm

This definitely should constant propagate. That said, you can write the literal as 0x1p34 which may be better than writing the decimal version.
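For anyone who hasn’t seen that syntax: hexadecimal floating-point literals are written as a hex significand followed by `p` and a power-of-two exponent, and they produce a Float64 directly. A quick check of the value from this thread (the variable names are only for illustration):

```julia
a = 0x1p34            # 1 * 2^34, written as a hex float literal
b = 1.7179869184e10   # the same value written in decimal

@assert a isa Float64
@assert a === b
@assert a == 2.0^34
@assert 0x1.8p1 == 3.0   # 1.5 * 2^1
```

Since this is a plain literal, no call to `^` is involved at all.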

Liozou · July 6, 2022, 3:58pm

Note that it would const-propagate (despite the @noinline) if the method was :consistent (adding the annotation makes it work), but that’s forbidden by the fma apparently.

albheim · July 6, 2022, 3:59pm

True, that is actually what the current solution looked like (after you told me to in this PR).

Just thought that this syntax might not be what most people reach for (at least I had never seen it before) so there might be others trying the same thing as me.

Added an issue so it can be tracked on GitHub now.
