Exponentiation of floats

richard_sti · November 8, 2023, 2:18pm

Hi, I noticed that there is a significant loss of performance between Julia 1.7.3 and 1.9.x for exponentiation of floats, which becomes ~ 5 times slower. I tested this independently on a few machines. A minimal example of this is attached below.

using BenchmarkTools

function test()
    x = 1.
    for i in 1:100_000_000
        x += x^2.
    end
    return x
end

@btime test()

The problem remains in the available version of Julia 1.10.x, which is in fact 2 times slower than 1.9.x.

Thanks!

mbauman · November 8, 2023, 2:30pm

This is aside from the regression, but do you really need to exponentiate by a float? I see 6x faster performance with x += x^2 (without the .).

jling · November 8, 2023, 2:36pm

idk if OP needs it but some fields need it:
https://cds.cern.ch/record/2866130/files/ANA-EXOT-2022-18-PAPER.pdf#page=9

richard_sti · November 8, 2023, 2:38pm

Yes, I am aware that if one uses an integer it is much faster. However, I do require exponentiating by floats in general and this is a very significant difference between older and newer version of Julia.

jling · November 8, 2023, 2:41pm

how fast was the older version? I have opened an issue: Exponential by a float is very slow · Issue #52079 · JuliaLang/julia · GitHub

richard_sti · November 8, 2023, 2:42pm

That minimal example on 1.7.3 is ~ 220 ms whereas on 1.9.3 it is ~ 900 ms and on 1.10.x 2100 ms.

jling · November 8, 2023, 2:44pm

for me it’s same speed on 1.7:

julia> @benchmark x^y setup=begin x=rand(); y=rand()end
BenchmarkTools.Trial: 10000 samples with 989 evaluations.
 Range (min … max):  46.524 ns … 126.397 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     51.524 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.200 ns ±   8.123 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █           ▁▅                        ▄▇               ▂▄    ▂
  █▃▁▇██▇█▇▇▇█████▆▅▅▆▃▄▄▁▅▁▄▃▃▄▃▄▃▁▁▃▃▁██▆▄▅▇▇▇▇▇▇▇▇▆▅▇▇██▆▆▇ █
  46.5 ns       Histogram: log(frequency) by time      69.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Zentrik · November 8, 2023, 2:46pm

For what it’s worth in this example, the 2. is getting converted to 2 inside the power function.

github.com

JuliaLang/julia/blob/bc55f44e17e9d1d72c9c9a87a1a60cf4bb5a17ad/base/math.jl#L1120


      
          # variants for callers with constant `y`
          @constprop :aggressive function ^(x::Float64, y::Float64)
              xu = reinterpret(UInt64, x)
              xu == reinterpret(UInt64, 1.0) && return 1.0
              # Exponents greater than this will always overflow or underflow.
              # Note that NaN can pass through this, but that will end up fine.
              if !(abs(y)<0x1.8p62)
                  isnan(y) && return y
                  y = sign(y)*0x1.8p62
              end
              yint = unsafe_trunc(Int64, y) # This is actually safe since julia freezes the result
              y == yint && return @noinline x^yint
              2*xu==0 && return abs(y)*Inf*(!(y>0)) # if x==0
              x<0 && throw_exp_domainerror(x) # |y| is small enough that y isn't an integer
              !isfinite(x) && return x*(y>0 || isnan(x))           # x is inf or NaN
              if xu < (UInt64(1)<<52) # x is subnormal
                  xu = reinterpret(UInt64, x * 0x1p52) # normalize x
                  xu &= ~sign_mask(Float64)
                  xu -= UInt64(52) << 52 # mess with the exponent
              end
              return pow_body(xu, y)

In the llvm IR of test on 1.9.3, same thing occurs on 1.10-rc1

%1 = call double @j_pow_body_301(double %value_phi1, i64 signext 2) #0

Zentrik · November 8, 2023, 2:50pm

I see a 2x regression from 1.9.3 to 1.10-rc1. The @noinline x^yint doesn’t inline/ const prop as well as it does on 1.9.3 which could be the problem.

EDIT: Manually bypassed the call in 1.10 to ^ and inserted a pow_body call to match 1.9.3 but didn’t make a difference. Likely the regression is in pow_body as otherwise the LLVM IR seems to be identical.

Oscar_Smith · November 8, 2023, 3:07pm

Do you happen to be running this on a fairly old CPU? For cpus with FMA (haswell/bulldozer or newer) floating point powers should have gotten a lot faster.

Oscar_Smith · November 8, 2023, 3:09pm

Note that this is a pretty bad benchmark since 1^x is a special case so you are bench-marking the speed of a specific case rather than floating point exponentiation in general.

Zentrik · November 8, 2023, 3:11pm

I don’t think so,

julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 7 5700U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 16 virtual cores

I’m fairly confused though as far as I can tell there’s been no change to the code between 1.9.3 and 1.10-rc1 and the LLVM IR is near identical. Especially as ^2 is the same speed on both versions.

Zentrik · November 8, 2023, 3:12pm

The function is adding to x so it’s only 1^2. on the first iteration.

Oscar_Smith · November 8, 2023, 3:14pm

Oh, right. It’s still a bad benchmark, but for a different reason After the first 12 iterations, it’s computing Inf^2 which is also a special case.

Oscar_Smith · November 8, 2023, 3:27pm

You’re right that something is wrong with 1.10 (not sure what yet).
A better version of this benchmark would be

function test()
    x = 1.1
    for i in 1:100_000_000
        x = x^2.0
        x > 1e160 && (x = 1.1)
    end
    return x
end

Which gives me ~740 ms on 1.9 and nightly, but 1.3s on 1.10. This is very odd since we didn’t change anything here between 1.10 and nightly as far as I know.

richard_sti · November 8, 2023, 3:36pm

Yep, this is a better example but the problem is still there. What speed do you get for 1.7.3? In my tests that one is still faster than 1.9.

Oscar_Smith · November 8, 2023, 3:58pm

I can confirm this. That said, this seems to be a regression purely for the case of raising a number to the 2.0 power. If I change the power to 2.1 I get 9.6 seconds on 1.7, vs 3 seconds on 1.9. Also for a power of 3.0 I get a time of 390ms for 1.7 vs 285 for 1.9

Zentrik · November 8, 2023, 4:03pm

I see a regression from 1.9 to 1.10 for 2.0, 3.0 (197ms to 320ms) and 4.0 (817ms to 1.6s) and 4.1 (3.5s to 4.5s) for your example.

Oscar_Smith · November 8, 2023, 4:03pm

yeah. the 1.9 to 1.10 issue is real (and very weird)

Zentrik · November 8, 2023, 4:04pm

github.com

JuliaLang/julia/blob/5aaa94854367ca875375e38ae14f369f124e7315/base/math.jl#L54-L60


      
          @assume_effects :consistent @inline function two_mul(x::Float64, y::Float64)
              if Core.Intrinsics.have_fma(Float64)
                  xy = x*y
                  return xy, fma(x, y, -xy)
              end
              return Base.twomul(x,y)
          end

My profile of 1.10-rc1 unlike in 1.9.3 seems to suggest that the non-fma path for two_mul is being called.

1.10-rc1

1.9.3

Weirdly LLVM IR says we’re using fma path

julia> @code_llvm Base.Math.two_mul(1., 2.)
;  @ math.jl:54 within `two_mul`
; Function Attrs: uwtable
define void @julia_two_mul_451([2 x double]* noalias nocapture noundef nonnull sret([2 x double]) align 8 dereferenceable(16) %0, double %1, double %2) #0 {
common.ret:
;  @ math.jl:56 within `two_mul`
; ┌ @ float.jl:411 within `*`
   %3 = fmul double %1, %2
; └
;  @ math.jl:57 within `two_mul`
; ┌ @ float.jl:407 within `-`
   %4 = fneg double %3
; └
; ┌ @ floatfuncs.jl:439 within `fma`
; │┌ @ floatfuncs.jl:434 within `fma_llvm`
    %5 = call double @llvm.fma.f64(double %1, double %2, double %4)
; └└
  %newstruct.sroa.0.0..sroa_idx = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  store double %3, double* %newstruct.sroa.0.0..sroa_idx, align 8
  %newstruct.sroa.2.0..sroa_idx8 = getelementptr inbounds [2 x double], [2 x
 double]* %0, i64 0, i64 1
  store double %5, double* %newstruct.sroa.2.0..sroa_idx8, align 8
;  @ math.jl within `two_mul`
  ret void
}

Topic		Replies	Views
Slow arbitrary base exponentiation, a^b Performance	22	2776	June 14, 2020
Base.runtests("math") failures on chips older than haswell New to Julia	12	692	August 7, 2021
Performance of Float32 exponential Performance	4	1454	December 21, 2019
Comparing exp() performance on Julia versus numpy Performance	38	1928	August 8, 2022
What's going on with exp() and --math-mode=fast? General Usage fast-math	29	4427	October 23, 2021

Exponentiation of floats

Related topics