# Exponentiation of floats

Hi, I noticed a significant loss of performance for exponentiation of floats between Julia 1.7.3 and 1.9.x: it becomes roughly 5× slower. I tested this independently on a few machines. A minimal example is attached below.

```julia
using BenchmarkTools

function test()
    x = 1.
    for i in 1:100_000_000
        x += x^2.
    end
    return x
end

@btime test()
```

The problem remains in the currently available Julia 1.10.x, which is in fact about 2× slower than 1.9.x.

Thanks!

2 Likes

This is aside from the regression, but do you really need to exponentiate by a float? I see 6x faster performance with `x += x^2` (without the `.`).
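To make the difference concrete (a hedged sketch; `f_int` and `f_flt` are made-up names for illustration): an integer *literal* exponent is lowered by the parser to `Base.literal_pow`, which specializes `x^2` into a plain multiply, while `x^2.0` goes through the general `Float64^Float64` pow routine.

```julia
# x^2 with an integer literal lowers to Base.literal_pow(^, x, Val(2)),
# which specializes to x*x; x^2.0 calls the general float pow path.
f_int(x) = x^2     # compiles down to a single multiply
f_flt(x) = x^2.0   # calls the libm-style pow implementation

@assert f_int(3.0) == 9.0
@assert f_flt(3.0) == 9.0
```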

1 Like

I don't know if the OP needs it, but some fields do:
https://cds.cern.ch/record/2866130/files/ANA-EXOT-2022-18-PAPER.pdf#page=9

1 Like

Yes, I am aware that using an integer exponent is much faster. However, I do require exponentiating by floats in general, and this is a very significant difference between older and newer versions of Julia.

2 Likes

How fast was the older version? I have opened an issue: Exponential by a float is very slow · Issue #52079 · JuliaLang/julia · GitHub

That minimal example takes ~220 ms on 1.7.3, ~900 ms on 1.9.3, and ~2100 ms on 1.10.x.

For me it's the same speed on 1.7:

```julia
julia> @benchmark x^y setup=begin x=rand(); y=rand() end
BenchmarkTools.Trial: 10000 samples with 989 evaluations.
 Range (min … max):  46.524 ns … 126.397 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     51.524 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   54.200 ns ±   8.123 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
 46.5 ns       Histogram: log(frequency) by time      69.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

For what itβs worth in this example, the `2.` is getting converted to `2` inside the power function.

In the LLVM IR of `test` on 1.9.3 (the same thing occurs on 1.10-rc1):

```llvm
%1 = call double @j_pow_body_301(double %value_phi1, i64 signext 2) #0
```
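Roughly what that conversion looks like (a simplified sketch, not the actual Base source; `pow_sketch` is a made-up name): when a `Float64` exponent holds an exact integer value, the implementation can fall back to the much cheaper integer power-by-squaring path.

```julia
# Simplified sketch: a Float64 exponent with an exact integer value is
# dispatched to the integer power path, so x^2. becomes x^2 internally.
function pow_sketch(x::Float64, y::Float64)
    yint = unsafe_trunc(Int64, y)
    if y == yint
        return x^yint           # integer power-by-squaring path
    end
    return exp(y * log(x))      # general path (x > 0 assumed here)
end

@assert pow_sketch(3.0, 2.0) == 9.0
@assert isapprox(pow_sketch(2.0, 0.5), sqrt(2.0); rtol=1e-12)
```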
2 Likes

I see a 2x regression from 1.9.3 to 1.10-rc1. The `@noinline` `x^yint` call doesn't inline/const-prop as well as it does on 1.9.3, which could be the problem.

EDIT: Manually bypassed the call to `^` in 1.10 and inserted a `pow_body` call to match 1.9.3, but it didn't make a difference. The regression is likely in `pow_body` itself, as the LLVM IR otherwise seems identical.

1 Like

Do you happen to be running this on a fairly old CPU? For CPUs with FMA (Haswell/Bulldozer or newer), floating point powers should have gotten a lot faster.

2 Likes

Note that this is a pretty bad benchmark, since `1^x` is a special case: you are benchmarking the speed of a specific case rather than floating point exponentiation in general.
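The special case is easy to check (this follows the IEEE 754 `pow` semantics, which Julia implements):

```julia
# pow(1.0, y) is defined to be 1.0 for every y (even NaN under IEEE 754),
# so a benchmark whose base is 1.0 only exercises this early-exit branch.
@assert 1.0^2.5 == 1.0
@assert 1.0^-7.3 == 1.0
```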

1 Like

I don't think so:

```julia
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × AMD Ryzen 7 5700U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 16 virtual cores
```

Iβm fairly confused though as far as I can tell thereβs been no change to the code between 1.9.3 and 1.10-rc1 and the LLVM IR is near identical. Especially as `^2` is the same speed on both versions.

The function is adding to `x`, so it's only `1^2.` on the first iteration.

Oh, right. It's still a bad benchmark, but for a different reason: after the first ~12 iterations, it's computing `Inf^2`, which is also a special case.
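A quick check of this blow-up (a hedged sketch; `blowup` is a made-up name wrapping the original loop):

```julia
# The accumulator grows doubly-exponentially: after roughly 11 iterations
# x has overflowed to Inf, so nearly all 10^8 iterations compute Inf^2.
function blowup(n)
    x = 1.0
    for i in 1:n
        x += x^2.0
    end
    return x
end

@assert isfinite(blowup(10))
@assert isinf(blowup(12))
```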

4 Likes

Youβre right that something is wrong with 1.10 (not sure what yet).
A better version of this benchmark would be

```julia
function test()
    x = 1.1
    for i in 1:100_000_000
        x = x^2.0
        x > 1e160 && (x = 1.1)
    end
    return x
end
```

This gives me ~740 ms on 1.9 and on nightly, but 1.3 s on 1.10. This is very odd, since as far as I know we didn't change anything here between 1.10 and nightly.

1 Like

Yep, this is a better example, but the problem is still there. What speed do you get on 1.7.3? In my tests it is still faster than 1.9.

I can confirm this. That said, this seems to be a regression purely for the case of raising a number to the `2.0` power. If I change the power to `2.1`, I get 9.6 s on 1.7 vs 3 s on 1.9. Also, for a power of `3.0` I get 390 ms on 1.7 vs 285 ms on 1.9.

I see a regression from 1.9 to 1.10 for `2.0`, `3.0` (197 ms to 320 ms), `4.0` (817 ms to 1.6 s), and `4.1` (3.5 s to 4.5 s) with your example.

Yeah, the 1.9-to-1.10 issue is real (and very weird).

My profile of 1.10-rc1, unlike 1.9.3, seems to suggest that the non-FMA path of `two_mul` is being called.
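For reference, `two_mul` computes an error-free product of two floats; on FMA hardware it needs just one extra instruction, while the non-FMA fallback needs several splits and multiplies. A hedged sketch of the FMA version (`two_mul_sketch` is a made-up name, not the Base function):

```julia
# Error-free transformation of a product: hi is the rounded product and
# lo is the exact rounding error, so hi + lo represents x*y exactly.
function two_mul_sketch(x::Float64, y::Float64)
    hi = x * y
    lo = fma(x, y, -hi)   # one fma recovers the rounding error exactly
    return hi, lo
end

hi, lo = two_mul_sketch(1.1, 2.1)
@assert hi == 1.1 * 2.1
```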

(profile screenshots for 1.10-rc1 and 1.9.3 omitted)

Weirdly, the LLVM IR says we're using the FMA path:

```julia
julia> @code_llvm Base.Math.two_mul(1., 2.)
;  @ math.jl:54 within `two_mul`
; Function Attrs: uwtable
define void @julia_two_mul_451([2 x double]* noalias nocapture noundef nonnull sret([2 x double]) align 8 dereferenceable(16) %0, double %1, double %2) #0 {
common.ret:
;  @ math.jl:56 within `two_mul`
; ┌ @ float.jl:411 within `*`
   %3 = fmul double %1, %2
; └
;  @ math.jl:57 within `two_mul`
; ┌ @ float.jl:407 within `-`
   %4 = fneg double %3
; └
; ┌ @ floatfuncs.jl:439 within `fma`
; │┌ @ floatfuncs.jl:434 within `fma_llvm`
    %5 = call double @llvm.fma.f64(double %1, double %2, double %4)
; └└
  %newstruct.sroa.0.0..sroa_idx = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 0
  store double %3, double* %newstruct.sroa.0.0..sroa_idx, align 8
  %newstruct.sroa.2.0..sroa_idx8 = getelementptr inbounds [2 x double], [2 x double]* %0, i64 0, i64 1
  store double %5, double* %newstruct.sroa.2.0..sroa_idx8, align 8
;  @ math.jl within `two_mul`
  ret void
}
```
1 Like