Hi, I noticed a significant performance loss between Julia 1.7.3 and 1.9.x for exponentiation of floats, which becomes ~5 times slower. I tested this independently on a few machines. A minimal example is attached below.
using BenchmarkTools
function test()
    x = 1.0
    for i in 1:100_000_000
        x += x^2.0
    end
    return x
end
@btime test()
The problem remains in the available version of Julia 1.10.x, which is in fact 2 times slower than 1.9.x.
Yes, I am aware that it is much faster with an integer exponent. However, I do need to exponentiate by floats in general, and this is a very significant difference between older and newer versions of Julia.
I see a 2x regression from 1.9.3 to 1.10-rc1. The @noinline x^yint doesn't inline/const-prop as well as it does on 1.9.3, which could be the problem.
EDIT: I manually bypassed the call to ^ in 1.10 and inserted a pow_body call to match 1.9.3, but it didn't make a difference. The regression is likely in pow_body, as the LLVM IR otherwise seems to be identical.
Do you happen to be running this on a fairly old CPU? For CPUs with FMA (Haswell/Bulldozer or newer), floating point powers should have gotten a lot faster.
Note that this is a pretty bad benchmark, since 1^x is a special case, so you are benchmarking the speed of a specific case rather than floating point exponentiation in general.
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 16 × AMD Ryzen 7 5700U with Radeon Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
Threads: 1 on 16 virtual cores
I'm fairly confused though: as far as I can tell there's been no change to the code between 1.9.3 and 1.10-rc1, and the LLVM IR is near identical. Especially as ^2 is the same speed on both versions.
Oh, right. It's still a bad benchmark, but for a different reason. After the first 12 iterations, it's computing Inf^2, which is also a special case.
You're right that something is wrong with 1.10 (not sure what yet).
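To make the overflow point concrete, here is a minimal sketch that counts how many updates of the original loop it takes before x saturates at Inf. The function name iterations_until_inf and the maxiter keyword are made up for illustration and are not from the thread.

```julia
# Count how many updates x += x^2.0 it takes before x overflows to Inf,
# starting from x = 1.0 as in the original benchmark.
# `iterations_until_inf` and `maxiter` are hypothetical names for this sketch.
function iterations_until_inf(x0::Float64 = 1.0; maxiter::Int = 1_000)
    x = x0
    for i in 1:maxiter
        x += x^2.0
        isinf(x) && return i   # first iteration at which x is Inf
    end
    return nothing             # no overflow within maxiter iterations
end

n = iterations_until_inf()
println("x becomes Inf after ", n, " iterations")
```

Since x roughly squares on every step, it should overflow after about a dozen iterations, so nearly all of the 100_000_000 iterations in the original loop end up evaluating Inf^2.0.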
A better version of this benchmark would be
function test()
    x = 1.1
    for i in 1:100_000_000
        x = x^2.0
        x > 1e160 && (x = 1.1)
    end
    return x
end
This gives me ~740 ms on 1.9 and nightly, but 1.3 s on 1.10. This is very odd, since we didn't change anything here between 1.10 and nightly as far as I know.
I can confirm this. That said, this seems to be a regression purely for the case of raising a number to the power 2.0. If I change the power to 2.1, I get 9.6 seconds on 1.7 vs 3 seconds on 1.9. Also, for a power of 3.0 I get 390 ms on 1.7 vs 285 ms on 1.9.
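For anyone who wants to compare exponents on their own machine, here is a rough sketch of the improved benchmark with the exponent as a parameter. The name test_pow, the iteration count n, and the limit keyword (1e100 rather than 1e160, so that x^3.0 stays finite) are my own choices for illustration. It uses @elapsed from Base instead of BenchmarkTools to stay self-contained, so the timings are only approximate.

```julia
# Parameterized variant of the benchmark: raise x to the power p repeatedly
# and reset x before it can overflow.
# `test_pow`, `n`, and `limit` are hypothetical names/values for this sketch.
function test_pow(p::Float64, n::Int; limit::Float64 = 1e100)
    x = 1.1
    for i in 1:n
        x = x^p
        x > limit && (x = 1.1)   # reset so the loop avoids the Inf special case
    end
    return x
end

for p in (2.0, 2.1, 3.0)
    test_pow(p, 10)                        # warm-up call so compilation is not timed
    t = @elapsed test_pow(p, 1_000_000)
    println("p = ", p, ": ", round(t * 1e3; digits = 2), " ms")
end
```

Running the same script under each Julia version should show whether the slowdown is specific to p = 2.0. For numbers worth reporting, swapping @elapsed for @btime from BenchmarkTools would give more reliable measurements.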