Taking a power with a float exponent: is x^y slower than exp(y*log(x))?

Updated results: There was some sloppiness in my benchmarking approach. The discrepancy isn’t actually that large.

julia> @benchmark x^1.5 setup=(x=rand()) samples=100000
BenchmarkTools.Trial: 100000 samples with 998 evaluations per sample.
 Range (min … max):  14.586 ns … 49.423 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     14.788 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.805 ns ±  0.229 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▅██▅             ▁▂▁                              ▂ ▃
  ▄▃▃▃▃▁▃▇████▇▅▅▄▄▇█▇▇▅▃▄▆███▇▄▅▄▃▁▃▁▁▃▄▃▁▁▁▃▁▃▃▁▁▁▁▁▁▁▃▅▆██ █
  14.6 ns      Histogram: log(frequency) by time      15.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark exp(1.5*log(x)) setup=(x=rand()) samples=100000
BenchmarkTools.Trial: 100000 samples with 1000 evaluations per sample.
 Range (min … max):   4.629 ns … 65.165 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.923 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.649 ns ±  1.867 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅                                                █▃▁▂   ▂▂▅ ▃
  █▆▆▃▃▁▅▆▃▃▄▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▄▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁█████▄▄▅███ █
  4.63 ns      Histogram: log(frequency) by time      13.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I don’t understand why the latter has a cluster of very fast results, though. But that’s probably a separate issue.


Original post:

I know that x^y with a Float64 y is slower than with an integer y. But I just realized that x^y is also noticeably slower than exp(y*log(x)). Is this expected? I imagine that x^y is slightly more accurate?

Here are my results on both a MacBook and an x86 desktop.

julia> versioninfo()
Julia Version 1.12.0-rc2
Commit 72cbf019d04 (2025-09-06 12:00 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m1)
  GC: Built with stock GC
Threads: 8 default, 1 interactive, 8 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

julia> using BenchmarkTools

julia> @btime x^1.5 setup=(x=rand()) # Slow float exponent
  14.320 ns (0 allocations: 0 bytes)
0.1302126529886922

julia> @btime exp(1.5*log(x)) setup=(x=rand()) # Faster but mathematically equivalent
  8.466 ns (0 allocations: 0 bytes)
0.05577688891274848

julia> versioninfo()
Julia Version 1.12.0-rc2
Commit 72cbf019d04 (2025-09-06 12:00 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × AMD Ryzen 9 9950X 16-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver5)
  GC: Built with stock GC
Threads: 16 default, 1 interactive, 16 GC (on 32 virtual cores)
Environment:
  JULIA_NUM_THREADS = 16

julia> using BenchmarkTools

julia> @btime x^1.5 setup=(x=rand()) # Slow float exponent
  14.576 ns (0 allocations: 0 bytes)
0.7008843249519519

julia> @btime exp(1.5*log(x)) setup=(x=rand()) # Faster but mathematically equivalent
  4.849 ns (0 allocations: 0 bytes)
0.962818987187762

Update: x is now properly randomized. The gap between the two is even larger. These comparisons are motivated by my actual usage, where lots of ^ calls are involved.

I highly suspect that you’re not benchmarking what you think you’re benchmarking. By interpolating the global variable, you’re effectively benchmarking the literal expression 0.09969088930420678 ^ 1.5, which is surely not representative of a generic benchmark (I’d even be surprised if it wasn’t constant-folded in the function generated by the @btime macro internally). Use the setup keyword for initializing x instead.
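
A minimal sketch of the difference (illustrative only; the literal shown above is the one quoted from the post):

using BenchmarkTools

x = rand()

# Interpolating a global bakes its current value into the benchmarked
# expression as a literal, so the compiler may constant-fold the whole call:
@btime $x^1.5

# With setup, a fresh x is drawn for each sample and the call is actually measured:
@btime x^1.5 setup=(x=rand())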

Thank you! The revised version should look right now. The gap is even larger now (but consistent with what I observed in my actual usage environment).

Keep in mind that @btime shows you the fastest of all executions, which may also not be representative. What does the graph shown by the full @benchmark look like? The random numbers given by rand() are also always in the half-open interval [0, 1), so this may also be a factor (though it won’t influence the optimizer anymore).
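
For instance (a sketch; shifting the range is just one way to probe values outside [0, 1)):

using BenchmarkTools

# Full timing distribution instead of just the minimum:
@benchmark x^1.5 setup=(x=rand())

# rand() only yields values in [0, 1); shift the range to check whether the
# timing depends on the magnitude of x, e.g. x ∈ [1, 2):
@benchmark x^1.5 setup=(x=1.0+rand())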

That’s a good catch. I should have tried @benchmark for such things. The median does seem much closer.

x^y requires some heroics to be accurate; specifically, you need about 16 extra bits of accuracy out of your log.
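
A rough way to see the gap (a sketch comparing both approaches against a BigFloat reference; ulps is a hypothetical helper, and the numbers vary from run to run):

# Error of a Float64 result in ulps, measured against a high-precision reference.
ulps(approx, exact) = Float64(abs(big(approx) - exact) / eps(Float64(exact)))

x, y = rand(), 1.5
exact = big(x)^big(y)

ulps(x^y, exact)            # ^ is designed to stay within about 1 ulp
ulps(exp(y*log(x)), exact)  # the rounding error of log(x) gets amplified by exp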

As stated above, we take pains to ensure greater accuracy in x^y than exp(y*log(x)) would give (just as IEEE-754 says we should). But if you don’t need full precision, then the exp+log approach is fine if it benchmarks noticeably faster.

For the specific case of x^1.5, I measure x*sqrt(x) as 8x faster than either, and it should be only a bit or two less accurate than ^.
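
A sketch of that three-way comparison (commands only; timings depend on the machine):

using BenchmarkTools

@btime x^1.5           setup=(x=rand())  # most accurate, slowest here
@btime exp(1.5*log(x)) setup=(x=rand())  # faster, a few ulps less accurate
@btime x*sqrt(x)       setup=(x=rand())  # fastest, for this specific exponent only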

x*sqrt(x) should be as accurate as ^

This doesn’t seem to be quite true. Here is the error difference in ulps for random x ∈ [0, 1):

julia> err_diff = [let x = rand()
                        e = big(x)^(3//2) # "exact" result
                        Float64(abs(x * sqrt(x) - e) - abs(x^1.5 - e)) / eps(Float64(e))
                    end for _ in 1:10^6];

The result shows that the error of x * sqrt(x) is larger far more often than it is smaller, by up to 1 ulp, though on average it is only about 0.08 ulps worse.

julia> count(err_diff .< 0) # x * sqrt(x) is better
5169

julia> count(err_diff .> 0) # x * sqrt(x) is worse
238882

julia> using Statistics

julia> mean(err_diff)
0.082866970466614

julia> extrema(err_diff)
(-0.0425258183554384, 1.0)

(and it is never more than slightly better than x^1.5).
