Speed of power "^"

zhubonan · May 6, 2022, 12:08pm

I am writing a code that involves lots of use of “^” and profiling seems to suggest “^” is contributing a lot of the time.
I have done the following test of the timing. Replacing “^” “*” seems to give a massive speed up, which is not surprising. Interestingly, x^3 is much faster than x^4 and it seems that the compiler is capable of dynamically convert “^” to “*”!

f1(x) = x ^ 2
f2(x) = x ^ 1.1
f3(x) = x ^ 3
f4(x) = x ^ 4
f5(x) = x ^ 9.9
f6(x) = x ^ 5
f7(x, y) = x ^ y
f8(x) = x * x * x * x * x
@btime f1(1.5)
@btime f2(1.5)
@btime f3(1.5)
@btime f4(1.5)
@btime f5(1.5)
@btime f6(1.5)

@btime f7(1.5, 3)
@btime f7(1.5, 9.9)
@btime f8(1.5)

  0.011 ns (0 allocations: 0 bytes)
  0.864 ns (0 allocations: 0 bytes)
  0.011 ns (0 allocations: 0 bytes)
  0.825 ns (0 allocations: 0 bytes)
  0.825 ns (0 allocations: 0 bytes)
  0.826 ns (0 allocations: 0 bytes)
  0.011 ns (0 allocations: 0 bytes)
  0.861 ns (0 allocations: 0 bytes)
  0.011 ns (0 allocations: 0 bytes)

Unfortunately, I have to go beyond “x^3”… I am wondering if there is any tip to improve the performance further. Perhaps the solution would be to convert “^” to “*” in the equation at compile time through macros?

giordano · May 6, 2022, 12:13pm

Note that you aren’t benchmarking anything. Subnanosecond times indicate you’re running into constant propagation: Manual · BenchmarkTools.jl. In Julia v1.8 BenchmarkTools will be able to automatically stop constant propagation.

zhubonan · May 6, 2022, 12:16pm

Ahh I see @giordano thanks! I will check with v1.8.

Also I found that it is actually in the stdlib where the change to * is defined:

https://github.com/JuliaLang/julia/blob/bf534986350a991e4a1b29126de0342ffd76205e/base/math.jl#L922-L929

Perhaps there is a way to extended it beyond x*x*x*x in this case.

Sukera · May 6, 2022, 12:42pm

What timings do you see in your real application? How did you come to the conclusion that ^ is the bottleneck (did you profile using e.g. the Profile stdlib)?

If you’re doing lots of ^, I’d also expect ^ to take up most of the time, to be honest.

DNF · May 6, 2022, 12:47pm

You can check out Base.power_by_squaring. Be aware that numerical accuracy is not quite as good, but that may not be a problem for you.

Oscar_Smith · May 6, 2022, 12:48pm

Julia 1.8 is much faster here. that said, powers are inherently a lot harder than multiplication.

lmiq · May 6, 2022, 1:00pm

Also take a look at https://github.com/JuliaMath/FastPow.jl

stevengj · May 6, 2022, 1:19pm

To clarify: for literal integer powers like x^7, this will give you better performance at the cost of slightly degraded accuracy. (For the built-in hardware floating-point types you can also simply use @fastmath.)

gdalle · May 6, 2022, 1:33pm

I’m not sure what you are trying to do, but if you happen to evaluate polynomials, consider using evalpoly (aka Horner’s method) to mitigate the number of multiplications / powers

zhubonan · May 6, 2022, 1:44pm

Yeah I used Pofile stdlib to do the profiling. There are lots of “^” inside loops, taking sum of different powers…

zhubonan · May 6, 2022, 2:15pm

Thanks for all the suggestions!
In the end I just made a more aggressive version of (^) in Base:

@inline function fast_pow(x, y)
    y == -1 && return inv(x)
    y == 0 && return one(x)
    y == 1 && return x
    y == 2 && return x * x
    y == 3 && return x * x * x
    y == 4 && return x * x * x * x
    y == 5 && return x * x * x * x * x
    y == 6 && return x * x * x * x * x * x
    y == 7 && return x * x * x * x * x * x * x
    y == 8 && return x * x * x * x * x * x * x * x 
    y == 9 && return x * x * x * x * x * x * x * x * x 
    y == 10 && return x * x * x * x * x * x * x * x * x * x 
    y == 11 && return x * x * x * x * x * x * x * x * x * x * x 
    y == 12 && return x * x * x * x * x * x * x * x * x * x * x * x 
    ^(x, y)
end

and it already gives good speedups.
Because y is only known at runtime so I cannot use FastPow.jl to statically expand it.

Oscar_Smith · May 6, 2022, 2:17pm

Note that this will be much less accurate than ^.

gdalle · May 6, 2022, 2:29pm

Out of curiosity, where do you use these powers? In a polynomial?

zhubonan · May 6, 2022, 2:44pm

ahh good point.

zhubonan · May 6, 2022, 2:50pm

Not quite polynomial, something like:

function f(x, y, z, p, q)
    np = length(p)
    nq = length(q
    results = zeros(np * nq)
    k = 1
    for i in 1:np
        for j in 1:nq
            results[k] = x^p[i] * y^p[i] *  z^q[j]
            k += 1
        end
    end
end

Oscar_Smith · May 6, 2022, 2:52pm

Oh, this can be made 3x faster trivially by computing x^p[i]*y^p[i] outside the inner loop.

If you are willing to lose a little bit of accuracy, you can get another 2x speedup by saving log(z) and writing

function f(x, y, z, p, q)
    np = length(p)
    nq = length(q
    results = zeros(np * nq)
    k = 1
    logz = log(z)
    for i in 1:np
        xyp = x^p[i] * y^p[i] 
        for j in 1:nq
            results[k] = xyp * exp(logz*q[j])
            k += 1
        end
    end
end

stevengj · May 6, 2022, 3:17pm

This (and your other cases) are using more multiplications than necessary. For example, x^8 = ((x*x)^2)^2, which is only 3 multiplications instead of 7. You can do return @fastpow x^8 here (or return @fastmath x^8 for hardware-float types), I guess.

But I agree with @Oscar_Smith that it’s better to optimize the larger context for these calculations.

Topic		Replies	Views
Seemingly large permformance regression on "^" power operator in >=1.6 General Usage bug , performance	2	406	April 28, 2021
Small, fixed powers Performance	5	503	May 28, 2021
Arithmetic performance of expression Performance	11	363	October 4, 2022
Significant difference in execution speed between similar functions with nested loops Performance question	4	237	March 8, 2023
Power function with Integer ( x^4 much slower than xxx*x ) Performance	12	1446	May 18, 2018

Speed of power "^"

Related topics