# Speed of power "^"

I am writing code that makes heavy use of `^`, and profiling suggests that `^` contributes a lot of the runtime.
I did the following timing test. Replacing `^` with `*` gives a massive speed-up, which is not surprising. Interestingly, `x^3` is much faster than `x^4`, so it seems the compiler is able to convert `^` into repeated `*` for small literal integer powers!

```julia
using BenchmarkTools

f1(x) = x ^ 2
f2(x) = x ^ 1.1
f3(x) = x ^ 3
f4(x) = x ^ 4
f5(x) = x ^ 9.9
f6(x) = x ^ 5
f7(x, y) = x ^ y
f8(x) = x * x * x * x * x

@btime f1(1.5)
@btime f2(1.5)
@btime f3(1.5)
@btime f4(1.5)
@btime f5(1.5)
@btime f6(1.5)

@btime f7(1.5, 3)
@btime f7(1.5, 9.9)
@btime f8(1.5)
```
```
  0.011 ns (0 allocations: 0 bytes)
  0.864 ns (0 allocations: 0 bytes)
  0.011 ns (0 allocations: 0 bytes)
  0.825 ns (0 allocations: 0 bytes)
  0.825 ns (0 allocations: 0 bytes)
  0.826 ns (0 allocations: 0 bytes)
  0.011 ns (0 allocations: 0 bytes)
  0.861 ns (0 allocations: 0 bytes)
  0.011 ns (0 allocations: 0 bytes)
```

Unfortunately, I have to go beyond `x^3`… I am wondering if there are any tips to improve the performance further. Perhaps the solution is to convert `^` to `*` in the expression at compile time through macros?

Note that you aren’t benchmarking anything: sub-nanosecond times indicate you’re running into constant propagation (see Manual · BenchmarkTools.jl). In Julia v1.8, `BenchmarkTools` will be able to automatically stop constant propagation.


Ahh I see @giordano thanks! I will check with v1.8.

Also, I found that the conversion to `*` for small literal powers is actually defined in Base, via `Base.literal_pow`:
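For reference, this is roughly the shape of those definitions (a sketch from memory of the `Base.literal_pow` mechanism, which dispatches on the literal exponent wrapped in `Val`; check the Base source for the exact methods):

```julia
# Base specializes small literal integer exponents via `literal_pow`, roughly:
#     @inline Base.literal_pow(::typeof(^), x, ::Val{2}) = x * x
#     @inline Base.literal_pow(::typeof(^), x, ::Val{3}) = x * x * x
# So `x^3` with a literal 3 lowers to a chain of multiplications.
# You can call it directly to observe the specialization:
Base.literal_pow(^, 1.5, Val(3)) == 1.5 * 1.5 * 1.5  # true, bit-identical
```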

Perhaps there is a way to extend it beyond `x*x*x*x` in this case.


What timings do you see in your real application? How did you come to the conclusion that `^` is the bottleneck (did you profile using e.g. the Profile stdlib)?

If you’re doing lots of `^`, I’d also expect `^` to take up most of the time, to be honest.


You can check out `Base.power_by_squaring`. Be aware that numerical accuracy is not quite as good, but that may not be a problem for you.
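For example (a quick sketch; note that `Base.power_by_squaring` is a Base internal, not public API, so its behavior could change between versions):

```julia
x = 1.5
# Repeated squaring needs only O(log n) multiplications:
# x^7 = x * x^2 * x^4, where x^4 = (x^2)^2
y = Base.power_by_squaring(x, 7)
y ≈ x^7  # close, but not guaranteed bit-identical to `^`
```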


Julia 1.8 is much faster here. That said, powers are inherently a lot harder than multiplication.


To clarify: for literal integer powers like `x^7`, this will give you better performance at the cost of slightly degraded accuracy. (For the built-in hardware floating-point types you can also simply use `@fastmath`.)

I’m not sure what you are trying to do, but if you happen to evaluate polynomials, consider using `evalpoly` (aka Horner’s method) to mitigate the number of multiplications / powers
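For instance, to evaluate p(x) = 1 + 2x + 3x² without computing any explicit powers:

```julia
# `evalpoly` takes coefficients in ascending order of power and applies
# Horner's method: 1 + 2x + 3x^2 is evaluated as 1 + x*(2 + x*3),
# i.e. one multiply and one add per coefficient.
p = evalpoly(2.0, (1.0, 2.0, 3.0))
# p == 17.0
```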


Yeah, I used the Profile stdlib to do the profiling. There are lots of `^` calls inside loops, taking sums of different powers…

Thanks for all the suggestions!
In the end I just made a more aggressive version of `(^)` in `Base`:

```julia
@inline function fast_pow(x, y)
    y == -1 && return inv(x)
    y == 0  && return one(x)
    y == 1  && return x
    y == 2  && return x * x
    y == 3  && return x * x * x
    y == 4  && return x * x * x * x
    y == 5  && return x * x * x * x * x
    y == 6  && return x * x * x * x * x * x
    y == 7  && return x * x * x * x * x * x * x
    y == 8  && return x * x * x * x * x * x * x * x
    y == 9  && return x * x * x * x * x * x * x * x * x
    y == 10 && return x * x * x * x * x * x * x * x * x * x
    y == 11 && return x * x * x * x * x * x * x * x * x * x * x
    y == 12 && return x * x * x * x * x * x * x * x * x * x * x * x
    return x ^ y
end
```

and it already gives good speedups.
Because `y` is only known at runtime, I cannot use FastPow.jl to statically expand it.

Note that this will be much less accurate than `^`.


Out of curiosity, where do you use these powers? In a polynomial?

ahh good point.

Not quite polynomial, something like:

```julia
function f(x, y, z, p, q)
    np = length(p)
    nq = length(q)
    results = zeros(np * nq)
    k = 1
    for i in 1:np
        for j in 1:nq
            results[k] = x^p[i] * y^p[i] * z^q[j]
            k += 1
        end
    end
    return results
end
```

Oh, this can be made 3x faster trivially by computing `x^p[i]*y^p[i]` outside the inner loop.

If you are willing to lose a little bit of accuracy, you can get another 2x speedup by saving `log(z)` and writing

```julia
function f(x, y, z, p, q)
    np = length(p)
    nq = length(q)
    results = zeros(np * nq)
    k = 1
    logz = log(z)
    for i in 1:np
        xyp = x^p[i] * y^p[i]
        for j in 1:nq
            results[k] = xyp * exp(logz * q[j])
            k += 1
        end
    end
    return results
end
```

This (and your other cases) uses more multiplications than necessary. For example, `x^8 == ((x*x)^2)^2`, which is only 3 multiplications instead of 7. You can do `return @fastpow x^8` here (or `return @fastmath x^8` for hardware-float types), I guess.
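Spelled out by hand for the `x^8` case (a sketch of the addition-chain idea; `@fastpow` from FastPow.jl generates essentially this expansion for you):

```julia
# x^8 via repeated squaring: 3 multiplications instead of 7
function pow8(x)
    x2 = x * x       # x^2
    x4 = x2 * x2     # x^4
    return x4 * x4   # x^8
end

pow8(2.0)  # == 256.0, exact for this input
```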

But I agree with @Oscar_Smith that it’s better to optimize the larger context for these calculations.
