# Fast "bool ^ float", and fast "float ^ bool" as a gift!

I already wrote this before I thought to ask Oscar Smith what he meant by “PR”. I am not an user of GitHab and do not know how to make pull requests, so I leave it here.

THIS TEXT AND CODE IS PROVIDED “AS IS”, WITHOUT WARRANTY, ECT.

Oscar Smith asked me to “make a PR to base for the Float64^Float64 algorithm” if I can find a faster way of “bool ^ float” than here: Speed up `x::Bool ^ y::Float64`.

``````function f3(x::Bool, y::T) where T<:AbstractFloat
ifelse(x | iszero(y),  one(T), abs(y) * T(Inf) * (!(y>0)))
end
``````

I found a faster way witch done on my processor without hard using low-level specific like intrinsics and so on.

I checked my achievements on Ryzen 9 and Core i3.

On older processors (for example on my core i3) Oscar Smith’s way is faster.

Since I hope that the hardware and Julia will progress, I consider the result achieved and fulfill the request of Oscar Smith by making a PR as best I can.

Let’s skip the reasons of “bool ^ float” witch are different for everyone,
and if we are here and I understood everything correctly with my english and google-translate… Ready? Go!

``````                             Fast "bool ^ float" PR
``````

Are you going to rased stupid `bool` to the power of insidious `float`? But can’t sit still while a logical one is raised to the power of negative zero? I really understand you…

Take this!

``````pow_fast(x::Bool, y::T) where T <: AbstractFloat = ifelse(x | iszero(y), T(1) , ifelse(isnan(y), T(NaN), ifelse(signbit(y), T(Inf), T(0))));
pow_fast(x::Bool, y::BigFloat) = big(ifelse(x | iszero(y), 1.0 , ifelse(isnan(y), NaN, ifelse(signbit(y), Inf, 0.0))));
``````

100 (about) times faster than native! (on Ryzen 9)

“Ctrl + A + C” and it’s yours!

This hot cake mined from the permafrost and delivered across the snowy suning plain specially for you with love!

Just two line save you time!

Fast “float ^ bool” as a gift!

``````pow_fast(x::T, y::Bool) where T <: AbstractFloat = y ? copy(x) : T(1)
pow_fast(x::BigFloat, y::Bool) = y ? x : big(1.0)
``````
``````                               Code, benchmarks, tests:
``````

fast bool ^ float:

``````using BenchmarkTools, Test

#
pow_native(x::X, y::Y) where {X,Y} = x ^ y

# by Oscar Smith, faster on intel core i3 in some tests
# https://discourse.julialang.org/t/speed-up-x-bool-y-float64/90601
function pow_fast_1(x::Bool, y::T) where T <: AbstractFloat
ifelse(x | iszero(y),  one(T), abs(y) * T(Inf) * (!(y>0)))
end;

# my last way (pow_fast), faster on Ryzen 9
pow_fast_2(x::Bool, y::T) where T <: AbstractFloat = ifelse(x | iszero(y), T(1) , ifelse(isnan(y), T(NaN), ifelse(signbit(y), T(Inf), T(0))));
pow_fast_2(x::Bool, y::BigFloat) = big(ifelse(x | iszero(y), 1.0 , ifelse(isnan(y), NaN, ifelse(signbit(y), Inf, 0.0))));

# test data
function get_test_data(::Type{T}) where T <: AbstractFloat
n = 10000
m = n ÷ 100  # for special values of the same type NaN, -Inf ...
r = zeros(T, n)  # for result
x = rand(Bool, n)
y = (T <: BigFloat ? big.(randn(Float64, n)) : randn(T, n)) .^ 111
y[rand(1:n, m)] .= T(NaN)
y[rand(1:n, m)] .= -T(Inf)
y[rand(1:n, m)] .= T(Inf)
y[rand(1:n, m)] .= nextfloat(-T(Inf))
y[rand(1:n, m)] .= prevfloat(T(Inf))
y[rand(1:n, m)] .= T(0)
y[rand(1:n, m)] .= -T(0)
return n, r, x, y
end

#tests
@testset verbose = true "fast `bool ^ float`" begin
for Flt in subtypes(AbstractFloat)
@testset verbose = true " \$Flt" begin
n, r, x, y = get_test_data(Flt)
for pow_fast in (pow_fast_1, pow_fast_2)
@testset "\$pow_fast" begin
for i = 1 : n
r_native = pow_native(x[i],y[i])
r_fast = pow_fast(x[i],y[i])

# big(1.0) !== big(1.0), NaN != NaN
@test r_native == r_fast ? true :
isnan(r_native) & isnan(r_fast) ? true : false
end
end
end
end
end
end;

# benchmarks
f!(f,r,x,y,n) = for i = 1 : n
r[i] = f(x[i], y[i])
end;

for Flt in subtypes(AbstractFloat)
n, r, x, y = get_test_data(Flt)
println("\$Flt")
for pow in (pow_native, pow_fast_1, pow_fast_2)
println(" \$pow")
@btime f!(\$pow,\$r,\$x,\$y,\$n)
end
end
``````

fast float ^ bool:

``````using BenchmarkTools, Test

pow_native(x::X, y::Y) where {X,Y} = x ^ y

# NOTE pow_fast(x::T, y::Bool) do not translate -0.0 -> 0.0
# pow_native(-0.0, true) ->  0.0
# pow_fast(-0.0, true)   -> -0.0
pow_fast(x::T, y::Bool) where T <: AbstractFloat = y ? copy(x) : T(1)
pow_fast(x::BigFloat, y::Bool) = y ? x : big(1.0)

# test data
function get_test_data(::Type{T}) where T <: AbstractFloat
n = 100000
m = n ÷ 100  # for special values of the same type NaN, -Inf ...
r = zeros(T, n)  # for result
x = (T <: BigFloat ? big.(randn(Float64, n)) : randn(T, n)) .^ 111
y = rand(Bool, n)
x[rand(1:n, m)] .= T(NaN)
x[rand(1:n, m)] .= -T(Inf)
x[rand(1:n, m)] .= T(Inf)
x[rand(1:n, m)] .= nextfloat(-T(Inf))
x[rand(1:n, m)] .= prevfloat(T(Inf))
x[rand(1:n, m)] .= T(0)
x[rand(1:n, m)] .= -T(0)
return n, r, x, y
end

@testset verbose = true "fast `float ^ bool`" begin
for Flt in subtypes(AbstractFloat)
@testset verbose = true " \$Flt" begin
n, r, x, y = get_test_data(Flt)
for pow_fast in (pow_fast,)
@testset "\$pow_fast" begin
for i = 1 : n
r_native = pow_native(x[i],y[i])
r_fast = pow_fast(x[i],y[i])

# big(1.0) !== big(1.0), NaN != NaN
@test r_native == r_fast ? true :
isnan(r_native) & isnan(r_fast) ? true : false
end
end
end
end
end
end;

# benchmarks
f!(f,r,x,y,n) = for i = 1 : n
r[i] = f(x[i], y[i])
end;

for Flt in subtypes(AbstractFloat)
n, r, x, y = get_test_data(Flt)
println("\$Flt")
for pow in (pow_native, pow_fast)
println(" \$pow")
@btime f!(\$pow,\$r,\$x,\$y,\$n)
end
end

``````
3 Likes

CPU is Intel i5:

For `bool ^ float`:
`pow_fast_2` faster (1.5x) than `pow_fast_1` on `BigFloat` and `Float16`.
`pow_fast_2` same (0.95x) as `pow_fast_1` on `Float32` and `Float64`.

For `float ^ bool`:
`pow_fast` same (1.1x) as `pow_native` on `BigFloat`
`pow_fast` is faster (2x) than `pow_native` on `Float16`
`pow_fast` is seriously faster (>20x) than `pow_native` on `Float32` and `Float64`.

2 Likes

CPU is Ryzen 9:

``````bool^float benchmarks:

BigFloat
pow_native:   1.123 ms (40000 allocations: 1.98 MiB)
pow_fast_1:   4.410 ms (179814 allocations: 7.09 MiB)
pow_fast_2: 812.800 μs (20000 allocations: 1015.62 KiB)
Float16
pow_native: 194.700 μs (0 allocations: 0 bytes)
pow_fast_1:  14.500 μs (0 allocations: 0 bytes)
pow_fast_2:  14.400 μs (0 allocations: 0 bytes)
Float32
pow_native: 252.300 μs (0 allocations: 0 bytes)
pow_fast_1:   3.388 μs (0 allocations: 0 bytes)
pow_fast_2:   1.490 μs (0 allocations: 0 bytes)
Float64
pow_native: 261.800 μs (0 allocations: 0 bytes)
pow_fast_1:   3.513 μs (0 allocations: 0 bytes)
pow_fast_2:   3.312 μs (0 allocations: 0 bytes)

float^bool benchmarks:

BigFloat
pow_native:   6.714 ms (200000 allocations: 9.92 MiB)
pow_fast:     5.232 ms (99770 allocations: 4.95 MiB)
Float16
pow_native: 507.300 μs (0 allocations: 0 bytes)
pow_fast:   320.100 μs (0 allocations: 0 bytes)
Float32
pow_native: 495.600 μs (0 allocations: 0 bytes)
pow_fast:     7.875 μs (0 allocations: 0 bytes)
Float64
pow_native: 500.700 μs (0 allocations: 0 bytes)
pow_fast:    16.000 μs (0 allocations: 0 bytes)
``````

Making a PR (= Pull request) may sound intimidating. But actually, it is not hard and a very useful skill. If you already know how to use git, it is no harder than posting on this forum.
The “hardest” part will be building julia, which at least on a linux computer is also straightforward.

more long & more faster:

``````pow_fast(x::Bool, y::T) where T <: AbstractFloat = x ? one(T) : ifelse(iszero(y), one(T), ifelse(isnan(y), T(NaN), ifelse(signbit(y), T(Inf), zero(T))));
pow_fast(x::Bool, y::BigFloat) = x ? big(1.0) : big(ifelse(iszero(y), 1.0 , ifelse(isnan(y), NaN, ifelse(signbit(y), Inf, 0.0))));

pow_fast(x::T, y::Bool) where T <: AbstractFloat = y ? x : one(T);
pow_fast(x::BigFloat, y::Bool) = y ? x : big(1.0);  # only big(1.0), not one(x), not BigFloat(1), not one(T)
``````
1 Like