I noticed that Julia does not seem to optimize division by two for floating-point and integer values. I know that in C++ there are a couple of fast functions used in these specific cases:
# Bit position of the exponent field in the IEEE 754 encodings
const EXP16 = 10
const EXP32 = 23
const EXP64 = 52
# Subtracting one unit of the exponent field halves a normal, non-zero float
const SHIFT16 = UInt16(1) << EXP16
const SHIFT32 = UInt32(1) << EXP32
const SHIFT64 = UInt64(1) << EXP64

# Integer division & floor if the number is odd (arithmetic shift; unlike ÷, it floors for negative odd values)
function fast_division_by_2(x::Integer)::Integer
    return x >> 1
end

# Float division: decrement the exponent field. The (x != 0) factor stops the zero bit pattern
# from wrapping around to an infinity; subnormals, Inf and NaN are not handled.
function fast_division_by_2(x::Float16)::Float16
    return (x != 0) * reinterpret(Float16, reinterpret(Int16, x) - SHIFT16)
end
function fast_division_by_2(x::Float32)::Float32
    return (x != 0) * reinterpret(Float32, reinterpret(Int32, x) - SHIFT32)
end
function fast_division_by_2(x::Float64)::Float64
    return (x != 0) * reinterpret(Float64, reinterpret(Int64, x) - SHIFT64)
end
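As a quick sanity check (these calls just exercise the functions above, they are not part of the benchmarks below), the results agree with ordinary division for normal, non-zero values:

julia> fast_division_by_2(7)            # arithmetic shift, floored
3

julia> fast_division_by_2(1.5)          # Float64 method: exponent decremented by one
0.75

julia> fast_division_by_2(Float32(-6.0))
-3.0f0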
The difference is significant even when the normal version uses @fastmath. I compared the results using 1000 evaluations over a sample of 10^5 elements:
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(132.278 μs)
"fast" => Trial(10.285 μs)
julia> results["divi16"]
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(313.996 μs)
"fast" => Trial(31.273 μs)
julia> results["divi32"]
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(128.252 μs)
"fast" => Trial(43.865 μs)
julia> results["divi64"]
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(210.600 μs)
"fast" => Trial(165.764 μs)
julia> results["divf16"]
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(21.816 μs)
"fast" => Trial(19.050 μs)
julia> results["divf32"]
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(50.990 μs)
"fast" => Trial(41.615 μs)
julia> results["divf64"]
2-element BenchmarkTools.BenchmarkGroup:
tags: []
"normal" => Trial(104.193 μs)
"fast" => Trial(88.767 μs)
There is some more performance to gain by doing the zero test on the integer bits instead:

xi = reinterpret(Int64, x)
reinterpret(Float64, !iszero(xi) * (xi - SHIFT64))

However, since 0.0 and -0.0 have different bit patterns, the shift would turn -0.0 into Inf, forcing a check for that case elsewhere.
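A sketch of what that check could look like for Float64 (the SIGN64 constant and the function name are just for illustration):

const SIGN64 = UInt64(1) << 63

function fast_division_by_2_bits(x::Float64)::Float64
    xi = reinterpret(UInt64, x)
    # ignore the sign bit when testing for zero, so both 0.0 and -0.0 pass through unchanged
    nonzero = !iszero(xi & ~SIGN64)
    return reinterpret(Float64, ifelse(nonzero, xi - SHIFT64, xi))
end

Masking off the sign bit before the zero test keeps both zeros intact; whether this actually beats the (x != 0) version would need benchmarking, and subnormals, Inf and NaN are still not handled.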