wtf(x) = x * Float64(big(12.34) / big(180)^3)
@code_llvm wtf(1.0)
shows runtime calls to do the big-ifying and the exponentiation. If I remove the calls to big, it constant-folds just fine.
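For comparison, here is a minimal sketch of the same expression with big removed (the name wtf_nobig is just illustrative); the whole constant folds into a single multiply:
# Pure Float64 version; 12.34 / 180^3 is folded at compile time
wtf_nobig(x) = x * (12.34 / 180^3)
@code_llvm wtf_nobig(1.0)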
The problem is that BigInts are technically mutable, so the compiler can’t prove the operations are side-effect free. BigFloat precision can also be modified at runtime:
julia> f(x) = Float64(x * big(12.34))
f (generic function with 1 method)
julia> setprecision(BigFloat, 4)
4
julia> f(1)
12.0
julia> setprecision(BigFloat, 1000)
1000
julia> f(1)
12.34
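One way to see this from the compiler’s side is to query the inferred effects of the BigFloat path (available on Julia 1.8+; the exact output depends on the Julia version):
# Expect conservative effects here, since the result depends on global MPFR precision/rounding state
Base.infer_effects(x -> Float64(x * big(12.34)), (Int,))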
Ugh. So the workaround is to force evaluation myself.
# Force constant folding, since BigInt and BigFloat values don't act as constants
macro cf(expr)
    setrounding(BigFloat, RoundNearest) do
        setprecision(256) do
            eval(expr)
        end
    end
end
wtf(x) = x * @cf(Float64(big(12.34) / big(180)^3))
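To confirm the macro really bakes a literal into the method, expanding it should show only a plain Float64 value where the big arithmetic used to be:
# The big arithmetic runs at macro-expansion time; the expansion is just a Float64 literal
@macroexpand @cf(Float64(big(12.34) / big(180)^3))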
That’s not the only problem - BigFloat and BigInt (backed by MPFR and GMP under the hood) keep LOTS of internal state (like caching numbers), as well as crossing lots and lots of ccall boundaries on every operation. The compiler is intentionally not constant folding those, because there is no way the compiler can know what the effects of a ccall are - it’s an inaccessible binary blob, after all.
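As a small illustration, even a trivial wrapper around a foreign call gets maximally conservative effects (c_clock is just a throwaway example name):
# The compiler can't see inside libc, so nothing calling this can be folded
c_clock() = ccall(:clock, Int32, ())
Base.infer_effects(c_clock, ())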
I’d recommend against a solution like that, but maybe you have a use case where that is required? It’s possible there’s a better solution than a macro using eval. What are you trying to achieve in the end?
My use case is that I have a polynomial to evaluate, but I want to scale the coefficients by an amount that is computed. I have determined experimentally that computing the scaled coefficient in Float64 sometimes gets a result that is off by an ulp, so I need to do the computation in big precision and then round to Float64 at the end. I want to leave the scaling computation in the source so that it is understandable, rather than just putting the result of the computation in the source.
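As a toy illustration of that kind of discrepancy (made-up constants, not the actual polynomial), scaling with two roundings in Float64 can land one ulp away from rounding once at the end:
s = 1 / 180.0^3                               # scale factor, already rounded to Float64
a_two_roundings = 12.34 * s                   # second rounding adds up to ~1 ulp of error
a_round_once = Float64(big(12.34) * (1 / big(180)^3))  # single rounding at the very end
a_two_roundings == a_round_once               # may agree for this pair, but not for every coefficient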
You may want to try one of the quad-precision floating point implementations instead of resorting to BigFloats if you only need a bit more precision - they should be more amenable to constant-folding. See: GitHub - JuliaMath/DoubleFloats.jl: math with more good bits
Specifically, DoubleFloats is faster when the exponent range of Float64 is sufficient, while Float128 will be a little slower but gives you a bigger exponent range in case you need more than 10^-324 absolute precision (Float128 goes down to 10^-5000 or so).
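For instance, a quick sanity check that Double64 intermediates (roughly 106 significand bits) are enough for this particular constant:
using DoubleFloats
a_big = Float64(big(12.34) / big(180)^3)
a_dbl = Float64(Double64(12.34) / Double64(180)^3)
a_big == a_dbl   # should hold when ~106 bits of intermediate precision suffice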
Here’s another approach: evaluate the expression outside the function and then capture it.
julia> let a = Float64(big(12.34) / big(180)^3)
global fml(x) = x * a
end
fml (generic function with 1 method)
julia> @code_llvm fml(1.0)
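A closely related idiom is to hoist the value into a top-level const, which the compiler also treats as a compile-time constant:
const SCALE = Float64(big(12.34) / big(180)^3)   # illustrative name
fml2(x) = x * SCALE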
Nice! Thanks! DoubleFloats should do for my purposes. I need more bits in the mantissa for the intermediate results. I don’t need exponent range. I don’t care about speed, since I’m counting on the compiler to constant-fold it.
You might want to check which operations will constant-fold. In my testing, Float64(Double64(12.34) / Double64(180)^3) did not constant-fold, but Float64(Double64(12.34) / 180^3) did. Meanwhile, anything with Float128 doesn’t seem to constant-fold.
Interestingly, Double64(180)^3 is what’s preventing the constant-folding, but if the exponent is made to be a non-integer value, then it constant-folds.
julia> versioninfo()
Julia Version 1.9.0-beta3
Commit 24204a7344 (2023-01-18 07:20 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i9-12900HK
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
Threads: 1 on 20 virtual cores
julia> using DoubleFloats, Quadmath, BenchmarkTools
julia> wtf(x) = x * Float64(big(12.34) / big(180)^3)
wtf2(x) = x * Float64(Double64(12.34) / Double64(180)^3)
wtf3(x) = x * Float64(Double64(12.34) / 180^3)
wtf4(x) = x * Float64(Float128(12.34) / Float128(180)^3)
wtf5(x) = x * Float64(Float128(12.34) / 180^3)
let a = Float64(big(12.34) / big(180)^3)
global fml(x) = x * a
end
fml (generic function with 1 method)
julia> @btime wtf(1.0)
@btime wtf2(1.0)
@btime wtf3(1.0) # constprop
@btime wtf4(1.0)
@btime wtf5(1.0)
@btime fml(1.0) # constprop
wtf(1.0) ≡ wtf2(1.0) ≡ wtf3(1.0) ≡ wtf4(1.0) ≡ wtf5(1.0) ≡ fml(1.0)
461.538 ns (13 allocations: 400 bytes)
47.571 ns (0 allocations: 0 bytes)
1.000 ns (0 allocations: 0 bytes)
92.453 ns (0 allocations: 0 bytes)
48.887 ns (0 allocations: 0 bytes)
1.000 ns (0 allocations: 0 bytes)
true
julia> # notice when exponent is integer-valued (even when float), const-prop fails
wtf2(x) = x * Float64(Double64(12.34) / Double64(180)^3)
wtf2_2(x) = x * Float64(Double64(12.34) / Double64(180)^3.0)
wtf2_3(x) = x * Float64(Double64(12.34) / Double64(180)^3(1-Double64(2)^-100.0))
wtf2_4(x) = x * Float64(Double64(12.34) / Double64(180)^3(1-Double64(2)^-100.001))
@btime wtf2(1.0)
@btime wtf2_2(1.0)
@btime wtf2_3(1.0) # no constprop
@btime wtf2_4(1.0) # constprop!
wtf2(1.0) ≡ wtf2_2(1.0) ≡ wtf2_3(1.0) ≡ wtf2_4(1.0)
47.536 ns (0 allocations: 0 bytes)
461.421 ns (8 allocations: 352 bytes)
1.160 μs (8 allocations: 352 bytes)
1.000 ns (0 allocations: 0 bytes)
true
Double64^Int doesn’t constant fold because it can’t prove the loop terminates.
Base.infer_effects(^, (Double64, Int))
(+c,+e,+n,!t,+s,+m,+i)
Our termination analysis right now is really weak and in general is unable to prove that even relatively trivial loops terminate.
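If you really need the folding today, one workaround is to assert the missing termination effect on a small wrapper (d64pow is just an illustrative name); be aware that asserting effects that don’t actually hold is undefined behavior, so use this sparingly:
# Promise the compiler that the integer-power loop terminates,
# which may make the surrounding constant expression eligible for folding on Julia 1.8+.
Base.@assume_effects :terminates_globally d64pow(r::Double64, n::Int) = r^n
wtf6(x) = x * Float64(Double64(12.34) / d64pow(Double64(180), 3))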
In my testing, even exponentiating by an integer-valued float prevented constant-folding:
Float64(Double64(12.34) / Double64(180)^3) # no constant-folding
Float64(Double64(12.34) / Double64(180)^3.0) # no constant-folding, worse runtime
Float64(Double64(12.34) / Double64(180)^3(1-Double64(2)^-100.001)) # constant-folds
function Base.:(^)(r::DoubleFloat{T}, n::DoubleFloat{T}) where {T<:IEEEFloat}
    if isinteger(n)
        return r^Int(n)
    else
        return exp(n * log(r))
    end
end
Although given the effects of Double64 log and exp, I’m surprised that those are folding.
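The same effects query from above can be pointed at those methods to check (the exact output depends on the Julia and DoubleFloats versions in use):
Base.infer_effects(log, (Double64,))
Base.infer_effects(exp, (Double64,))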