I have a particular example, written minimally as:
using ModelingToolkit
@variables t # independent variable (time)
dDt = Differential(t) # define an operator for the differentiation w.r.t. time
const normalized_stefan_boltzmann_4root = 0.0035929684253716994
@variables T(t) S(t) α(t) ε(t)
absorbed_shortwave = S*(1 - α)
emitted_longwave = ε*(normalized_stefan_boltzmann_4root*T)^4
eqs = [
dDt(T) ~ absorbed_shortwave - emitted_longwave,
S ~ 340.25,
α ~ 0.3,
ε ~ 0.65
]
@named ebm = ODESystem(eqs)
ebm_simplified = structural_simplify(ebm)
which will yield
julia> equations(ebm_simplified)
1-element Vector{Equation}:
Differential(t)(T(t)) ~ (1 - α(t))*S(t) - 1.6665317910360028e-10(T(t)^4)*ε(t)
As you will notice, a simplification has taken place: the constant (normalized_stefan_boltzmann_4root) has been pulled out of the 4th power and raised to the 4th power itself. However, the reason I wrote the equations this way is that I believe they yield better numeric stability: the product of the normalized constant with T is much closer to 1, which when raised to the 4th power retains many more digits after the decimal point.
(By the way, am I wrong in believing that this will indeed yield better numeric stability?)
If I am right, is there a way to tell ModelingToolkit.jl to not do this simplification?
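(For reference, the coefficient in the simplified equation appears to be just my constant raised to the 4th power, so the two forms are algebraically the same:)
normalized_stefan_boltzmann_4root^4  # ≈ 1.6665317910360028e-10, the coefficient printed above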
You didn’t show the full equations so it’s hard to tell, but my guess is that the way you constructed the expression graph led to eager application of the numeric constants, so it happens outside the simplification passes of Symbolics. I’d have to look at the code to see.
But also, it doesn’t affect numeric stability here, since it only would if you were differencing that quantity against another. Successive multiplications don’t generally have this issue; it’s the subtraction that is the root cause in cases like polynomials, and that’s where this gets tricky.
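A minimal illustration of the difference (the numbers here are just for the example, not from the model):
c = 1.6665317910360028e-10
T = 300.0
c * T^4      # multiplications: one rounding error each, relative accuracy preserved

x = 1.0 + 1e-8   # now subtract two nearly equal numbers
x - 1.0          # ≈ 9.9999999e-9 rather than 1.0e-8: the leading digits cancel and the
                 # rounding error of the addition is exposed at full relative size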
The kind of rewrite the OP describes is commonly used when simulating in reduced precision, where you have to worry about floating point overflow.
Oh, I am sorry, I forgot to copy two lines; the equations are
absorbed_shortwave = S*(1 - α)
emitted_longwave = ε*(normalized_stefan_boltzmann_4root*T)^4
I’ve edited the post.
The simplification happens before the structural_simplify call:
julia> equations(ebm)
4-element Vector{Equation}:
Differential(t)(T(t)) ~ (1 - α(t))*S(t) - 1.6665317910360028e-10(T(t)^4)*ε(t)
S(t) ~ 340.25
α(t) ~ 0.3
ε(t) ~ 0.65
Oh, you are right:
julia> T = 300.0
300.0
julia> (normalized_stefan_boltzmann_4root*T)^4
1.3498907507391622
julia> 1.6665317910360028e-10*(T^4)
1.3498907507391622
I guess for 64 bits and with my particular constant it doesn’t matter, but I think for fewer bits or for much larger constants it would…? I was speculating so far, but that’s what @Oscar_Smith seems to say as well.
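To check the lower-precision case concretely (Float16, half precision, which is where overflow actually bites):
T16 = Float16(300)
T16^4                                                  # Inf16: 300^4 ≈ 8.1e9 is far above floatmax(Float16) ≈ 65504
(Float16(normalized_stefan_boltzmann_4root) * T16)^4   # finite, roughly 1.35: the product stays near 1
Float16(1.6665317910360028e-10)                        # 0.0: the pulled-out coefficient even underflows in Float16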
The rule for floating point multiplications is that your accuracy is perfect* unless the value is outside the range of your floating point type. For Float64, this range is ~10^308; you will never hit this when doing anything even somewhat reasonable. Float32 goes up to ~3*10^38, which is still big enough most of the time but can occasionally cause issues. Float16 only goes up to 65500. This causes tons of problems.
* up to floating point epsilon which has constant** relative error
** except for subnormal numbers
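For reference, those limits are easy to check directly (values noted in comments, since Float16 printing varies by Julia version):
floatmax(Float64)   # 1.7976931348623157e308
floatmax(Float32)   # 3.4028235f38
floatmax(Float16)   # 65504, the "65500" mentioned above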
Exactly. Floating point has relative accuracy. If you scale all of your values by 10^10 then you’re fine. The issue is 10^10 - 1.23123141131. The two numbers are of different relative scales, so the accuracy of the result is chopped (here you only get 6 digits). But that also means you can keep scaling, and as @Oscar_Smith just mentioned, if you stay in the dynamic range of floating point (so ~10^300) then you’re fine. 10^200 + 1.23141231231*10^200 is fine. All numbers in that range are fine if they are compared only with other numbers in that range. 1 + 1.23141231231*10^200 is not.
(Yeah, there are small exceptions, subnormal numbers etc., but those are the general ideas.)
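Spelled out in the REPL with the numbers from above (just an illustration):
1e200 + 1.23141231231e200                       # fine: both operands are at the same scale
1.0 + 1.23141231231e200 == 1.23141231231e200   # true: the 1.0 is far below one ulp of 1.23e200, so it vanishes
1e10 - 1.23123141131                            # the small number only contributes ~6 correct digits here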