I want to use Base.exp in my model, which uses Float32 with StaticArray and StaticVector. However, I found a numerical instability in Base.exp with a 3×3 matrix…see the example below:
using StaticArrays
julia> deriv_mat = Float32[-50.75532 69.86392 0.0; 50.75532 -139.72784 54.483025; 0.0 69.86392 -108.96605]
3×3 Matrix{Float32}:
 -50.7553    69.8639     0.0
  50.7553  -139.728     54.483
   0.0       69.8639  -108.966

julia> exp(deriv_mat)
3×3 Matrix{Float32}:
 8.57472f-6  6.68494f-6  3.72485f-6
 4.85653f-6  3.78621f-6  2.10967f-6
 3.46999f-6  2.70524f-6  1.50736f-6

julia> deriv_mat_s = SMatrix{3, 3, Float32}(Float32[-50.75532 69.86392 0.0; 50.75532 -139.72784 54.483025; 0.0 69.86392 -108.96605])
3×3 SMatrix{3, 3, Float32, 9} with indices SOneTo(3)×SOneTo(3):
 -50.7553    69.8639     0.0
  50.7553  -139.728     54.483
   0.0       69.8639  -108.966

julia> exp(deriv_mat_s)
3×3 SMatrix{3, 3, Float32, 9} with indices SOneTo(3)×SOneTo(3):
 NaN  NaN  NaN
 NaN  NaN  NaN
 NaN  NaN  NaN
I need to use STRICTLY the Float32 with the static matrix…so is there any way to overcome such numerical instability? If I convert it to Float64, then it creates extra allocations…
I am thinking of using something like this…@btime SMatrix{3, 3, Float32}(exp(SMatrix{3, 3, Float64}($deriv_mat_s_64)))…but it is very ugly…
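For instance, wrapped in a helper (exp_via64 is just a placeholder name; as far as I can tell both conversions stay between stack-allocated SMatrix types, so this should not heap-allocate):

using StaticArrays

# Round-trip workaround: widen to Float64 for the matrix exponential,
# then narrow the result back to Float32.
exp_via64(A::SMatrix{3,3,Float32}) = SMatrix{3,3,Float32}(exp(SMatrix{3,3,Float64}(A)))

exp_via64(deriv_mat_s)   # finite Float32 entries instead of NaN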
See issues #785 (which first reports the issue) and #1285 (which correctly diagnoses the problem).
This is exacerbated by the large coefficients used in the Padé approximation in StaticArrays’ version of exp. One improvement would be to scale all the coefficients down so that the largest (rather than the smallest) ones are on the order of 1; in fact, that would probably resolve most overflow issues in exp. It wouldn’t be a very difficult pull request: just find each set of coefficients and scale them by 2^{-n} for a suitable choice of n.
It’s not strictly in Base, as you know by now, and I filed an issue just now, but I see it’s a known problem, and seemingly the third duplicate, so I closed mine. At least see the links to the code from mine; I didn’t move them to the other issues.
As I recall, StaticArrays for 3×3 and up uses nearly the same algorithm (with a very similar implementation) as Base, except without the gebal balancing. The issue really is just the large coefficients used in the Padé approximation (though they are the same ones used in Base) combined with a specialized 3×3 \ that is really vulnerable to overflow. Shrinking the coefficients or using lu would fix this issue.
I haven’t looked into modifying the Base version to be more widely applicable; it’s extremely limited right now (see #51008). But as I recall, Julia v1.12 is decent at avoiding allocations for non-escaping MArrays. With a not-small reworking of the Base version, it might be possible to use it entirely (without allocations) by passing an MArray. I think the main obstacle would be working around the gebal call in some way, either skipping it (the algorithm “works” without it) or writing a suitable Julia implementation.
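As a rough sketch of that pattern (scaled_copy is just a stand-in for a hypothetical in-place exp!; the point is only that the mutable scratch never escapes):

using StaticArrays, BenchmarkTools

# Non-escaping MMatrix used as scratch; recent Julia versions (v1.12 in
# particular) can often elide the heap allocation for this pattern.
function scaled_copy(A::SMatrix{3,3,Float64})
    M = MMatrix(A)            # mutable working copy
    @inbounds for i in eachindex(M)
        M[i] *= 0.5           # stand-in for real in-place work
    end
    return SMatrix(M)         # freeze the result; M never leaves the function
end

@btime scaled_copy($(rand(SMatrix{3,3,Float64})))   # ideally 0 allocations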
Thanks for your reply! But what do you mean by “Shrinking the coefficients or using lu would fix this issue”?
Thanks, but this is a dead end…I need to use ForwardDiff.jacobian later to calculate the Jacobian of my model, and it turns out that simply converting to Float64 and back to Float32 yields an error in ForwardDiff.jacobian…
ERROR: LoadError: TaskFailedException

    nested task error: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11})
    The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.

    Closest candidates are:
      (::Type{T})(::Real, !Matched::RoundingMode) where T<:AbstractFloat
       @ Base rounding.jl:265
      (::Type{T})(::T) where T<:Number
       @ Core boot.jl:900
      Float64(!Matched::Float32)
       @ Base float.jl:341
      ...

    Stacktrace:
     [1] convert(::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11})
       @ Base ./number.jl:7
     [2] macro expansion
       @ ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:95 [inlined]
     [3] convert_ntuple
       @ ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:91 [inlined]
     [4] SMatrix{3, 3, Float64, 9}(x::NT)
       @ StaticArraysCore ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:127
     [5] SArray
       @ ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:131 [inlined]
     [6] StaticArray
       @ ~/.julia_sindbad_clean/packages/StaticArrays/DsPgf/src/convert.jl:180 [inlined]
     [7] solve_odes(new_vegW::SVector{3, Float32}, deriv_mat::SMatrix{3, 3, ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11}, 9}, deriv_const::SVector{3, ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11}}, init_cond::SVector{3, Float32}, deltaT::Float32, z_zero::Float32, helpers::NT)
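For what it’s worth, one way around that MethodError is to widen only the value type inside the Dual instead of hard-coding Float64, so the tag and partials survive the round trip. A minimal, untested sketch (widen64 and exp32 are illustrative names, not ForwardDiff API):

using StaticArrays, ForwardDiff
using ForwardDiff: Dual

# Map an element type to its Float64-based counterpart, recursing through Dual.
widen64(::Type{Float32}) = Float64
widen64(::Type{Dual{T,V,N}}) where {T,V,N} = Dual{T,widen64(V),N}

# Do the matrix exponential in (Dual-aware) double precision, then convert back.
function exp32(A::SMatrix{3,3,T}) where {T}
    A64 = SMatrix{3,3,widen64(T)}(A)   # elementwise convert; Duals keep their tag
    return SMatrix{3,3,T}(exp(A64))
end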
The source is here. Every branch has two polynomial evaluations, most of them using @evalpoly (which should probably be changed to evalpoly), except for one branch that writes them out long-form to exploit the Paterson–Stockmeyer scheme. The coefficients of these polynomials are the S(<integer>) terms. In every polynomial pair, the smallest coefficient is 1, while the largest depends on the branch. If every coefficient in a polynomial pair were scaled by the same constant (ideally a power of 2, so that there is no roundoff error), the result would be mathematically the same, because \frac{a}{b} = \frac{ac}{bc}. So, for instance, the largest polynomial pair has largest coefficient 64764752532480000; scaling every coefficient in that pair by 2^{-55} would change it to \approx 1.8. The intermediate numbers would be smaller and would not overflow the calculation expA = (V - U) \ (V + U).
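To make the invariance concrete, here is a toy sketch (not the real degree-13 evaluation scheme; the coefficients are the standard Higham expm values, the first of which is the 64764752532480000 mentioned above, and only the first few enter this truncated U/V pair):

using LinearAlgebra

# Degree-13 Padé coefficients for exp (largest ≈ 6.5e16, smallest = 1):
B = (64764752532480000.0, 32382376266240000.0, 7771770303897600.0,
     1187353796428800.0, 129060195264000.0, 10559470521600.0,
     670442572800.0, 33522128640.0, 1323241920.0, 40840800.0,
     960960.0, 16380.0, 182.0, 1.0)

# The same coefficients scaled by 2^{-55}: the largest becomes ≈ 1.8.
Bs = B .* 2.0^-55

# Truncated U/V pair, just to show the scaling cancels in (V - U) \ (V + U).
function pade_pair(A, b)
    A2 = A * A
    V = b[1] * I + b[3] * A2          # even-power terms (truncated)
    U = A * (b[2] * I + b[4] * A2)    # odd-power terms (truncated)
    return (V - U) \ (V + U)
end

A = [0.01 0.02; 0.03 0.04]
pade_pair(A, B) ≈ pade_pair(A, Bs)    # true: U and V shrink by the same factor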
See issue #1285 (linked above) for discussion of X \ Y versus lu(X) \ Y. Solving with lu is less prone to overflow than the not-very-careful specialized implementation of X \ Y and would typically resolve this issue, although it is somewhat slower.
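A contrived illustration of the difference (the entries are chosen so that, on my reading of the specialized 3×3 solve, its cofactor products exceed Float32 range while the LU solve stays in range):

using StaticArrays, LinearAlgebra

X = @SMatrix Float32[1f20 0 0; 0 1f20 0; 0 0 1f20]
b = @SVector Float32[1, 1, 1]

X \ b        # specialized 3×3 solve: 1e40-scale cofactors overflow, Inf/Inf → NaN
lu(X) \ b    # generic LU solve: results near 1e-20, finite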
Personally, I think changing the coefficients should be done in any case. The lu would then rarely be necessary.
The same coefficient change could be made in Base, though Base always uses lu (even where it isn’t written explicitly, because that’s how Base does square \), so the issue never showed up there.
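That is, for a typical dense square system in Base, these two should agree:

using LinearAlgebra

X = rand(3, 3); Y = rand(3)
X \ Y ≈ lu(X) \ Y   # true: dense square \ factorizes via lu internally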