I want to use Base.exp in my model, which uses Float32 with StaticArray and StaticVector. However, I found a numerical instability in Base.exp with a 3×3 matrix…see the example below:
using StaticArrays
julia> deriv_mat = Float32[-50.75532 69.86392 0.0; 50.75532 -139.72784 54.483025; 0.0 69.86392 -108.96605]
3×3 Matrix{Float32}:
 -50.7553    69.8639     0.0
  50.7553  -139.728     54.483
   0.0       69.8639  -108.966

julia> exp(deriv_mat)
3×3 Matrix{Float32}:
 8.57472f-6  6.68494f-6  3.72485f-6
 4.85653f-6  3.78621f-6  2.10967f-6
 3.46999f-6  2.70524f-6  1.50736f-6

julia> deriv_mat_s = SMatrix{3, 3, Float32}(Float32[-50.75532 69.86392 0.0; 50.75532 -139.72784 54.483025; 0.0 69.86392 -108.96605])
3×3 SMatrix{3, 3, Float32, 9} with indices SOneTo(3)×SOneTo(3):
 -50.7553    69.8639     0.0
  50.7553  -139.728     54.483
   0.0       69.8639  -108.966

julia> exp(deriv_mat_s)
3×3 SMatrix{3, 3, Float32, 9} with indices SOneTo(3)×SOneTo(3):
 NaN  NaN  NaN
 NaN  NaN  NaN
 NaN  NaN  NaN
I need to use STRICTLY the Float32 with the static matrix…so is there any way to overcome such numerical instability? If I convert it to Float64, then it creates extra allocations…
I am thinking of using something like this…@btime SMatrix{3, 3, Float32}(exp(SMatrix{3, 3, Float64}($deriv_mat_s_64)))…but it is very ugly…
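For instance, wrapped in a helper (exp_via64 is just a placeholder name; as far as I can tell both conversions stay between stack-allocated SMatrix types, so this should not heap-allocate):

using StaticArrays

# Round-trip workaround: widen to Float64 for the matrix exponential,
# then narrow the result back to Float32.
exp_via64(A::SMatrix{3,3,Float32}) = SMatrix{3,3,Float32}(exp(SMatrix{3,3,Float64}(A)))

exp_via64(deriv_mat_s)   # finite Float32 entries instead of NaN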
See issues #785 (which first reports the issue) and #1285 (which correctly diagnoses the problem).
This is exacerbated by the large coefficients used in the Padé approximation in StaticArrays’ version of exp. One improvement would be to scale all the coefficients down so that the largest (rather than the smallest) ones are on the order of 1; in fact, that would probably resolve most overflow issues in exp. It wouldn’t be a very difficult pull request: just find each set of coefficients and scale them by 2^{-n} for a suitable choice of n.
It’s not strictly in Base, as you know by now, and I filed an issue just now, but I see it’s a known problem, and seemingly the third duplicate, so I closed mine. At least see the links to the code from mine; I didn’t move them to the other issues.
As I recall, StaticArrays for 3×3 and up uses nearly the same algorithm (with a very similar implementation) as Base, except without the gebal balancing. The issue really is just the large coefficients used in the Padé approximation (though they are the same ones used in Base) combined with a specialized 3×3 \ that is really vulnerable to overflow. Shrinking the coefficients or using lu would fix this issue.
I haven’t looked into modifying the Base version to be more widely applicable; it’s extremely limited right now (see #51008). But as I recall, Julia v1.12 is decent at avoiding allocations for non-escaping MArrays. With a not-small reworking of the Base version, it might be possible to use it entirely (without allocations) by passing an MArray. I think the main obstacle would be working around the gebal call in some way, either skipping it (the algorithm “works” without it) or writing a suitable Julia implementation.
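As a rough sketch of that pattern (scaled_copy is just a stand-in for a hypothetical in-place exp!; the point is only that the mutable scratch never escapes):

using StaticArrays, BenchmarkTools

# Non-escaping MMatrix used as scratch; recent Julia versions (v1.12 in
# particular) can often elide the heap allocation for this pattern.
function scaled_copy(A::SMatrix{3,3,Float64})
    M = MMatrix(A)            # mutable working copy
    @inbounds for i in eachindex(M)
        M[i] *= 0.5           # stand-in for real in-place work
    end
    return SMatrix(M)         # freeze the result; M never leaves the function
end

@btime scaled_copy($(rand(SMatrix{3,3,Float64})))   # ideally 0 allocations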
Thanks for your reply! But what do you mean by “Shrinking the coefficients or using lu would fix this issue”?
Thanks, but this is a dead end…I need to use ForwardDiff.jacobian later to calculate the Jacobian of my model, and it turns out that simply converting to Float64 and back to Float32 yields an error in ForwardDiff.jacobian…
ERROR: LoadError: TaskFailedException

    nested task error: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11})
    The type `Float64` exists, but no method is defined for this combination of argument types when trying to construct it.

    Closest candidates are:
      (::Type{T})(::Real, !Matched::RoundingMode) where T<:AbstractFloat
       @ Base rounding.jl:265
      (::Type{T})(::T) where T<:Number
       @ Core boot.jl:900
      Float64(!Matched::Float32)
       @ Base float.jl:341
      ...

    Stacktrace:
     [1] convert(::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11})
       @ Base ./number.jl:7
     [2] macro expansion
       @ ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:95 [inlined]
     [3] convert_ntuple
       @ ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:91 [inlined]
     [4] SMatrix{3, 3, Float64, 9}(x::NT)
       @ StaticArraysCore ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:127
     [5] SArray
       @ ~/.julia_sindbad_clean/packages/StaticArraysCore/MoJEf/src/StaticArraysCore.jl:131 [inlined]
     [6] StaticArray
       @ ~/.julia_sindbad_clean/packages/StaticArrays/DsPgf/src/convert.jl:180 [inlined]
     [7] solve_odes(new_vegW::SVector{3, Float32}, deriv_mat::SMatrix{3, 3, ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11}, 9}, deriv_const::SVector{3, ForwardDiff.Dual{ForwardDiff.Tag{var"#34#36", Float32}, Float32, 11}}, init_cond::SVector{3, Float32}, deltaT::Float32, z_zero::Float32, helpers::NT)
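For what it’s worth, one way around that MethodError is to widen only the value type inside the Dual instead of hard-coding Float64, so the tag and partials survive the round trip. A minimal, untested sketch (widen64 and exp32 are illustrative names, not ForwardDiff API):

using StaticArrays, ForwardDiff
using ForwardDiff: Dual

# Map an element type to its Float64-based counterpart, recursing through Dual.
widen64(::Type{Float32}) = Float64
widen64(::Type{Dual{T,V,N}}) where {T,V,N} = Dual{T,widen64(V),N}

# Do the matrix exponential in (Dual-aware) double precision, then convert back.
function exp32(A::SMatrix{3,3,T}) where {T}
    A64 = SMatrix{3,3,widen64(T)}(A)   # elementwise convert; Duals keep their tag
    return SMatrix{3,3,T}(exp(A64))
end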
The source is here. Every branch has two polynomial evaluations, most of them using @evalpoly (which should probably be changed to evalpoly), except for one branch that writes them out long-form to exploit the Paterson–Stockmeyer scheme. The coefficients of these polynomials are the S(<integer>) terms. In every polynomial pair, the smallest coefficient is 1, while the largest depends on the branch. If every coefficient in a polynomial pair were scaled by the same constant (ideally a power of 2, so that there is no roundoff error), the result would be mathematically the same, because \frac{a}{b} = \frac{ac}{bc}. So, for instance, the largest polynomial pair has largest coefficient 64764752532480000; scaling every coefficient in that pair by 2^{-55} would change it to \approx 1.8. The intermediate numbers would be smaller and would not overflow the calculation expA = (V - U) \ (V + U).
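To make the invariance concrete, here is a toy sketch (not the real degree-13 evaluation scheme; the coefficients are the standard Higham expm values, the first of which is the 64764752532480000 mentioned above, and only the first few enter this truncated U/V pair):

using LinearAlgebra

# Degree-13 Padé coefficients for exp (largest ≈ 6.5e16, smallest = 1):
B = (64764752532480000.0, 32382376266240000.0, 7771770303897600.0,
     1187353796428800.0, 129060195264000.0, 10559470521600.0,
     670442572800.0, 33522128640.0, 1323241920.0, 40840800.0,
     960960.0, 16380.0, 182.0, 1.0)

# The same coefficients scaled by 2^{-55}: the largest becomes ≈ 1.8.
Bs = B .* 2.0^-55

# Truncated U/V pair, just to show the scaling cancels in (V - U) \ (V + U).
function pade_pair(A, b)
    A2 = A * A
    V = b[1] * I + b[3] * A2          # even-power terms (truncated)
    U = A * (b[2] * I + b[4] * A2)    # odd-power terms (truncated)
    return (V - U) \ (V + U)
end

A = [0.01 0.02; 0.03 0.04]
pade_pair(A, B) ≈ pade_pair(A, Bs)    # true: U and V shrink by the same factor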
See issue #1285 (linked above) for discussion of X \ Y versus lu(X) \ Y. Solving with lu is less prone to overflow than the not-very-careful specialized implementation of X \ Y and would typically resolve this issue, although it is somewhat slower.
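A contrived illustration of the difference (the entries are chosen so that, on my reading of the specialized 3×3 solve, its cofactor products exceed Float32 range while the LU solve stays in range):

using StaticArrays, LinearAlgebra

X = @SMatrix Float32[1f20 0 0; 0 1f20 0; 0 0 1f20]
b = @SVector Float32[1, 1, 1]

X \ b        # specialized 3×3 solve: 1e40-scale cofactors overflow, Inf/Inf → NaN
lu(X) \ b    # generic LU solve: results near 1e-20, finite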
Personally, I think changing the coefficients should be done in any case. The lu would then rarely be necessary.
The same coefficient change could be made in Base, though Base always uses lu (even where it isn’t written explicitly, because that’s how Base does square \), so the issue never showed up there.
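That is, for a typical dense square system in Base, these two should agree:

using LinearAlgebra

X = rand(3, 3); Y = rand(3)
X \ Y ≈ lu(X) \ Y   # true: dense square \ factorizes via lu internally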