I had a very ugly bug for more than a year. Now, thanks to AI, I found the reason and could fix it.
This bug was very ugly at first, because it does not always appear, only for certain numerical values. And secondly, not on every hardware, because subnormal values are handled differently, depending on the CPU.
Root cause: NLSolversBase 7.10.0 switched from direct FiniteDiff to DifferentiationInterface for Jacobian computation. The DI backend produces subnormal values (~5.6e-305) in near-zero Jacobian columns instead of exact zeros. NLsolve’s autoscale computes column norms d[j], and when d[j] is subnormal (not zero), its guard if d[j] == 0; d[j] = 1; end doesn’t trigger. The subsequent g ./ d.^2 underflows d[j]^2 to 0.0, producing NaN that propagates through the solver.
Fix: Both KPS4.jl and KPS3.jl now provide an explicit Jacobian function (jac!) to nlsolve using manual central finite differences (matching FiniteDiff’s defaults: step = cbrt(eps)). After computing the Jacobian, entries below sqrt(floatmin) (~1.5e-154) are flushed to zero—this prevents any value whose square would underflow from reaching NLsolve’s autoscale.
To reproduce:
git clone https://github.com/OpenSourceAWE/KiteModels.jl.git
cd KiteModels.jl
git checkout bug
julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.test()'
This PR demonstrates the bug: Branch to demonstrate a bug in NLSolve by ufechner7 · Pull Request #258 · OpenSourceAWE/KiteModels.jl · GitHub
And this PR provides the fix: Update NLSolve, fix NLsolve issue by ufechner7 · Pull Request #259 · OpenSourceAWE/KiteModels.jl · GitHub
The fix:
"""
make_jac(f!, n_vars)
Create an explicit Jacobian function using central finite differences.
Works around NLSolversBase >= 7.10 producing subnormal Jacobian entries via
DifferentiationInterface, which cause NaN in NLsolve autoscale (subnormal
column norms squared underflow to zero).
"""
function make_jac(f!, n_vars)
_F1 = zeros(SimFloat, n_vars)
_F2 = zeros(SimFloat, n_vars)
_xp = zeros(SimFloat, n_vars)
_xm = zeros(SimFloat, n_vars)
threshold = sqrt(floatmin(SimFloat))
function jac!(J, x)
h_factor = cbrt(eps(SimFloat))
for j in 1:n_vars
copyto!(_xp, x)
copyto!(_xm, x)
h = max(abs(x[j]), one(SimFloat)) * h_factor
_xp[j] += h
_xm[j] -= h
f!(_F1, _xp)
f!(_F2, _xm)
@views J[:, j] .= (_F1 .- _F2) ./ (2h)
end
# Flush near-zero entries whose squares would underflow, preventing NaN in autoscale
@. J = ifelse(abs(J) < threshold, zero(SimFloat), J)
end
return jac!
end
Where and how should I report this bug?