Bug in NLSolve

I had been chasing a very ugly bug for more than a year. Now, with the help of AI, I found the root cause and was able to fix it.

This bug was particularly nasty for two reasons: first, it does not always appear, only for certain numerical values; second, it does not appear on every machine, because subnormal values are handled differently depending on the CPU.
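For reference, whether a value is subnormal can be checked with Base's issubnormal; a minimal sketch with illustrative values only:

```julia
# Subnormals lie below floatmin(Float64) ≈ 2.2e-308; some CPUs (or builds with
# flush-to-zero / denormals-are-zero modes) treat them as zero, which is why
# bugs involving them can be hardware-dependent.
x = 5.0e-324                             # smallest positive subnormal Float64
println(issubnormal(x))                  # true
println(issubnormal(floatmin(Float64)))  # false: smallest *normal* Float64
println(x / 2)                           # 0.0: underflows past the subnormal range
```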

Root cause: NLSolversBase 7.10.0 switched from calling FiniteDiff directly to DifferentiationInterface (DI) for Jacobian computation. The DI backend produces tiny values (~5.6e-305) in near-zero Jacobian columns instead of exact zeros. NLsolve’s autoscale computes column norms d[j]; when d[j] is tiny but not exactly zero, the guard if d[j] == 0; d[j] = 1; end does not trigger. The subsequent g ./ d.^2 then underflows d[j]^2 to 0.0, producing NaN that propagates through the solver.
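The failure mode can be reproduced in a few lines at the REPL (a sketch using the magnitude quoted above; the exact value produced by DI will vary):

```julia
# A tiny but nonzero column norm passes NLsolve's `d[j] == 0` guard,
# yet its square underflows to exactly 0.0, so g ./ d.^2 blows up.
d = 5.6e-305
println(d == 0.0)    # false → the guard does not reset d[j] to 1
println(d^2)         # 0.0  → underflows (d < sqrt(floatmin) ≈ 1.5e-154)
println(1.0 / d^2)   # Inf  → propagates as Inf/NaN through the dogleg step
```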

Fix: Both KPS4.jl and KPS3.jl now provide an explicit Jacobian function (jac!) to nlsolve, using manual central finite differences that match FiniteDiff’s defaults (step = cbrt(eps)). After the Jacobian is computed, entries below sqrt(floatmin) (~1.5e-154) are flushed to zero; this prevents any value whose square would underflow from reaching NLsolve’s autoscale.

To reproduce:

git clone https://github.com/OpenSourceAWE/KiteModels.jl.git
cd KiteModels.jl
git checkout bug
julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.test()'

This PR demonstrates the bug: Branch to demonstrate a bug in NLSolve by ufechner7 · Pull Request #258 · OpenSourceAWE/KiteModels.jl
And this PR provides the fix: Update NLSolve, fix NLsolve issue by ufechner7 · Pull Request #259 · OpenSourceAWE/KiteModels.jl

The fix:

"""
    make_jac(f!, n_vars)

Create an explicit Jacobian function using central finite differences.

Works around NLSolversBase >= 7.10 producing subnormal Jacobian entries via
DifferentiationInterface, which cause NaN in NLsolve autoscale (subnormal
column norms squared underflow to zero).
"""
function make_jac(f!, n_vars)
    _F1 = zeros(SimFloat, n_vars)
    _F2 = zeros(SimFloat, n_vars)
    _xp = zeros(SimFloat, n_vars)
    _xm = zeros(SimFloat, n_vars)
    threshold = sqrt(floatmin(SimFloat))
    h_factor = cbrt(eps(SimFloat))  # FiniteDiff's default step factor for central differences
    function jac!(J, x)
        for j in 1:n_vars
            copyto!(_xp, x)
            copyto!(_xm, x)
            h = max(abs(x[j]), one(SimFloat)) * h_factor
            _xp[j] += h
            _xm[j] -= h
            f!(_F1, _xp)
            f!(_F2, _xm)
            @views J[:, j] .= (_F1 .- _F2) ./ (2h)
        end
        # Flush near-zero entries whose squares would underflow, preventing NaN in autoscale
        @. J = ifelse(abs(J) < threshold, zero(SimFloat), J)
        return J
    end
    return jac!
end

Where and how should I report this bug?


This seems like a bug in NLsolve — such a test is not very robust. At the very least they could change the test to d[j]^2 == 0 to check for underflow: handle underflow of norm(J[:,i])^2 by stevengj · Pull Request #296 · JuliaNLSolvers/NLsolve.jl · GitHub
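The suggested change can be sketched in isolation (stand-in Jacobian values, not NLsolve’s actual code):

```julia
using LinearAlgebra

# Stand-in Jacobian with a tiny-but-nonzero second column, as in the bug report.
J = [1.0  1e-200;
     0.5  1e-200]
d = [norm(@view J[:, j]) for j in 1:2]

# Guard as suggested in PR #296: check d[j]^2 instead of d[j], so column
# norms whose square underflows to 0.0 are also reset to 1.
for j in eachindex(d)
    if d[j]^2 == 0
        d[j] = 1.0
    end
end

g = [1.0, 1.0] ./ d .^ 2
println(all(isfinite, g))   # true: no Inf/NaN from the underflowed column
```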


Can you provide an MWE for this behavior? I might have missed a setting in the FiniteDiff bindings for DI if they don’t behave identically

That is difficult. Most of the time, it doesn’t happen.

To be clear, the test case I shared is 100% reproducible. But the problem appears only after 40 iterations, and the model is complex, with ~60 states.

That looks like it might be reading uninitialized memory.


Perhaps this MWE will do? Note that it is AI-generated:

# SPDX-FileCopyrightText: 2025 Uwe Fechner
# SPDX-License-Identifier: MIT

# MWE: NLsolve trust-region autoscale NaN bug
#
# Bug location:
#   NLsolve/src/solvers/trust_region.jl, initial autoscale block (~line 150):
#
#       for j = 1:nn
#           cache.d[j] = norm(view(jacobian(df), :, j))
#           if cache.d[j] == zero(cache.d[j])    # ← BUG: only catches exact 0.0
#               cache.d[j] = one(cache.d[j])
#           end
#       end
#
#   Later in the dogleg step (~line 78):
#
#       g .= g ./ d.^2                            # ← NaN when d[j]^2 underflows to 0.0
#
# Root cause:
#   When a Jacobian column has very tiny entries (e.g. ~1e-305), the column norm
#   d[j] is nonzero, so the guard `d[j] == 0` does NOT trigger. But d[j]^2
#   underflows to 0.0, and the subsequent division g ./ d.^2 produces NaN.
#
#   Any value below sqrt(floatmin(Float64)) ≈ 1.49e-154 will have its square
#   underflow to a subnormal or zero. The current guard only catches exact 0.0.
#
# Context:
#   NLSolversBase 7.10.0 switched from direct FiniteDiff to DifferentiationInterface.
#   For certain real-world systems the DI backend produces tiny near-zero values
#   in Jacobian columns that should be zero, instead of exact zeros. This triggers
#   the faulty guard above.
#
# Suggested fix:
#   Replace `d[j] == zero(d[j])` with a check that also catches values whose
#   square would underflow, e.g.:
#       if d[j] <= sqrt(floatmin(eltype(cache.d)))
#           cache.d[j] = one(cache.d[j])
#       end
#
# Tested with NLsolve 4.5.1, but the bug is in all versions with this autoscale guard.

using NLsolve, LinearAlgebra

# ----- Step 1: Show the arithmetic issue -----
println("=== Floating-point underflow in d.^2 ===")
println("floatmin(Float64) = ", floatmin(Float64), "  (smallest normal Float64)")
println("sqrt(floatmin)    = ", sqrt(floatmin(Float64)))
println()
for v in [1e-100, 1e-154, 1e-155, 1e-200, 1e-305]
    sq = v^2
    println("  d = $v  →  d^2 = $sq", sq == 0.0 ? "  ← UNDERFLOWS TO ZERO" : "")
end
println()

# ----- Step 2: Trigger NaN in NLsolve -----
# A simple 2-equation system. We provide an explicit Jacobian where column 2
# has tiny entries, simulating what DifferentiationInterface produces for
# near-zero partial derivatives in real applications.

function f!(F, x)
    F[1] = x[1] - 1.0
    F[2] = x[1]^2 - 4.0
    return F
end

function j!(J, x)
    J[1, 1] = 1.0
    J[2, 1] = 2.0 * x[1]
    # Column 2: tiny values that are NOT exactly zero.
    # These are above floatmin (so they are normal Float64), but their square
    # underflows to 0.0, causing division by zero in the dogleg step.
    J[1, 2] = 1e-200
    J[2, 2] = 1e-200
    return J
end

x0 = [0.5, 0.5]
println("=== NLsolve trust_region with autoscale=true ===")
println("Starting point: ", x0)

# Verify the Jacobian column norm issue
d2 = norm([1e-200, 1e-200])
println("Column 2 norm:    d = ", d2)
println("  d == 0.0:       ", d2 == 0.0, "  (guard does NOT trigger)")
println("  d^2:            ", d2^2, "  (underflows to zero!)")
println("  1.0 / d^2:      ", 1.0 / d2^2, "  → NaN propagation")
println()

result = nlsolve(f!, j!, x0, method=:trust_region, autoscale=true)
println("Converged: ", converged(result))
println("Solution:  ", result.zero)
println("NaN in solution: ", any(isnan, result.zero))
println()

# ----- Step 3: Show exact zeros work (guard triggers correctly) -----
println("=== Same system, but with EXACT zeros in Jacobian column 2: ===")
function j_zero!(J, x)
    J[1, 1] = 1.0;  J[2, 1] = 2.0 * x[1]
    J[1, 2] = 0.0;  J[2, 2] = 0.0          # exact zeros → d[j]==0 guard triggers
    return J
end
result2 = nlsolve(f!, j_zero!, x0, method=:trust_region, autoscale=true)
println("Converged: ", converged(result2))
println("Solution:  ", result2.zero)
println("NaN: ", any(isnan, result2.zero))
println()
println("The ONLY difference is 1e-200 vs 0.0 in Jacobian column 2.")
println("With 1e-200: d[j] ≠ 0 → guard skipped → d[j]^2 underflows → NaN")
println("With 0.0:    d[j] == 0 → guard triggers → d[j] = 1 → OK")

Did you run it to check whether it produces the expected behavior on your computer?

Yes, it produces the expected result:

julia> include("mwes/mwe_nlsolve_bug.jl")
=== Floating-point underflow in d.^2 ===
floatmin(Float64) = 2.2250738585072014e-308  (smallest normal Float64)
sqrt(floatmin)    = 1.4916681462400413e-154

  d = 1.0e-100  →  d^2 = 1.0e-200
  d = 1.0e-154  →  d^2 = 1.0e-308
  d = 1.0e-155  →  d^2 = 1.0e-310
  d = 1.0e-200  →  d^2 = 0.0  ← UNDERFLOWS TO ZERO
  d = 1.0e-305  →  d^2 = 0.0  ← UNDERFLOWS TO ZERO

=== NLsolve trust_region with autoscale=true ===
Starting point: [0.5, 0.5]
Column 2 norm:    d = 1.414213562373095e-200
  d == 0.0:       false  (guard does NOT trigger)
  d^2:            0.0  (underflows to zero!)
  1.0 / d^2:      Inf  → NaN propagation

Converged: false
Solution:  [NaN, NaN]
NaN in solution: true

=== Same system, but with EXACT zeros in Jacobian column 2: ===
Converged: false
Solution:  [1.938537191316301, 0.5]
NaN: false

The ONLY difference is 1e-200 vs 0.0 in Jacobian column 2.
With 1e-200: d[j] ≠ 0 → guard skipped → d[j]^2 underflows → NaN
With 0.0:    d[j] == 0 → guard triggers → d[j] = 1 → OK

Could someone approve the workflow run of the PR?