Why does ForwardDiff.jl give me all-zero Jacobian matrix?

Xu_Shan · June 20, 2026, 4:25pm

Hi guys,

I am using a Julia-based model from https://github.com/LandEcosystems/Sindbad.jl and trying to use ForwardDiff.jl to calculate the Jacobian of the cost function with respect to the optimized parameters. I run the pipeline (simulation, optimization, and Jacobian calculation) at different sites (different locations), but I don’t know why ForwardDiff.jl gives me an all-zero Jacobian matrix at some of them (not all, but about 5 out of 40 sites in total). Below are some details. Any suggestions or feedback would be greatly appreciated!

Thank you!

=======

I am trying to estimate the parameter uncertainty by calculating the Hessian matrix (J^T*J, as the covariance matrix) and then taking the std from its diagonal elements as the uncertainty. J has shape (n_simulation, n_parameters). For example, if we optimize 10 parameters to simulate a variable with a time series of length 1000, then the size of J would be (1000, 10). We build a function that takes the optimized parameters as input (size (n_parameters, 1)) and returns the simulated time series, then use ForwardDiff.jacobian to calculate the J matrix. The problem is that at some sites J is all zeros… but why?

If a minimal example is required, I can try to write some pseudocode here, but because the simulations depend on other input data, I am afraid a minimal reproducible example is not feasible…

rsenne · June 20, 2026, 6:13pm

Without a reduced example it is hard to say definitively, but an all-zero ForwardDiff Jacobian usually points to one of two possibilities:

the model output is genuinely locally insensitive to those parameters at those sites, or
the ForwardDiff.Dual numbers are being dropped somewhere inside the simulation pipeline.

A useful first check is to compare ForwardDiff against a simple finite-difference perturbation at one of the problematic sites. For example, perturb one parameter at a time:

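using LinearAlgebra  # provides norm

y0 = f(popt)         # baseline output at the optimized parameters

j = 1                # index of the parameter to perturb
ϵ = 1e-4
p1 = copy(popt)
p1[j] += ϵ

y1 = f(p1)

norm(y1 - y0) / ϵ    # finite-difference sensitivity of the output to parameter j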

You could repeat this for several parameters. If these finite-difference sensitivities are also zero or extremely small, then the site may be in a flat or saturated regime, or the optimized parameters may not actually be affecting the simulated output for that site.
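The one-parameter check can be extended to every parameter and compared column by column against ForwardDiff. The sketch below is only an illustration: f_toy and popt_toy are hypothetical stand-ins for the real model and the optimized parameters, and the relative step size is an assumption that may need tuning for the actual problem.

```julia
using ForwardDiff, LinearAlgebra

# Hypothetical stand-in for the real model: 3 outputs, 2 parameters.
f_toy(p) = [p[1]^2, p[1] * p[2], sin(p[2])]

# Forward-difference sensitivity norm for each parameter, one at a time.
function fd_column_norms(f, p; relstep = 1e-4)
    y0 = f(p)
    norms = zeros(length(p))
    for j in eachindex(p)
        ϵ = relstep * max(abs(p[j]), 1.0)   # step scaled to the parameter size
        p1 = copy(p)
        p1[j] += ϵ
        norms[j] = norm(f(p1) - y0) / ϵ
    end
    return norms
end

popt_toy = [1.5, 0.7]
fd_norms = fd_column_norms(f_toy, popt_toy)

# Column norms of the ForwardDiff Jacobian, for comparison.
J_toy = ForwardDiff.jacobian(f_toy, popt_toy)
ad_norms = [norm(J_toy[:, j]) for j in axes(J_toy, 2)]
```

If fd_norms is clearly nonzero where ad_norms is zero, that points toward an AD problem rather than a flat model response. With Float32 parameters a larger relative step is probably safer, since sqrt(eps(Float32)) is roughly 3.5e-4.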

If the finite-difference sensitivity is nonzero but

ForwardDiff.jacobian(f, popt)

is all zeros, then I would suspect an AD issue. Common causes are parameters being converted back to Float64, copied into an Array{Float64}, assigned into fields typed as Vector{Float64}, or otherwise losing their ForwardDiff.Dual type inside the model.

A quick diagnostic is to check whether the dual type survives through the function:

J = ForwardDiff.jacobian(p -> begin
    @show eltype(p)
    y = f(p)
    @show eltype(y)
    y
end, popt)

If eltype(p) is a ForwardDiff.Dual type but eltype(y) is Float64, then derivative information is being dropped somewhere in the simulation.
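As a toy illustration of how a Jacobian can silently come out as all zeros, consider the sketch below. The functions good, stripped, and ignores_input are hypothetical and not taken from Sindbad.jl; ForwardDiff.value is the ForwardDiff function that returns the primal part of a dual number.

```julia
using ForwardDiff

# A well-behaved function: dual numbers flow from input to output.
good(p) = [p[1] * p[2], p[1] + p[2]]

# Hypothetical bug 1: the derivative part is stripped before the computation.
stripped(p) = good(ForwardDiff.value.(p))

# Hypothetical bug 2: the model reads parameters stored elsewhere and ignores p.
stored = [2.0, 3.0]
ignores_input(p) = good(stored)

p0 = [2.0, 3.0]
J_good     = ForwardDiff.jacobian(good, p0)           # nonzero entries
J_stripped = ForwardDiff.jacobian(stripped, p0)       # all zeros
J_ignored  = ForwardDiff.jacobian(ignores_input, p0)  # all zeros
```

In both buggy cases the output element type is Float64, which is what the @show eltype(y) diagnostic would reveal.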

I would also check for allocations like

zeros(Float64, ...)
Array{Float64}(undef, ...)

inside the simulation code. For ForwardDiff compatibility, these often need to be written generically, for example using eltype(p) or similar, so that arrays can store dual numbers.
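A minimal sketch of a generically typed buffer is shown below. The function simulate_generic is hypothetical and only stands in for a model step that fills a preallocated array. As far as I know, storing a dual number in a Float64 array typically raises a MethodError rather than silently producing zeros, so a hard-coded Float64 buffer tends to fail loudly.

```julia
using ForwardDiff

# Hypothetical model step that fills a preallocated output buffer.
function simulate_generic(p, n)
    out = zeros(eltype(p), n)   # element type follows the input, so Duals fit
    for t in 1:n
        out[t] = p[1] * t + p[2]^2
    end
    return out
end

# Rows are time steps, columns are parameters: each row is [t, 2 * p[2]].
J_generic = ForwardDiff.jacobian(p -> simulate_generic(p, 4), [0.5, 2.0])
```

The same idea applies to similar(p, n) or to struct fields parameterized on the element type instead of fixed to Vector{Float64}.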

One additional point: if you are using J'J for parameter uncertainty, make sure the scaling and statistical assumptions are appropriate. In many least-squares settings, the covariance approximation is proportional to inv(J'J), often scaled by an estimate of the residual variance, rather than just J'J itself. Also, if J is zero or nearly rank-deficient, the corresponding uncertainty estimate will be ill-conditioned or non-identifiable.
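A rough sketch of that covariance estimate for an unweighted least-squares fit is given below. The function name gauss_newton_covariance and the toy inputs are hypothetical, and the residual-variance estimate sum(abs2, r) / (n - k) assumes independent errors with equal variance, which may not hold for the actual cost function.

```julia
using LinearAlgebra

# J: n_obs × n_params Jacobian at the optimum; r: residual vector at the optimum.
function gauss_newton_covariance(J, r)
    n, k = size(J)
    # A zero or rank-deficient J means the parameters are not identifiable.
    rank(J) == k || throw(ArgumentError("J is rank-deficient; covariance is not identifiable"))
    sigma2_hat = sum(abs2, r) / (n - k)   # residual variance estimate
    return sigma2_hat * inv(J' * J)
end

# Toy inputs, purely for illustration.
J_toy = [1.0 0.0; 0.0 2.0; 0.0 0.0]
r_toy = [1.0, 1.0, 1.0]
C_toy = gauss_newton_covariance(J_toy, r_toy)
std_err = sqrt.(diag(C_toy))   # parameter standard errors
```

An all-zero J fails the rank check, which is the situation at the problematic sites: no uncertainty estimate can be derived from it.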

Xu_Shan · June 21, 2026, 11:11am

Thanks! If I run the target function like this:

julia> J_ForwardDiff = SindbadML.ForwardDiff.jacobian(p -> begin
                @show eltype(p)
                y = cost_functionFDTimeSeries(p)
                @show eltype(y)
                y
            end, popt)
eltype(p) = ForwardDiff.Dual{ForwardDiff.Tag{var"#58#59", Float32}, Float32, 11}

it returns me

julia> J_ForwardDiff = SindbadML.ForwardDiff.jacobian(p -> begin
                @show eltype(p)
                y = cost_functionFDTimeSeries(p)
                @show eltype(y)
                y
            end, popt)
eltype(p) = ForwardDiff.Dual{ForwardDiff.Tag{var"#58#59", Float32}, Float32, 11}

Xu_Shan · June 21, 2026, 11:16am

I have always been confused about this one… Some papers say there is no need to scale with the residual variance (which I suppose would be the residual of the cost function, if the cost function is least-squares-like), but others do scale, especially for Bayesian inversion approaches (please see equation (3) in BG - Assimilation of multiple datasets results in large differences in regional- to global-scale NEE and GPP budgets simulated by a terrestrial biosphere model). But of course, the covariance matrix is inv(J^T*J), and here I have already prescribed the observation uncertainty in J. My cost function is sum(abs2.((y .- ŷ))) / sum(abs2.((y .- mean(y)))) for most observation variable streams,
but there is one term in the cost function that is not least squares, just
mean(abs.(ŷ - y)) / (one(eltype(ŷ)) + μ_y).

rsenne · June 22, 2026, 1:47am

Xu_Shan:

it returns me

julia> J_ForwardDiff = SindbadML.ForwardDiff.jacobian(p -> begin
                @show eltype(p)
                y = cost_functionFDTimeSeries(p)
                @show eltype(y)
                y
            end, popt)
eltype(p) = ForwardDiff.Dual{ForwardDiff.Tag{var"#58#59", Float32}, Float32, 11}

Did you accidentally copy/paste the code from above instead of the output? Also, what did the finite differences reveal?

Second, regarding J'J and uncertainty: yes, the distinction matters. If your objective is a standard weighted nonlinear least-squares objective, then the local Gauss-Newton approximation to the Hessian is usually something like

H ≈ J' * W * J

and the parameter covariance is proportional to

inv(H)

possibly scaled by an estimate of residual variance, depending on whether the observation variance is already known/prescribed or estimated from the residuals.

So if you have already prescribed the observation uncertainty correctly in the residual normalization / weighting, then an additional residual-variance scaling may not be needed. But if the observation variance is not known and is being estimated from the fit, then a residual variance factor is often included.
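Written out, the two cases look roughly as follows. This is the textbook weighted least-squares form, stated under the assumption of independent Gaussian observation errors; the symbols $\sigma_i$, $n$, $k$, and $\hat{s}^2$ are introduced here for illustration and do not appear in the posts above.

```latex
% Residuals r_i = y_i - \hat{y}_i(\theta), J = \partial \hat{y} / \partial \theta,
% n observations, k parameters, W = \mathrm{diag}(1/\sigma_i^2).

% Case 1: observation variances \sigma_i^2 are known and prescribed in W.
\operatorname{Cov}(\hat{\theta}) \approx \left( J^{\top} W J \right)^{-1}

% Case 2: W is known only up to a common scale, estimated from the fit.
\operatorname{Cov}(\hat{\theta}) \approx \hat{s}^{2} \left( J^{\top} W J \right)^{-1},
\qquad
\hat{s}^{2} = \frac{r^{\top} W r}{n - k}
```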

One important subtlety is that your cost function is not purely least squares:

sum(abs2.(y .- ŷ)) / sum(abs2.(y .- mean(y)))

for some streams, but also

mean(abs.(ŷ - y)) / (one(eltype(ŷ)) + μ_y)

for another term.

The abs term is an L1-type loss, not a least-squares loss. For that term, J'J is not the usual Gauss-Newton Hessian approximation in the same way as for squared residuals. Also, abs is nondifferentiable at zero residual, so depending on the residuals and implementation, the AD derivative can be fragile or give subgradient-like behavior. That probably does not by itself explain an all-zero Jacobian for only some sites, but it does mean that interpreting inv(J'J) as a covariance matrix is less straightforward.
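The difference between the two loss types shows up in their derivatives with respect to a single residual. The sketch below uses hypothetical one-residual losses l2 and l1; the zero residual is deliberately left out because the value ForwardDiff returns there depends on how the abs rule is defined.

```julia
using ForwardDiff

# Hypothetical single-residual losses as functions of the residual r.
l2(r) = abs2(r)   # squared loss: derivative 2r shrinks as the fit improves
l1(r) = abs(r)    # absolute loss: derivative is ±1 regardless of |r|

rs = [2.0, 0.2, -0.2]
d_l2 = [ForwardDiff.derivative(l2, r) for r in rs]
d_l1 = [ForwardDiff.derivative(l1, r) for r in rs]
```

Because the L1 derivative carries no information about the residual magnitude, the outer product of its gradient is not a curvature estimate in the Gauss-Newton sense.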

Xu_Shan · June 22, 2026, 7:45am

The code takes a long time to run… FiniteDiff is still running… For ForwardDiff, I already know from previous test runs that the result is an all-zero Jacobian matrix…

For FiniteDiff, I am using 3 perturbations, [0.5%, 1%, 2%], and averaging them for the final estimate of the Jacobian. A single perturbation, e.g. 0.5%, takes about 8 hours to run…
