Guys, I am trying to use ForwardDiff.jl to calculate the gradient (Jacobian) of my model cost function with respect to its parameters, evaluated at a pre-set local optimum (local_optim). I found that ForwardDiff.jl yields a totally different Jacobian from the one I get by perturbing the parameters one at a time (varying the perturbed parameter within ±3% bounds while keeping all other parameters at their local_optim values). Why is that? I would think ±3% is already a very small perturbation…
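As a sanity check on the perturbation scheme itself: for a smooth one-dimensional function, the regression slope over a ±3% window should agree closely with ForwardDiff's exact derivative. A toy sketch (with a made-up cost f, not my actual model) confirms this:

using ForwardDiff, DataFrames, GLM

# Hypothetical smooth 1-D "cost", for illustration only
f(x) = exp(0.5 * x) + x^2
x0 = 2.0

# Exact local derivative from ForwardDiff
ad_slope = ForwardDiff.derivative(f, x0)

# Regression slope over a ±3% window, same scheme as my loop below
xs = collect(range(0.97 * x0, 1.03 * x0, length=10))
df = DataFrame(param=xs, cost=f.(xs))
fd_slope = coef(lm(@formula(cost ~ param), df))[2]

@show ad_slope fd_slope  # these agree to several digits for a smooth f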
Here is a table to illustrate this difference:
Here is how I compute the manually perturbed Jacobian:
using DataFrames, GLM

jacobian_vector = Float64[]
param_names = String[]

for i in 1:length(tbl_params)
    row = tbl_params[i]
    param_name = Symbol(row.name)
    optim_val = row.optim
    lower_bound = row.lower
    upper_bound = row.upper

    # ±3% range around the optimized value, clipped to the parameter bounds
    # (abs keeps the range ordered if optim_val is negative)
    delta = 0.03 * abs(optim_val)
    min_val = max(optim_val - delta, lower_bound)
    max_val = min(optim_val + delta, upper_bound)

    # Generate 10 points within the clipped range
    range_vals = collect(range(min_val, stop=max_val, length=10))

    # DataFrame to hold (param value, cost) pairs
    df = DataFrame(param=Float64[], cost=Float64[])
    idx = findall(tbl_params.name .== param_name)
    for val in range_vals
        # Perturb one parameter, keep the rest at local_optim
        param_vec = deepcopy(local_optim_param)
        param_vec[idx] .= val
        # Evaluate the cost function (on normalized parameters)
        # cost_val = cost_functionFD(Float32.(param_vec))
        cost_val = cost_function(Float32.(param_vec) ./ tbl_params.default)
        push!(df, (param=val, cost=cost_val))
    end

    # Linear regression cost ~ param; the slope is the finite-difference derivative
    lm_model = lm(@formula(cost ~ param), df)
    slope = coef(lm_model)[2]  # ∂cost / ∂parameter
    push!(jacobian_vector, slope)
    push!(param_names, String(param_name))
    println("Finished: $param_name → dCost/dParam ≈ $slope")
end
J_manual = DataFrame(param=param_names, dcost=jacobian_vector);
J_manual_norm = DataFrame(name=param_names,
                          local_optim=local_optim_param,
                          dcost=jacobian_vector,
                          norm_j=jacobian_vector .* local_optim_param);
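As an independent cross-check (a sketch, assuming FiniteDiff.jl is installed and cost_functionFD, defined further below, accepts plain Float64 input), a finite-difference gradient at the same point should land close to whichever of the two Jacobians is correct:

using FiniteDiff

# Finite-difference gradient (central differences by default) of the same
# cost function at the same point
J_finitediff = FiniteDiff.finite_difference_gradient(
    cost_functionFD, Float64.(local_optim_param))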
And here is how I calculate the Jacobian with ForwardDiff.jl:
# calculate Jacobian by ForwardDiff
using ForwardDiff
∇loss(x) = ForwardDiff.gradient(cost_functionFD, x)
# local_optim_param = out_synthetic.inverted_params;
@info cost_functionFD(Float32.(local_optim_param))
@time ∇loss(Float32.(local_optim_param))
# normalized Jacobians:
J_local_optim = ∇loss(Float32.(local_optim_param));
J_local_optim_norm = J_local_optim .* Float32.(local_optim_param);
J_total = DataFrame(name=param_names,
                    local_optim=local_optim_param,
                    J_ForwardDiff=J_local_optim,
                    J_ForwardDiff_norm=J_local_optim_norm,
                    J_ManualPerturb=jacobian_vector,
                    J_ManualPerturb_norm=jacobian_vector .* local_optim_param);
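One possible confounder worth ruling out is Float32 precision: with only ~7 significant digits, the cost differences across a ±3% window can be dominated by round-off. A quick comparison (assuming cost_functionFD is generic enough to accept Float64 input):

# Compare the gradient at Float32 vs. Float64 precision; a large relative
# difference would point to round-off inside the cost function, not to AD
g32 = ∇loss(Float32.(local_optim_param))
g64 = ∇loss(Float64.(local_optim_param))
maximum(abs.(g64 .- g32) ./ max.(abs.(g64), eps()))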
And here is my cost function, cost_functionFD (cost_function yields the same results but takes a different data type, so the two are essentially equivalent…):
function lossSiteFD(new_params, models, loc_forcing, loc_spinup_forcing,
        loc_forcing_t, loc_output, land_init, param_to_index, loc_obs,
        loc_cost_options, constraint_method, tem)
    new_models = updateModelParameters(param_to_index, models, new_params)
    out_data = SindbadML.getOutputFromCache(loc_output, new_params, ForwardDiffGrad())
    coreTEM!(new_models, loc_forcing, loc_spinup_forcing, loc_forcing_t, out_data, land_init, tem...)
    lossVec = metricVector(out_data, loc_obs, loc_cost_options)
    t_loss = combineMetric(lossVec, constraint_method)
    return t_loss
end
local_optim_param = out_synthetic.inverted_params;
selected_models = info.models.forward;
param_to_index = getParameterIndices(selected_models, tbl_params);
loc_cost_options = prepCostOptions(observations, info.optimization.cost_options, optimization_cost_method);
constraint_method = info.optimization.multi_constraint_method;
tem = (;
    tem_info = run_helpers.tem_info,
);

lossSiteFD(Float32.(local_optim_param),
    info.models.forward, run_helpers.space_forcing[1],
    run_helpers.space_spinup_forcing[1], run_helpers.loc_forcing_t,
    SindbadML.getCacheFromOutput(run_helpers.space_output[1], ForwardDiffGrad()),
    run_helpers.loc_land,
    param_to_index,
    observations,
    loc_cost_options,
    constraint_method,
    tem
)

cost_functionFD = x -> lossSiteFD(x,
    info.models.forward, run_helpers.space_forcing[1],
    run_helpers.space_spinup_forcing[1], run_helpers.loc_forcing_t,
    SindbadML.getCacheFromOutput(run_helpers.space_output[1], ForwardDiffGrad()),
    run_helpers.loc_land,
    param_to_index,
    observations,
    loc_cost_options,
    constraint_method,
    tem)
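For context on the getOutputFromCache(..., ForwardDiffGrad()) call: my understanding (a minimal sketch of the general pattern, with hypothetical names, not SINDBAD's actual internals) is that any preallocated output buffer has to follow the element type of the parameters, because ForwardDiff feeds the function ForwardDiff.Dual numbers; a buffer fixed to Float32 would drop the partials and yield zero gradients:

using ForwardDiff

# Sketch of a dual-compatible output cache (hypothetical helper, not SINDBAD's).
# The eltype follows the parameters, so Dual numbers survive the write.
make_output_cache(params::AbstractVector{T}, n) where {T} = zeros(T, n)

function toy_cost(params)
    out = make_output_cache(params, 3)    # Dual-compatible buffer
    out .= params[1] .* [1.0, 2.0, 3.0]   # stand-in for coreTEM! writing outputs
    return sum(abs2, out)
end

ForwardDiff.gradient(toy_cost, Float32[2.0])  # nonzero partial, as expected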
======================
Another question: why are there so many zeros in my Jacobian? Because of them, J' * J is singular, so my original way of calculating parameter uncertainty (using J' * J to approximate the parameter covariance, as shown in optimization - Parameter uncertainty after non-linear least squares estimation - Cross Validated) does not work. How should I calculate parameter uncertainty then?
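For reference, this is the estimator I have been using, a minimal sketch of the Gauss–Newton covariance approximation from that Cross Validated answer (residuals and J here are placeholders for my residual vector and the n_obs × n_params Jacobian of the residuals, not the gradient of the scalar cost); it is exactly the inv(J' * J) step that breaks when the Jacobian has zero columns:

using LinearAlgebra

# Gauss–Newton covariance approximation (sketch with placeholder names)
n_obs, n_params = size(J)
σ² = sum(abs2, residuals) / (n_obs - n_params)  # residual variance estimate
cov_θ = σ² * inv(J' * J)                        # fails when J'J is singular
std_err = sqrt.(diag(cov_θ))                    # 1σ parameter uncertainties

Thanks!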