Least squares fitting that accounts for covariance matrix of all input parameters


I am thinking about writing a package that implements a generalized least squares fitting method or perhaps seek to contribute somewhere where it might fit. Before I write more than my prototype, I wanted to double check that I am not overlooking some already existing project.

The algorithm that I want to implement (or have implemented :slight_smile:) is described in this paper:

The important feature is that it can handle covariance in all input parameters. This allows, for example, to fit a curve to measurement data with uncertainties in x- and y-values AND have correlations in uncertainties accounted for. This can be of relevance in calibration, where a certain standard is used several times (e.g. a standard solution that is diluted several times). The covariance matrix of the ouput parameters is a natural byproduct of the algorithm. Also, the algorithm does not really differentiate between “x” and “y” values, it knows only measured values and constrain functions (which can be used to implement a curve fit) so it is more versatile than just a method for curve fitting.

To find potential candidates for packages that implement a least squares fit like the described method I was looking through github, juliahub.com, this discourse and other websites but had no luck. LsqFit.jl has a section in the docs describing how to take care of correlations in the y-errors, but its does not seem to work (see this PR). But even if it did, it would only be half the solution, since it would not account for uncertainties in x and would be limited to curve fitting.

Another package that showed up was TotalLeastSquares.jl. It handles the full covariance matrix of input parameters but seems to be limited to linear models and I don’t know how one would get the covariance matrix of the output parameters (although it is perhaps possible and maybe not even that hard, I am just missing the mathematical background).

Then of course there is JuMP, but it does not seem to handle uncertainty at all, or at least it is not obvious to me how it does so.

OK, before this post gets even longer:
Is there some project that already provides the functionality I described above, or would that be a useful addition to the Julia toolbox that is currently missing?



From a quick look at the methodology section, the paper seems to describe the classical Kriging method? Perhaps the difference is that they are interested in non-linear constraints? If the constraints are linear, then you can use KrigingEstimators.jl directly, without gradient-based methods.

This package is part of the larger GeoStats.jl effort, including interpolations with spatial covariances.

1 Like

You can get generalized least squares fitting by using a general minimization package like Optim.jl by minimizing a chi square defined as (m_i - p_i) V^-1_ij (m_j - p_j) with m_i = measurement i, p_i = fit parameter, V = variance matrix of m_i. I see that the package GitHub - gragusa/CsminWel.jl: Optimization in Julia returns the inverse of the Hessian at the minimum, which is the variance of the fit parameters. Another option could be GitHub - fkguo/IMinuit.jl: A Julia wrapper of iminuit. iminuit is a Python interface to the C++ version of the function minimization and error analysis package MINUIT.. However if, like in the paper you mention, you want generalized least squares with a set of constraints, I don’t know of any Julia package that does equality-constrained minimization. If the constraints can be solved to provide the measurement predictions p_i as functions of another set of fit parameters q_k, then the problem can be again solved my minimizing the same chi square above with q_k being the fit parameters. This not however always possible.

I will be very interested in a Julia package that implements the algorithm of the paper.

I am using constrained minimization as described here for my research activity, HFLAV-Tau 2018 - Branching fraction fit.

1 Like

Thanks for pointing me to this method—I have never heard of it so far (my background is in physical chemistry and metrology). I would not be surprised if the solution is something from a different field that uses a different terminology that I am unfamiliar with and thus overlooked it.

Wikipedia describes Kriging as kind of a Gaussian process interpolation. I did not see the immediate connection yet, but that comes perhaps with looking at the underlying math more closely. I will give it a go.

You are right, using some of the available packages that perform optimization to minimize the chi-square function given in the paper may be a good solution. Accounting for the constraint functions is a must, however.

JuMP does allow to minimize a cost function subject to some constraints, so I guess I have to look more closely there.

With Julia I found it rather easy to write the code that does the fitting (more or less just copy the equations …):

# this seems to work, though it may be a bit naive ...
using Zygote

function getD(func, β, ζ, Σ)
    Nf = func(β, ζ) |> length
    Nβ = length(β)
    Nζ = length(ζ)

    ∇fβ, ∇fζ = jacobian(func, β, ζ)

    return [
        zeros(Nβ, Nβ) zeros(Nβ, Nζ)      transpose(∇fβ);
        zeros(Nζ, Nβ)        inv(Σ)      transpose(∇fζ);
                  ∇fβ           ∇fζ       zeros(Nf, Nf)

function lsq(func, βₗ, z, Σ; iter_max=100, tol=1e-8)
    ζₗ = copy(z)

    Nf = func(βₗ, ζₗ) |> length
    Nβ = length(βₗ)
    Nζ = length(ζₗ)

    vₗ = [βₗ; ζₗ; zeros(Nf)] # iteratively updated vector
    # previous chi-square
    χ²ₗ₋₁ = nothing
    for l in 1:iter_max
        D = getD(func, βₗ, ζₗ, Σ)
        f = func(βₗ, ζₗ)
        vₗ += D \ [zeros(Nβ); Σ\(z - ζₗ); -f]
        βₗ = vₗ[1:Nβ]
        ζₗ = vₗ[Nβ+1:Nβ+Nζ]
        χ² = transpose(z - ζₗ)*inv(Σ)*(z - ζₗ)
        println("iteration $l: χ² = $(χ²)")
        if !isnothing(χ²ₗ₋₁) && abs((χ²ₗ₋₁ - χ²)/χ²) < tol
            χ²ₗ₋₁ = χ²
    return βₗ, ζₗ, inv(getD(func, βₗ, ζₗ, Σ))

but it is somewhat inconvenient to write the constraint functions (models) because you have to keep track of all the indices of the measured parameters and fitted parameters. Making that convenient, e.g. through a macro, would be the more difficult task when making a new package. This is where I would like to not “reinvent the wheel”.

Thanks for pointing me to the CERN report, that seems to be more or less exactly what I am looking for. Is the code available somewhere and how hard would it be to write models for it?

Sorry, my last post should have been a reply to @alusiani

Have you checked TotalLeastSquares.jl?

One of the references cited is:
Fang, X. (2013). Weighted total least squares: necessary and sufficient conditions, fixed and random parameters. Journal of Geodesy , 87(8), pp.733–749.

Nice code!

In the code I wrote, 90% of the code deals with trivial issues like reading data, storing data, indexing them, printing them, and 10% or less does the actual minimization. I guess that this cannot be avoided unfortunately.

I can give you the whole code, it was on github at some point, and I can put it back there. The minimization procedure code is quite simple, it is in the R language, is not general and deals with just one specific fit, and is not well documented. I’d love to have more time and make of it a package both in R and Julia but I couldn’t find it.

As described in the CERN report, my procedure uses the linear algebra solution of chi square minimization with a series of linear constraints. Non-linear constraints are linearized around seed values at the beginning and in later steps around the fit values of the current iteration, until the fit result gets stable.

I found a nice numeric equality-constrained minimization (with non-linear constraints) in the R package CRAN - Package Rsolnp, which uses the procedures of the matlab package solnp documented here Matlab Programs for Optimization. This R package was quite the best I found for performance, the procedure is cleverly designed.



Since minuit was mentioned here: I have the beginnings of a wrapper of the C++ version of minuit working in this package GitHub - jstrube/Minuit2.jl

The tests show you how to use it, but not all features are implemented yet, and I don’t have the time to take it further. I believe the hard part of writing the interface is done, so if anybody has a strong use case, please feel free to take it from there.

I checked TotalLeastSquares.jl, but I didn’t see how to make it work for a non-linear measurement model. I missed the reference though, thanks for pointing that out.

Thanks for the kind offer—unfortunately I have no experience with R, so I am not sure how well I could follow the code. Would not hurt to try, though …

Another good reference, thank you!

I started writing a notebook with a minimum working example by had to stop because of deadlines. I will eventually complete it. In the meantime, here is the file that contains the non-linear-equality-constrained minimization: alucomb2-fit.r · GitHub . The code is not nice and elegant but it is very well tested for my specific usage. It contains several minimization options, my one uses the code beginning with

  if (method == "alucomb2") {

The relevant code is embedded in a lot of peripheral code. There is also some preconditioning of the linear equations, which otherwise tend to be computationally singular. For efficiency, linear and non-linear constraints are treated differently, but in general the are all stated as strings containing symbolic expressions, which are parsed in R expressions and differentiated. Something similar, probably more elegant, can be done in Julia.




Here I attach a jpg from document, where we show the development of the article by Lars Nielsen “Evaluation of Measurements by the Method of Least Squares” It is developed in Maple. I would be very happy to see it developed in Julia


Thank you for posting this, there is a lot of useful information in there. I could not follow it completely, due to my lacking knowledge of R, but there are some things that caught my attention:

# line 632
meas.corr = diag.m(rep(1, meas.num))
rownames(meas.corr) = meas.names
colnames(meas.corr) = meas.names
meas.corr.stat = meas.corr

Looks like you can have a named covariance matrix where you can index by parameter name. That would be very handy and much less of a hassle then working with bare indices for a potential Julia version. NamedArrays.jl might help here.

# line 671
cat("\nerror: asymmetric statistical correlation\n  ", paste(errors, collapse="\n  "), "\n", sep="")

Does this mean that the covariance/correlation matrix may become asymmetric? How would that happen?

# line 746
sfact.val[meas.scale.names] = quant.cards.sfact
meas.corr = meas.corr / (sfact.val %o% sfact.val)
diag(meas.corr) = 1
meas.err = meas.err * sfact.val


# line 955
##--- to avoid computationally singular matrix, apply proper factor to constraint equations
constr.scaled.m = constr.m * quant.invcov.order/constr.m.order
constr.scaled.v = constr.v * quant.invcov.order/constr.m.order

Looks like you re-scale both, your parameters and the constraint functions as a whole, to avoid problematic matrices.
I thought it would be enough to re-scale some “extreme” parameters (those that are on a completely different order of magnitude than the remaining parameters) and leave everything else alone. I just tried re-scaling a constrain in my test code, even by an extreme value (1e30) just to see if anything breaks, but to my surprise, nothing happened. I though it would put more weight on the constrain and ruin the derivatives and thus the derived uncertainties.

# line 813
## preliminary computations for analytical minimum chi-square solution

Not sure if this is what happens, but my guess is that a regular least squares fit is performed first to get a better guess for the unknown parameters. That is a good idea to refine the guess and have less iterations in the more costly generalized least square algorithm.

# line 267
##+++ improve, must check if the two linear combinations are degenerate
##+++ in general, should check if constraints are linearly independent

I was watching the JuliaCon 2021 Workshop on ModelingToolkit.jl and was wondering if one could not formulate this particular least squares method with it. ModelingToolkit takes the model as symbolic equations and tries to “optimize” the equations before numerically solving them. Thus, I would expect it to be able to detect and remove redundant constraints of a model.

I was playing shortly with JuMP and ModelingToolkit, but had no luck so far. Both seem to dislike the matrix equation in the chi-square function, but maybe I did something wrong. I have to take a closer look.

I also tried to write my own macro to be able to declare models without explicitly keeping track of the indices in the beta and zeta vectors, but I struggled quite a bit with the metaprogramming and am not sure how reliable the result is :slight_smile: I would still rather use an already existing solution, such as symbolic definition in ModelingToolkit.

Thanks for the Maple code, @HerAdri :wave:
In Maple you put in the constraints directly as symbolic expressions, which could also be a good way for Julia (instead of parsing a Julia expression via a macro). I’ll check if ModelingToolkit (see my last post) or perhaps the underlying Symbolcs.jl can help here.


That is indeed a nice feature of R, especially useful for testing and debugging. I have used a bit NamedArrays.jl, it appears very well done. In other occasions, one can use Dicts from a symbol to an integer array index.

You don’t have to worry about that check. I check for symmetry because in the whole library I assemble the correlation matrix by reading cards of measurements, and with improper input cards an asymmetric correlation may result.

Also this part of the code needs the context to be understood.

The first scaling (quant.cards.sfact) serves to correct measurement uncertainties to account for the observed dispersion of different measurements. So it has nothing to do with the minimization procedure.

The second scaling instead serves to reduce the dynamic range of the values of the singular value decomposition of the matrix that represents the system of linear equations to be solved.

The dynamic range in R has to be at most 10^8 or 1/sqrt(numeric floating point precision) to permit matrix inversion. I do not rescale the equations based on the inverse of measurements covariance (I am not sure that there might be a way to do that BTW), but I compute the median scale of the related equations and then rescale the equations that involve the constrains and lagrangue multipliers to match that scale.

Note that I use a way to solve the minimization that may be non-optimal. A different path could be to use the linearized constraint equations to get a part of the fit parameters as a function of a minimum set, and then solve a smaller number of linear equations for just that smaller set of fit parameters. It is unclear to me what are actually the tradeoffs, it is a topic rarely discussed, but this second solution requires the inversion of a smaller matrix.

There I just compute the quantities that will not change in the incoming iterative procedure, there constraints are iteratively linearized.

ModelingToolkit appears quite interesting but I had no time to experiment with it. My impression is that it can solve the problem I am solving with my code. Optimizing non-linear equations seems a tough job to be done in a general way, as recognizing redundant constraints. When linearized, however, the check for redundancy (within numerical precision) is easy. But it takes quote some work to then trace and report the offending list of constraints, something still on my todo list.