Least squares fitting that accounts for covariance matrix of all input parameters


I am thinking about writing a package that implements a generalized least squares fitting method or perhaps seek to contribute somewhere where it might fit. Before I write more than my prototype, I wanted to double check that I am not overlooking some already existing project.

The algorithm that I want to implement (or have implemented :slight_smile:) is described in this paper:


The important feature is that it can handle covariance in all input parameters. This allows, for example, to fit a curve to measurement data with uncertainties in x- and y-values AND have correlations in uncertainties accounted for. This can be of relevance in calibration, where a certain standard is used several times (e.g. a standard solution that is diluted several times). The covariance matrix of the ouput parameters is a natural byproduct of the algorithm. Also, the algorithm does not really differentiate between “x” and “y” values, it knows only measured values and constrain functions (which can be used to implement a curve fit) so it is more versatile than just a method for curve fitting.

To find potential candidates for packages that implement a least squares fit like the described method I was looking through github, juliahub.com, this discourse and other websites but had no luck. LsqFit.jl has a section in the docs describing how to take care of correlations in the y-errors, but its does not seem to work (see this PR). But even if it did, it would only be half the solution, since it would not account for uncertainties in x and would be limited to curve fitting.

Another package that showed up was TotalLeastSquares.jl. It handles the full covariance matrix of input parameters but seems to be limited to linear models and I don’t know how one would get the covariance matrix of the output parameters (although it is perhaps possible and maybe not even that hard, I am just missing the mathematical background).

Then of course there is JuMP, but it does not seem to handle uncertainty at all, or at least it is not obvious to me how it does so.

OK, before this post gets even longer:
Is there some project that already provides the functionality I described above, or would that be a useful addition to the Julia toolbox that is currently missing?



From a quick look at the methodology section, the paper seems to describe the classical Kriging method? Perhaps the difference is that they are interested in non-linear constraints? If the constraints are linear, then you can use KrigingEstimators.jl directly, without gradient-based methods.

This package is part of the larger GeoStats.jl effort, including interpolations with spatial covariances.


You can get generalized least squares fitting by using a general minimization package like Optim.jl by minimizing a chi square defined as (m_i - p_i) V^-1_ij (m_j - p_j) with m_i = measurement i, p_i = fit parameter, V = variance matrix of m_i. I see that the package GitHub - gragusa/CsminWel.jl: Optimization in Julia returns the inverse of the Hessian at the minimum, which is the variance of the fit parameters. Another option could be GitHub - fkguo/IMinuit.jl: A Julia wrapper of iminuit. iminuit is a Python interface to the C++ version of the function minimization and error analysis package MINUIT.. However if, like in the paper you mention, you want generalized least squares with a set of constraints, I don’t know of any Julia package that does equality-constrained minimization. If the constraints can be solved to provide the measurement predictions p_i as functions of another set of fit parameters q_k, then the problem can be again solved my minimizing the same chi square above with q_k being the fit parameters. This not however always possible.

I will be very interested in a Julia package that implements the algorithm of the paper.

I am using constrained minimization as described here for my research activity, HFLAV-Tau 2018 - Branching fraction fit.

1 Like

Thanks for pointing me to this method—I have never heard of it so far (my background is in physical chemistry and metrology). I would not be surprised if the solution is something from a different field that uses a different terminology that I am unfamiliar with and thus overlooked it.

Wikipedia describes Kriging as kind of a Gaussian process interpolation. I did not see the immediate connection yet, but that comes perhaps with looking at the underlying math more closely. I will give it a go.

You are right, using some of the available packages that perform optimization to minimize the chi-square function given in the paper may be a good solution. Accounting for the constraint functions is a must, however.

JuMP does allow to minimize a cost function subject to some constraints, so I guess I have to look more closely there.

With Julia I found it rather easy to write the code that does the fitting (more or less just copy the equations …):

# this seems to work, though it may be a bit naive ...
using Zygote

function getD(func, β, ζ, Σ)
    Nf = func(β, ζ) |> length
    Nβ = length(β)
    Nζ = length(ζ)

    ∇fβ, ∇fζ = jacobian(func, β, ζ)

    return [
        zeros(Nβ, Nβ) zeros(Nβ, Nζ)      transpose(∇fβ);
        zeros(Nζ, Nβ)        inv(Σ)      transpose(∇fζ);
                  ∇fβ           ∇fζ       zeros(Nf, Nf)

function lsq(func, βₗ, z, Σ; iter_max=100, tol=1e-8)
    ζₗ = copy(z)

    Nf = func(βₗ, ζₗ) |> length
    Nβ = length(βₗ)
    Nζ = length(ζₗ)

    vₗ = [βₗ; ζₗ; zeros(Nf)] # iteratively updated vector
    # previous chi-square
    χ²ₗ₋₁ = nothing
    for l in 1:iter_max
        D = getD(func, βₗ, ζₗ, Σ)
        f = func(βₗ, ζₗ)
        vₗ += D \ [zeros(Nβ); Σ\(z - ζₗ); -f]
        βₗ = vₗ[1:Nβ]
        ζₗ = vₗ[Nβ+1:Nβ+Nζ]
        χ² = transpose(z - ζₗ)*inv(Σ)*(z - ζₗ)
        println("iteration $l: χ² = $(χ²)")
        if !isnothing(χ²ₗ₋₁) && abs((χ²ₗ₋₁ - χ²)/χ²) < tol
            χ²ₗ₋₁ = χ²
    return βₗ, ζₗ, inv(getD(func, βₗ, ζₗ, Σ))

but it is somewhat inconvenient to write the constraint functions (models) because you have to keep track of all the indices of the measured parameters and fitted parameters. Making that convenient, e.g. through a macro, would be the more difficult task when making a new package. This is where I would like to not “reinvent the wheel”.

Thanks for pointing me to the CERN report, that seems to be more or less exactly what I am looking for. Is the code available somewhere and how hard would it be to write models for it?

Sorry, my last post should have been a reply to @alusiani

Have you checked TotalLeastSquares.jl?

One of the references cited is:
Fang, X. (2013). Weighted total least squares: necessary and sufficient conditions, fixed and random parameters. Journal of Geodesy , 87(8), pp.733–749.

Nice code!

In the code I wrote, 90% of the code deals with trivial issues like reading data, storing data, indexing them, printing them, and 10% or less does the actual minimization. I guess that this cannot be avoided unfortunately.

I can give you the whole code, it was on github at some point, and I can put it back there. The minimization procedure code is quite simple, it is in the R language, is not general and deals with just one specific fit, and is not well documented. I’d love to have more time and make of it a package both in R and Julia but I couldn’t find it.

As described in the CERN report, my procedure uses the linear algebra solution of chi square minimization with a series of linear constraints. Non-linear constraints are linearized around seed values at the beginning and in later steps around the fit values of the current iteration, until the fit result gets stable.

I found a nice numeric equality-constrained minimization (with non-linear constraints) in the R package CRAN - Package Rsolnp, which uses the procedures of the matlab package solnp documented here Matlab Programs for Optimization. This R package was quite the best I found for performance, the procedure is cleverly designed.



Since minuit was mentioned here: I have the beginnings of a wrapper of the C++ version of minuit working in this package https://github.com/jstrube/Minuit2.jl

The tests show you how to use it, but not all features are implemented yet, and I don’t have the time to take it further. I believe the hard part of writing the interface is done, so if anybody has a strong use case, please feel free to take it from there.

I checked TotalLeastSquares.jl, but I didn’t see how to make it work for a non-linear measurement model. I missed the reference though, thanks for pointing that out.

Thanks for the kind offer—unfortunately I have no experience with R, so I am not sure how well I could follow the code. Would not hurt to try, though …

Another good reference, thank you!

I started writing a notebook with a minimum working example by had to stop because of deadlines. I will eventually complete it. In the meantime, here is the file that contains the non-linear-equality-constrained minimization: https://gist.github.com/alusiani/420d2a976d756b6b35f16acc1e86799c . The code is not nice and elegant but it is very well tested for my specific usage. It contains several minimization options, my one uses the code beginning with

  if (method == "alucomb2") {

The relevant code is embedded in a lot of peripheral code. There is also some preconditioning of the linear equations, which otherwise tend to be computationally singular. For efficiency, linear and non-linear constraints are treated differently, but in general the are all stated as strings containing symbolic expressions, which are parsed in R expressions and differentiated. Something similar, probably more elegant, can be done in Julia.




Here I attach a jpg from document, where we show the development of the article by Lars Nielsen “Evaluation of Measurements by the Method of Least Squares” It is developed in Maple. I would be very happy to see it developed in Julia


Thank you for posting this, there is a lot of useful information in there. I could not follow it completely, due to my lacking knowledge of R, but there are some things that caught my attention:

# line 632
meas.corr = diag.m(rep(1, meas.num))
rownames(meas.corr) = meas.names
colnames(meas.corr) = meas.names
meas.corr.stat = meas.corr

Looks like you can have a named covariance matrix where you can index by parameter name. That would be very handy and much less of a hassle then working with bare indices for a potential Julia version. NamedArrays.jl might help here.

# line 671
cat("\nerror: asymmetric statistical correlation\n  ", paste(errors, collapse="\n  "), "\n", sep="")

Does this mean that the covariance/correlation matrix may become asymmetric? How would that happen?

# line 746
sfact.val[meas.scale.names] = quant.cards.sfact
meas.corr = meas.corr / (sfact.val %o% sfact.val)
diag(meas.corr) = 1
meas.err = meas.err * sfact.val


# line 955
##--- to avoid computationally singular matrix, apply proper factor to constraint equations
constr.scaled.m = constr.m * quant.invcov.order/constr.m.order
constr.scaled.v = constr.v * quant.invcov.order/constr.m.order

Looks like you re-scale both, your parameters and the constraint functions as a whole, to avoid problematic matrices.
I thought it would be enough to re-scale some “extreme” parameters (those that are on a completely different order of magnitude than the remaining parameters) and leave everything else alone. I just tried re-scaling a constrain in my test code, even by an extreme value (1e30) just to see if anything breaks, but to my surprise, nothing happened. I though it would put more weight on the constrain and ruin the derivatives and thus the derived uncertainties.

# line 813
## preliminary computations for analytical minimum chi-square solution

Not sure if this is what happens, but my guess is that a regular least squares fit is performed first to get a better guess for the unknown parameters. That is a good idea to refine the guess and have less iterations in the more costly generalized least square algorithm.

# line 267
##+++ improve, must check if the two linear combinations are degenerate
##+++ in general, should check if constraints are linearly independent

I was watching the JuliaCon 2021 Workshop on ModelingToolkit.jl and was wondering if one could not formulate this particular least squares method with it. ModelingToolkit takes the model as symbolic equations and tries to “optimize” the equations before numerically solving them. Thus, I would expect it to be able to detect and remove redundant constraints of a model.

I was playing shortly with JuMP and ModelingToolkit, but had no luck so far. Both seem to dislike the matrix equation in the chi-square function, but maybe I did something wrong. I have to take a closer look.

I also tried to write my own macro to be able to declare models without explicitly keeping track of the indices in the beta and zeta vectors, but I struggled quite a bit with the metaprogramming and am not sure how reliable the result is :slight_smile: I would still rather use an already existing solution, such as symbolic definition in ModelingToolkit.

Thanks for the Maple code, @HerAdri :wave:
In Maple you put in the constraints directly as symbolic expressions, which could also be a good way for Julia (instead of parsing a Julia expression via a macro). I’ll check if ModelingToolkit (see my last post) or perhaps the underlying Symbolcs.jl can help here.


That is indeed a nice feature of R, especially useful for testing and debugging. I have used a bit NamedArrays.jl, it appears very well done. In other occasions, one can use Dicts from a symbol to an integer array index.

You don’t have to worry about that check. I check for symmetry because in the whole library I assemble the correlation matrix by reading cards of measurements, and with improper input cards an asymmetric correlation may result.

Also this part of the code needs the context to be understood.

The first scaling (quant.cards.sfact) serves to correct measurement uncertainties to account for the observed dispersion of different measurements. So it has nothing to do with the minimization procedure.

The second scaling instead serves to reduce the dynamic range of the values of the singular value decomposition of the matrix that represents the system of linear equations to be solved.

The dynamic range in R has to be at most 10^8 or 1/sqrt(numeric floating point precision) to permit matrix inversion. I do not rescale the equations based on the inverse of measurements covariance (I am not sure that there might be a way to do that BTW), but I compute the median scale of the related equations and then rescale the equations that involve the constrains and lagrangue multipliers to match that scale.

Note that I use a way to solve the minimization that may be non-optimal. A different path could be to use the linearized constraint equations to get a part of the fit parameters as a function of a minimum set, and then solve a smaller number of linear equations for just that smaller set of fit parameters. It is unclear to me what are actually the tradeoffs, it is a topic rarely discussed, but this second solution requires the inversion of a smaller matrix.

There I just compute the quantities that will not change in the incoming iterative procedure, there constraints are iteratively linearized.

ModelingToolkit appears quite interesting but I had no time to experiment with it. My impression is that it can solve the problem I am solving with my code. Optimizing non-linear equations seems a tough job to be done in a general way, as recognizing redundant constraints. When linearized, however, the check for redundancy (within numerical precision) is easy. But it takes quote some work to then trace and report the offending list of constraints, something still on my todo list.




Thank you for clarifying :smiley:

Not sure how it is in Julia. So far, my prototype does not do any checks of the matrix that is inverted (which it probably should). With “bad” input parameters, however, I get a SingularException.

The above message was directed at you, seems like I did not click reply :roll_eyes:

I managed to get the method to work in JuMP.

Here is my code for the first example of Lars Nielsen’s paper:

using NamedArrays
using JuMP
import Ipopt

function lsq()
    model = Model(Ipopt.Optimizer)
    # measured input quantities
    scale_indications = [ # [div]
    scale_indications_reference = [199.998867, 199.998875] # [div]
    weights_sum                 = 199.988816 # [g]
    weight_reference            = 199.999043 # [g]
    density                     = 7965.76 # [kg/m³]
    density_reference           = 7833.01 # [kg/m³]
    density_air                 = 1.1950  # [kg/m³]

    # model input variables
    @variables(model, begin
            Iₒ[i=1:16] == scale_indications[i]            
            Irₒ[i=1:2] == scale_indications_reference[i]
            msₒ        == weights_sum
            mrₒ        == weight_reference
            ρₒ         == density
            ρrₒ        == density_reference
            aₒ         == density_air
            # ζ parameters (refined from z)
            I[i=1:16], (start = scale_indications[i])
            Ir[i=1:2], (start = scale_indications_reference[i]) 
            ms,        (start = weights_sum)
            mr,        (start = weight_reference)
            ρ,         (start = density)
            ρr,        (start = density_reference)
            a,         (start = density_air)
            # β parameters (unknown)
            f,  (start = 1.0) # scale correction factor
            A,  (start = 1.0) # non-linearity
            m1, (start = 1.0) # mass 1 to 4
            m2, (start = 1.0)
            m3, (start = 1.0)
            m4, (start = 1.0)

    # collect input variables in array
    zs    = [[Iₒ[i] for i in 1:16]..., Irₒ[1], Irₒ[2], msₒ, mrₒ, ρₒ, ρrₒ, aₒ]
    ζs    = [ [I[i] for i in 1:16]..., Ir[1],  Ir[2],  ms,  mr,  ρ,  ρr,  a ]
    # variance of input parameters
    var_z = [5.29e-10*ones(18)..., 1e-10, 6.4e-11, 0.0841, 0.5041, 1.225e-5]
    nz    = length(zs)

    # covariance matrix
    # (named array allows expressive indexing)
    Σ = NamedArray(zeros((nz, nz)), (zs, zs))
    for (zi, var) in zip(zs, var_z)
        Σ[zi, zi] = var
    invΣ = inv(Σ) # fortunately, inversion leaves row and column names alone

    # objective
    # minimize difference of measured and estimated parameters
    # weighted according to (co)variance
    @NLobjective(model, Min, sum(
            (zₗ - ζₗ)*invΣ[zₗ, zₖ]*(zₖ - ζₖ)
            for (zₗ, ζₗ) in zip(zs, ζs)
            for (zₖ, ζₖ) in zip(zs, ζs)
            if invΣ[zₗ, zₖ] != 0
    # constraints
    mass_combinations = [
        m1 + m2 + m3 + m4 # I1
        m1 + m2 + m3 + m4 # I2
        m1 + m2 + m3      # I3
        m1 + m2 + m4      # I4
        m1 + m2           # I5
        m1 + m3 + m4      # I6
        m1 + m4           # I7
        m1 + m3           # I8
        m1                # I9
        m2 + m3 + m4      # I10
        m2 + m3           # I11
        m2 + m4           # I12
        m2                # I13
        m3 + m4           # I14
        m3                # I15
        m4                # I16

    for i in 1:16
        @NLconstraint(model, mass_combinations[i]*(1 - (a - 1.2)*(1/ρ - 1/8000)) - f*(I[i] + A*I[i]^2) == 0)
    @NLconstraint(model, mr*(1 - (a - 1.2)*(1/ρ - 1/8000)) - f*(Ir[1] + A*Ir[1]^2) == 0)
    @NLconstraint(model, mr*(1 - (a - 1.2)*(1/ρ - 1/8000)) - f*(Ir[2] + A*Ir[2]^2) == 0)
    @NLconstraint(model, ms - (m1 + m2 + m3 + m4) == 0)
    # abracadabra!

mod = lsq()

solution_summary(mod; verbose=true)

The summary of the solution is:

* Solver : Ipopt

* Status
  Termination status : LOCALLY_SOLVED
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Result count       : 1
  Has duals          : true
  Message from the solver:

* Candidate solution
  Objective value      : 8.336278266129565
  Primal solution :
    A : -4.29028123993008e-9
    I[10] : 99.98289827591923
    I[11] : 74.9864495677386
    I[12] : 75.00432481389886
    I[13] : 50.00788146321463
    I[14] : 49.974995368628
    I[15] : 24.978557386324223
    I[16] : 24.996432624814283
    I[1] : 199.98861731519125
    I[2] : 199.98861731519125
    I[3] : 174.9921471573928
    I[4] : 175.0100224188919
    I[5] : 150.0135576186036
    I[6] : 149.98067149579722
    I[7] : 125.0020873177321
    I[8] : 124.98421206390323
    I[9] : 100.00563324334139
    Ir[1] : 199.99885452781737
    Ir[2] : 199.99885452781737
    Irₒ[1] : 199.998867
    Irₒ[2] : 199.998875
    Iₒ[10] : 99.982925
    Iₒ[11] : 74.986433
    Iₒ[12] : 75.004325
    Iₒ[13] : 50.007892
    Iₒ[14] : 49.974992
    Iₒ[15] : 24.978533
    Iₒ[16] : 24.996417
    Iₒ[1] : 199.988617
    Iₒ[2] : 199.988608
    Iₒ[3] : 174.992133
    Iₒ[4] : 175.009992
    Iₒ[5] : 150.013558
    Iₒ[6] : 149.980675
    Iₒ[7] : 125.002083
    Iₒ[8] : 124.984217
    Iₒ[9] : 100.00565
    a : 1.195
    aₒ : 1.195
    f : 1.0000018230343597
    m1 : 100.00577238065725
    m2 : 50.00796176585021
    m3 : 24.978600179154746
    m4 : 24.996475446351376
    mr : 199.99904698570728
    mrₒ : 199.999043
    ms : 199.98880977201358
    msₒ : 199.988816
    ρ : 7965.76
    ρr : 7833.01
    ρrₒ : 7833.01
    ρₒ : 7965.76
  Dual solution :

* Work counters
  Solve time (sec)   : 0.00295

The results agree with the paper within the stated uncertainty.

My problem now is that I cannot figure out how to get the hessian at the solution, so I cannot calculate uncertainties and compare those.

Could anyone who is familiar with JuMP help me out here?


I have tried to get Hessians out of JuMP objects and it’s pretty brutal. It might be easiest to just write the objective as a standalone function and then just call ForwardDiff.hessian(obj, mle) with the computed mle.