Changing the binding of a variable in a closure

I am labeling this question as “statistics” because the motivation comes from a statistical application. The tl: dr version is that I want to change the binding of a variable in a closure within a loop. The function itself is defined outside the loop. I don’t have a lot of flexibility in defining the function because I am passing it to derivative evaluation functions in the ForwardDiff package and those must be unary functions. Right now my approach is to declare the variable’s name as global before I define the function. I’m wondering if there is a more aesthetically pleasing approach.

So the background is that a nonlinear regression model function depends upon parameters and observed data. Generally the parameters are a relatively short vector and the data are columns in a table, like a DataFrame. One of the simple (but real) examples in Bates and Watts (Wiley, 1988), Nonlinear Regression Analysis and Its Applications is data from an enzyme kinetics experiment

julia> data = DataFrame(conc = repeat([0.02,0.06,0.11,0.22,0.56,1.1], inner=2),
           rate = float([76,47,97,107,123,139,159,152,191,201,207,200]))
12×2 DataFrame
│ Row │ conc    │ rate    │
│     │ Float64 │ Float64 │
├─────┼─────────┼─────────┤
│ 1   │ 0.02    │ 76.0    │
│ 2   │ 0.02    │ 47.0    │
│ 3   │ 0.06    │ 97.0    │
│ 4   │ 0.06    │ 107.0   │
│ 5   │ 0.11    │ 123.0   │
│ 6   │ 0.11    │ 139.0   │
│ 7   │ 0.22    │ 159.0   │
│ 8   │ 0.22    │ 152.0   │
│ 9   │ 0.56    │ 191.0   │
│ 10  │ 0.56    │ 201.0   │
│ 11  │ 1.1     │ 207.0   │
│ 12  │ 1.1     │ 200.0   │

for which the Michaelis-Menten model for reaction rate as a function of substrate concentration would be appropriate.

julia> function MicMen(pars, data)
           Vm = pars[1]
           K = pars[2]
           conc = data.conc
           @. Vm * conc / (K + conc)
       end
MicMen (generic function with 1 method)

For these data, values Vm = 200. and K = 0.05 give the predicted response

julia> show(MicMen([200., 0.05], data))
[57.14285714285714, 57.14285714285714, 109.0909090909091, 109.0909090909091, 137.5, 137.5, 162.96296296296296, 162.96296296296296, 183.60655737704917, 183.60655737704917, 191.30434782608697, 191.30434782608697]

I want to use methods from ForwardDiff to evaluate the Jacobian matrix but the function argument to ForwardDiff.jacobian! must be a function of pars only. I can use a closure and generate

f(x) = MicMen(x, data)

in a context where data is defined so that is fine.

However, I also want to do this for several related groups of responses, say saved in a GroupedDataFrame, where the value of pars will change between groups. This is for fitting nonlinear mixed-effects models, described in Pinheiro and Bates (Springer, 2000), Mixed-Effects Models in S and S-PLUS. As I loop over the groups I need the binding of data in the closure to change to each SubDataFrame. I am currently doing this by assigning a global variable _sdf before defining f and changing the global variable’s binding in the loop.

Are there better approaches? I feel that I should be able to do something with let but I can’t seem to work out how to do so.

Put the variable in some sort of container (like a Ref) and update that instead?

2 Likes

When in doubt, forget that closures exist and do it by hand. E.g.

julia> struct F{T}
       conc::T
       end

julia> function (f::F)(par)
       VM=pars[1]
       K=pars(2)
       conc = f.conc
       @. Vm * conc / (K + conc)
       end

Now, each instance of F, like e.g. F(data.conc), is a valid input to forward diff. This is how closures are internally implemented.

Changing bindings in closures is supported, but almost always a bad idea. Instead, construct a new closure.

edit: Or do what @kristoffer.carlsson said, depending on what you actually want (I don’t know whether I understood what your issue is).

2 Likes

In case someone reading this later needs more detail, I created the Ref from the first SubDataFrame, then define the function for ForwardDiff.jacobian! dereferencing the Ref. If data is the GroupedDataFrame it looks like

Rdf = Ref(first(data))
f(pars) = MicMen(pars, Rdf[])

Then the loop looks like

for (res, sdf) in zip(results, data)
    Rdf[] = sdf
    ... manipulate the pars to get the appropriate value
    ForwardDiff.jacobian!(res, f, pars, cfg)
end

where cfg is a (common) JacobianConfig and results is a preallocated vector of the form

results = [DiffResults.JacobianResult(sdf.age, pars) for sdf in data]

Is it the same pars for all subdataframes? It is unclear what the role of pars is, can you provide an MWE for the complete calculation?

The values of pars change from group to group. The groups of observations correspond to experimental units of some sort in the experiment. In a pharmacokinetics study there will be several participants each of whom receives dosages of a drug following which several measurements of serum concentrations over time are made.

tl; dr As an example, in R, datasets::Theoph is data from such a study. A compartment model for concentration as a function of time depends upon three scalar parameters for each subject: ka, the absorption rate constant, k, the elimination rate constant and V, the effective volume of distribution, as

(dose/V) * (ka /(ka - k)) * (exp(-k*t) - exp(-ka*t))

Typically these parameters, all of which must be positive, are estimated on the logarithmic scale so the model function is

function sdOral1C(φ, d)
    k  = exp(φ[1])    # elimination rate constant from lk
    ka = exp(φ[2])    # absorption rate constant from lka
    V  = exp(φ[3])    # volume of distribution from lV
    t  = d.Time
    @. (d.Dose/V) * (ka/(ka - k)) * (exp(-k*t) - exp(-ka*t))
end

If we pool all the data we get

m = fit!(NLregModel(sdOral1C, Theoph, (lk = -2.5, lka = 0.5, lV = -1.0), :Conc))
Nonlinear regression model (NLregModel)

 Residual sum of squares: 274.4491345682938

──────────────────────────────────
      Estimate  Std.Error  t value
──────────────────────────────────
lk   -2.52424   0.110347    -22.88
lka   0.399228  0.117537      3.40
lV   -0.724023  0.0485799   -14.90
──────────────────────────────────

If we fit each subject’s data separately we get

julia> fits = [fit!(NLregModel(sdOral1C, sdf, params(m), :Conc)) for sdf in groupby(Theoph, :Subject)]
12-element Array{NLregModel{3,Float64},1}:
 Nonlinear regression model (NLregModel)

 Residual sum of squares: 2.44424021741903

──────────────────────────────────
      Estimate  Std.Error  t value
──────────────────────────────────
lk   -2.30733   0.196063    -11.77
lka   0.151628  0.213966      0.71
lV   -0.665909  0.0980873    -6.79
──────────────────────────────────
   
 Nonlinear regression model (NLregModel)

 Residual sum of squares: 0.9965571862947799
...
 Nonlinear regression model (NLregModel)

 Residual sum of squares: 13.463469674674933

──────────────────────────────────
      Estimate  Std.Error  t value
──────────────────────────────────
lk   -2.42548    0.277133    -8.75
lka   0.386278   0.296296     1.30
lV   -0.707117   0.127841    -5.53
──────────────────────────────────

A mixed-effects model for these data is intermediate to the pooled model and the individual model. Each subject’s parameters are a sum of population values (called “fixed effects”) and subject-specific deviations (or “random effects”) which are assumed to be samples from a multivariate distribution (typically multivariate Gaussian). The purpose is to estimate the fixed effects and the variance-covariance parameters in the distribution of the random effects. This distribution regularizes the optimization of the individual’s parameter values. Instead of a nonlinear least squares fit, the optimization of each individual’s parameters becomes a penalized nonlinear least squares fit where the penalty depends on those variance-covariance parameters.

NLregModel is defined in https://github.com/dmbates/NLreg.jl which is also where I am working on an implementation of a nonlinear mixed-effects model.