How to add weights parameter to Generalized Mixed Model

How do you add the weight of an observation to a Mixed Model? I thought I could pass the Freq column to the wt argument, but apparently not.

using RDatasets, MixedModels
titanic = RDatasets.dataset("datasets", "Titanic")
titanic.surv_flg = titanic.Survived .== "Yes";

This runs:

MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age + Sex + (1 | Class)), titanic, Bernoulli(), nAGQ = 2, fast = true)

But this doesn’t:

MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age * Sex + (1 | Class)), titanic, wt = Freq, Bernoulli(), nAGQ = 2, fast = true)

The short answer is that the argument name is wts, not wt. Using MixedModels#master under Julia-1.4.2 I get

julia> using RDatasets, MixedModels
[ Info: Precompiling RDatasets [ce6b1742-4840-55fa-b093-852dadbb1d8b]
[ Info: Precompiling MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316]

julia> titanic = RDatasets.dataset("datasets", "Titanic");

julia> m = fit(GeneralizedLinearMixedModel, @formula(Survived ~ 1 + Age + Sex + (1|Class)), titanic, Binomial(), wts=float.(titanic.Freq))
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  Survived ~ 1 + Age + Sex + (1 | Class)
  Distribution: Binomial{Float64}
  Link: LogitLink()

  Deviance: 2227.9565

Variance components:
         Column    Variance  Std.Dev. 
Class (Intercept)  0.38409626 0.619755

 Number of obs: 32; levels of grouping factors: 4

Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)   1.1241    0.330702     3.40   0.0007
Age: Child    1.04557   0.243236     4.30   <1e-4
Sex: Male    -2.41576   0.139383   -17.33   <1e-66
──────────────────────────────────────────────────

Looks like docs need to be updated:

MixedModels.GeneralizedLinearMixedModel – Type.

GeneralizedLinearMixedModel

Generalized linear mixed-effects model representation

Fields

  • LMM : a LinearMixedModel - the local approximation to the GLMM.
  • β : the pivoted and possibly truncated fixed-effects vector
  • β₀ : similar to β. Used in the PIRLS algorithm if step-halving is needed.
  • θ : covariance parameter vector
  • b : similar to u, equivalent to broadcast!(*, b, LMM.Λ, u)
  • u : a vector of matrices of random effects
  • u₀ : similar to u. Used in the PIRLS algorithm if step-halving is needed.
  • resp : a GlmResp object
  • η : the linear predictor
  • wt : vector of prior case weights, a value of T[] indicates equal weights.

Notice that the column passed to wts must be fully qualified (titanic.Freq rather than just Freq) and converted to floating point by broadcasting float over it, as in float.(titanic.Freq).

I was surprised that this worked because of the rows with zero frequency. The typical way of coding contingency-table data like this for fitting a GLM or GLMM with a Binomial distribution is to combine the No/Yes responses into a single row where the response is the proportion of Yes and the weights are the total of Yes and No. I think that a clever DataFrames groupby call, perhaps with combine or transform, should achieve that, as sketched below.
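For what it’s worth, here is a rough, untested sketch of that aggregation using the standard DataFrames.jl groupby/combine API (the names agg, prop, and wts are mine, and I have not checked that the resulting Binomial fit reproduces the weighted Bernoulli fit):

using DataFrames, RDatasets, MixedModels

titanic = RDatasets.dataset("datasets", "Titanic")

# Collapse each No/Yes pair into one row per Class/Sex/Age cell:
# the response becomes the proportion of "Yes", the weight the cell total.
agg = combine(groupby(titanic, [:Class, :Sex, :Age])) do cell
    total = sum(cell.Freq)
    yes   = sum(cell.Freq[cell.Survived .== "Yes"])
    (prop = total == 0 ? 0.0 : yes / total, wts = float(total))
end
agg = filter(:wts => >(0), agg)   # drop the empty cells

fit(GeneralizedLinearMixedModel,
    @formula(prop ~ 1 + Age + Sex + (1 | Class)),
    agg, Binomial(), wts = agg.wts)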


You are right, that should be changed, but it is more a matter of inconsistency than incorrect documentation. The field is named wt but the argument in the fit method is named wts.

My favorite Oscar Wilde quote is “Consistency is the last refuge of the unimaginative”.

I was looking for something similar to R’s weights
glmer(formula = Survived ~ 1 + Age + Sex + (1|Class), family = binomial, weights = Freq, nAGQ = 2, data = as.data.frame(Titanic))

Isn’t that what I wrote?

julia> const GLMM = GeneralizedLinearMixedModel;

julia> form = @formula(Survived ~ 1 + Age + Sex + (1|Class));

julia> m = fit(GLMM, form, titanic, Bernoulli(), wts=float.(titanic.Freq), nAGQ=3)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 3)
  Survived ~ 1 + Age + Sex + (1 | Class)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 2227.9546

Variance components:
         Column    Variance  Std.Dev. 
Class (Intercept)  0.3841519 0.6197999

 Number of obs: 32; levels of grouping factors: 4

Fixed-effects parameters:
──────────────────────────────────────────────────
             Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────
(Intercept)   1.12414   0.330723     3.40   0.0007
Age: Child    1.04558   0.243237     4.30   <1e-4
Sex: Male    -2.41577   0.139384   -17.33   <1e-66
──────────────────────────────────────────────────

I use nAGQ=3 instead of 2 because you always evaluate the midpoint so you might as well use the next odd number.
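A quick way to see the node placement, assuming the GHnorm helper that the MixedModels documentation describes for its normalized Gauss-Hermite rule:

using MixedModels

GHnorm(2).z   # 2-node rule: nodes at ±1.0, neither at the conditional mode
GHnorm(3).z   # 3-node rule: nodes at ±√3 and 0.0, the middle one at the mode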


I’m not sure. The results seem quite different.

R:
summary(glmer(formula = Survived ~ 1 + Age * Sex + (1|Class), family = binomial, weights = Freq, nAGQ = 2, data = as.data.frame(Titanic)))

Julia
MixedModels.fit(GeneralizedLinearMixedModel, @formula(surv_flg ~ 1 + Age + Sex + Age & Sex + (1 | Class)), titanic[titanic.Freq .> 0,:], wts = titanic[titanic.Freq .> 0,"Freq"], Bernoulli(), nAGQ = 3)

[screenshot of the model output]

julia> form = @formula(Survived ~ 1 + Age*Sex + (1|Class));

julia> nonzero = filter(:Freq => !iszero, titanic);

julia> m = fit(GLMM, form, nonzero, Bernoulli(), wts=nonzero.Freq, nAGQ=3)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 3)
  Survived ~ 1 + Age + Sex + Age & Sex + (1 | Class)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 2210.0492

Variance components:
         Column    Variance   Std.Dev.  
Class (Intercept)  0.40040641 0.63277675

 Number of obs: 24; levels of grouping factors: 4

Fixed-effects parameters:
──────────────────────────────────────────────────────────────
                         Estimate  Std.Error  z value  P(>|z|)
──────────────────────────────────────────────────────────────
(Intercept)              1.26181    0.339321     3.72   0.0002
Age: Child              -0.123581   0.335105    -0.37   0.7123
Sex: Male               -2.61007    0.14972    -17.43   <1e-67
Age: Child & Sex: Male   1.89475    0.432308     4.38   <1e-4
──────────────────────────────────────────────────────────────

To get the same answers as the call to glmer you would need to change the order of the levels on Age and Sex.
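Alternatively, the releveling can be done on the Julia side. Here is a hypothetical sketch using the contrasts keyword with StatsModels’ DummyCoding and an explicit base level (I have not verified that this exactly reproduces the glmer output):

using StatsModels   # for DummyCoding, if it is not already re-exported

contr = Dict(:Age => DummyCoding(base = "Child"),
             :Sex => DummyCoding(base = "Male"))

fit(GLMM, form, nonzero, Bernoulli(),
    wts = float.(nonzero.Freq), nAGQ = 3, contrasts = contr)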


#facepalm

Thanks for your help/patience.

The β coefficients line up now.
Followup: Any idea why the std.error, z-value, and p-values don’t match?

as.data.frame(Titanic) %>% 
    mutate(Age = relevel(x = Age, ref = "Adult"),
           Sex = relevel(x = Sex, ref = "Female")) %>%
    glmer(formula = Survived ~ 1 + Age * Sex + (1|Class), family = binomial, weights = Freq, nAGQ = 3, data = .) %>% 
    summary()

The standard errors, z-values and p-values more or less match the results I posted. Keep in mind that these are even more approximate for a GLMM than for an LMM, so “matching results” should be judged with a rather wide tolerance.

@dmbates
I must be really dense today or something is off in my install (Win10-64).

Using Julia 1.4.2
[screenshot: Julia version info]

I updated MixedModels
[screenshot: package update output]

and ran your initial example code:
[screenshot: the fit call]

which for me yields
[screenshot: my model output]

and not your results
[screenshot: the output posted above]

I’ve gotten the same results with both the Julia command line and the Atom IDE.

Do you have any suggestions what the problem might be?

I was using MixedModels#master as in

(@v1.4) pkg> add MixedModels#master

It seems that the coefficients are properly estimated but the standard deviation of the random effects and the standard errors are incorrect. @palday Do you recall if there was an issue related to weights in the released version v2.3.0? You have been doing much more on GLMMs while I have been down the automatic differentiation rabbit hole.

I was unable to reproduce the incorrect fit with MixedModels 2.3.0 – neither with nAGQ=1 nor nAGQ=3.

I tried to rule out a user/hardware error by using JuliaBox, but ran into issues with adding the package there.

The standard error issue that I’m aware of never made it into a release – there have been some intermittent issues with weights but I don’t think they’re related.

One thing I saw when skimming the thread is that there was one fit at the top with nAGQ > 1 and fast=true. This corresponds to a configuration that simply isn’t possible in lme4, namely AGQ while optimizing over only the random (and not the fixed) effects. In lme4, you get the Laplace approximation (nAGQ=1) when you optimize over only the RE (confusingly, with the option nAGQ=0).
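For reference, a rough and unverified sketch of the closest correspondences, reusing form and nonzero from above:

# fast = true: optimize only over the covariance parameters θ, with β handled
# inside PIRLS; roughly the analogue of lme4's glmer(..., nAGQ = 0)
fit(GLMM, form, nonzero, Bernoulli(), wts = float.(nonzero.Freq), fast = true)

# default fast = false, nAGQ = 1: the Laplace approximation over both θ and β,
# like glmer's default nAGQ = 1; nAGQ > 1 with fast = true has no lme4 analogue
fit(GLMM, form, nonzero, Bernoulli(), wts = float.(nonzero.Freq))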

A clichéd suggestion: can you run locally, but after restarting Julia? Restart Julia, do the updates, then do the fitting; don’t update a package while it is loaded.

I’ve been working on RCall-based compatibility between MixedModels and lme4 in my package JellyMe4 (it’s in the Julia registry). The latest release won’t move glmer fits to GeneralizedLinearMixedModel, but it should be able to move Bernoulli models in the other direction (and will preserve contrasts). The funny thing is that I’m able to reproduce @laut’s fit that way:

julia> using RCall, JellyMe4
[ Info: Precompiling RCall [6f49c342-dc21-5d91-9882-a32aef131414]
[ Info: Precompiling JellyMe4 [19ac8677-9a15-4623-9afd-84acc6165ce7]

julia> model = (m, nonzero);
julia> @rput model;
[ Info: contrasts on interaction terms are assumed to decompose into products of contrasts on the first-order terms
[ Info: (if you don't know what that means, you're probably fine)
┌ Warning: Some accuracy is lost in translation for GLMMs. This is fine with plotting, but proceed with caution for inferences.
│ You can try letting the model "reconverge" in lme4 with
│     update(model, control=glmerControl(optimizer="nloptwrap", calc.derivs=FALSE)).
└ @ JellyMe4 ~/.julia/packages/JellyMe4/Hupt7/src/glmerMod.jl:144
[ Info: lme4 handles deviance for GLMMs differently than MixedModels.jl
[ Info: for the correct comparison, examine -2*logLik(RModel) and deviance(JuliaModel)

julia> R"summary(model)"
RObject{VecSxp}
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Survived ~ 1 + Age + Sex + Age:Sex + (1 | Class)
   Data: jellyme4_data
Control: glmerControl(optimizer = "nloptwrap", optCtrl = list(maxeval = 1),  
    calc.derivs = FALSE)

     AIC      BIC   logLik deviance df.resid 
    49.0     54.9    -19.5     39.0       19 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.0631 -0.7030  0.5001  0.7373  2.1385 

Random effects:
 Groups Name        Variance Std.Dev.
 Class  (Intercept) 0.4006   0.6329  
Number of obs: 24, groups:  Class, 4

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)  
(Intercept)        1.2618     0.9218   1.369   0.1710  
AgeChild          -0.1235     1.4687  -0.084   0.9330  
SexMale           -2.6101     1.2214  -2.137   0.0326 *
AgeChild:SexMale   1.8948     1.9781   0.958   0.3381  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) AgChld SexMal
AgeChild    -0.552              
SexMale     -0.666  0.416       
AgChld:SxMl  0.411 -0.728 -0.617

julia> m
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  Survived ~ 1 + Age + Sex + Age & Sex + (1 | Class)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 2210.0511

Variance components:
         Column   Variance Std.Dev.
Class (Intercept)  37.72355 6.14195

 Number of obs: 24; levels of grouping factors: 4

Fixed-effects parameters:
─────────────────────────────────────────────────────────────────
                         Estimate  Std.Error     z value  P(>|z|)
─────────────────────────────────────────────────────────────────
(Intercept)              1.26179     3.29347   0.383119    0.7016
Age: Child              -0.123528    3.25191  -0.0379863   0.9697
Sex: Male               -2.61011     1.45291  -1.79647     0.0724
Age: Child & Sex: Male   1.89476     4.19518   0.451653    0.6515
─────────────────────────────────────────────────────────────────

One thing to note is that weights don’t work in the current JellyMe4 release, so those are getting lost in translation. This suggests that something is happening with the weights in your bad run…