How to add weights parameter to Generalized Mixed Model

Just copying and pasting @dmbates code in the first answer, I’m also getting a different result in variance, Std.Dev and Std.errors estimates.

using RDatasets, MixedModels

titanic = RDatasets.dataset("datasets", "Titanic");

m = fit(GeneralizedLinearMixedModel, @formula(Survived ~ 1 + Age + Sex + (1|Class)), titanic, Binomial(), wts=float.(titanic.Freq))

m
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  Survived ~ 1 + Age + Sex + (1 | Class)
  Distribution: Binomial{Float64}
  Link: LogitLink()

  Deviance: 2227.9565

Variance components:
         Column    Variance  Std.Dev. 
Class (Intercept)  26.841378 5.1808665

 Number of obs: 32; levels of grouping factors: 4

Fixed-effects parameters:
────────────────────────────────────────────────────
             Estimate  Std.Error    z value  P(>|z|)
────────────────────────────────────────────────────
(Intercept)   1.12409    2.76454   0.406611   0.6843
Age: Child    1.04558    2.03346   0.514188   0.6071
Sex: Male    -2.41575    1.16525  -2.07317    0.0382
────────────────────────────────────────────────────

I’m on MixedModels v2.3.0

I’m reasonably sure that’s the correct answer (because that’s the only one that make sense in terms of other things we know are going on).

That said, there are only 4 levels of class, so we’re already really pushing the limits of what we should be fitting. After all, we know that variance estimates (outside of a mixed models context) based on 4 observations generally won’t be great.

@palday I’ll try that now, I just tested that MixedModels doesn’t load on JuliaBox’s VM’s on
1.0.5

or Julia 1.3.1

This is my quick dirty way of testing reproducibility.

I’m not sure what’s going on there, but MixedModels 2.3.0 does have Julia 1.3 listed in its compatibility requirements, so it’s not surprising that it doesn’t run on earlier versions (in part because 1.3 has some threading and linear algebra enhancements that we take advantage of).

@palday but lme4 says that the results are statistically significant.

I think because when weighted the properly sample size is 2,201
sum(as.data.frame(Titanic)$Freq)

My guess is that the optimum isn’t well defined and that we’re seeing two local optima. I don’t have time at the moment to check the profiles (the code for this isn’t finalized in MixedModels.jl and I – and I suspect @dmbates too – have spent too much time in Julia lately to remember how to produce a profile plot quickly there). Because looking at the random effects – which is where the magic happens – we’re seeing approximately the same two answers again and again. Given that there are only 4 groups, this isn’t too surprising, but I would have to double check.

If I de-tabulate the tabulated Titanic dataset so that each row represents an individual person the results change:

I don’t know why. I understood the weights parameter to be intended for just this purpose.
See:

and:

It produces the exact same [except residuals] results on a GLM but not GLMER.

If I use the de-tabulated data in Julia:
R
write.csv(x = tidyr::uncount(data = data.frame(Titanic), weights = data.frame(Titanic)$Freq)[,-5], file = "C://Users/laut/Desktop/titanic.tsv")
Julia
titanic2 = CSV.File("C:\\Users\\laut\\Desktop\\titanic.tsv"; delim = "\t", header = true) |> DataFrame;
MixedModels.fit(GeneralizedLinearMixedModel, @formula(Survived ~ 1 + Age * Sex + (1 | Class)), titanic2, Bernoulli(), nAGQ = 1)
image

The problem seems to be a difference between the released version of the MixedModels package (v2.3.0) and the master branch on the repository (v3.0.0-DEV). In v2.3.0 the estimated standard deviation of the random effects is over 6 in the model with the interaction term

$ julia-1.4.2 --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.2 (2020-05-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(tmp) pkg> st
Status `/tmp/Project.toml`
  [a93c6f00] DataFrames v0.21.3
  [ff71e718] MixedModels v2.3.0
  [6f49c342] RCall v0.13.7

julia> using DataFrames, MixedModels, RCall

julia> titanic = filter(:Freq => !iszero, rcopy(R"as.data.frame(datasets::Titanic)"));

julia> const GLMM = GeneralizedLinearMixedModel;

julia> form = @formula Survived ~ 1 + Age*Sex + (1|Class);

julia> m = fit(GLMM, form, titanic, Bernoulli(), wts=titanic.Freq, nAGQ=3)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 3)
  Survived ~ 1 + Age + Sex + Age & Sex + (1 | Class)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 2210.0492

Variance components:
         Column    Variance Std.Dev. 
Class (Intercept)  37.752020 6.144267

 Number of obs: 24; levels of grouping factors: 4

Fixed-effects parameters:
──────────────────────────────────────────────────────────────────
                           Estimate  Std.Error    z value  P(>|z|)
──────────────────────────────────────────────────────────────────
(Intercept)                0.423055    4.02214   0.105182   0.9162
Age: Adult                -1.77117     2.73242  -0.648208   0.5169
Sex: Female                0.715531    3.93831   0.181685   0.8558
Age: Adult & Sex: Female   1.89466     4.19524   0.451621   0.6515
──────────────────────────────────────────────────────────────────

whereas in v3.0.0-DEV (which requires Julia-v1.4.0 or later)

$ julia-1.4.2 --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.2 (2020-05-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(tmp) pkg> add MixedModels#master
   Updating git-repo `https://github.com/JuliaStats/MixedModels.jl.git`
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Resolving package versions...
   Updating `/tmp/Project.toml`
  [ff71e718] ↑ MixedModels v2.3.0 β‡’ v3.0.0-DEV #master (https://github.com/JuliaStats/MixedModels.jl.git)
   Updating `/tmp/Manifest.toml`
  [69666777] + Arrow v0.2.4
  [becb17da] + Feather v0.5.6
  [53afe959] + FlatBuffers v0.5.4
  [ff71e718] ↑ MixedModels v2.3.0 β‡’ v3.0.0-DEV #master (https://github.com/JuliaStats/MixedModels.jl.git)

(tmp) pkg> st
Status `/tmp/Project.toml`
  [a93c6f00] DataFrames v0.21.3
  [ff71e718] MixedModels v3.0.0-DEV #master (https://github.com/JuliaStats/MixedModels.jl.git)
  [6f49c342] RCall v0.13.7

julia> using DataFrames, MixedModels, RCall
[ Info: Precompiling MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316]

julia> titanic = filter(:Freq => !iszero, rcopy(R"as.data.frame(datasets::Titanic)"));

julia> const GLMM = GeneralizedLinearMixedModel;

julia> form = @formula Survived ~ 1 + Age*Sex + (1|Class);

julia> m = fit(GLMM, form, titanic, Bernoulli(), wts=titanic.Freq, nAGQ=3)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 3)
  Survived ~ 1 + Age + Sex + Age & Sex + (1 | Class)
  Distribution: Bernoulli{Float64}
  Link: LogitLink()

  Deviance: 2210.0492

Variance components:
         Column    Variance   Std.Dev.  
Class (Intercept)  0.40044704 0.63280885

 Number of obs: 24; levels of grouping factors: 4

Fixed-effects parameters:
────────────────────────────────────────────────────────────────
                           Estimate  Std.Error  z value  P(>|z|)
────────────────────────────────────────────────────────────────
(Intercept)                0.422887   0.414341     1.02   0.3074
Age: Adult                -1.77117    0.281569    -6.29   <1e-9
Sex: Female                0.715344   0.405834     1.76   0.0780
Age: Adult & Sex: Female   1.89475    0.432309     4.38   <1e-4
────────────────────────────────────────────────────────────────

Sorry that this took so long to straighten out. I guess we better get v3.0.0 released soon.

How do you install that beta version from github?
add https://github.com/JuliaStats/MixedModels.jl/blob/master/src/MixedModels.jl

The reason that I used RCall instead of RDatasets to obtain the data set is to ensure that it is a more faithful copy of the R dataset. The factor level orders are the same as those in R. If you create a CSV file then read the CSV, which I believe is what happens in RDatasets, you can’t guarantee you will get the same ordering.

1 Like

One of the big changes between 2.3 and 3.0-DEV is that fast=false fits use a two-stage optimization processes in 2.3, but a single stage in 3.0-DEV. There are a few examples where this leads to very different fits…

You can install the dev version with ]add MixedModels#master

1 Like

The easiest way is to go into the package manager REPL and

add MixedModels#master

as shown in the transcript.
If you do it in a call to Pkg.add you have to remember how to create a PackageSpec to get a particular branch and I always need to look that up.

1 Like

I can confirm that’s the case. I removed MixedModel v2.3.0, added MixedModels#master, recompiled and ran the same code snippet and obtained the same result you showed.