Using GLM.jl for mixed design Anova

question

#1

Is it possible to get with GLM.jl and ANOVA.jl something equivalent to R’s

aov(data ~ age*day + Error(id/day), data = data)

to study data with one between (age) and one within (day) factor?
I am stuck with translating the formula.


#2

I believe Mixed Models are what you seek :slight_smile:


#3

Thanks, I guess you are right. Unfortunately, ANOVA.jl doesn’t seem to support MixedModels…

PS. R’s aov does not seem to fit a mixed model, but my knowledge of R is too limited to understand what is really going on in that code.


#4

Search ANOVA in discourse and there are several threads discussing it. Not sure if any of them have the answers you’re looking for, but it might be worth a look.


#5

Thanks @kevbonham.
I didn’t find anything in discourse, but wrote an ugly solution:

julia> using GLM, ANOVA, Distributions, Statistics, DataFrames
julia> function splitplotanova(df, betweenfactor, withinfactor, id, data)
           dfmean = by(df, [betweenfactor, id], d -> mean(d[data]))
           tmp1 = anova(fit(LinearModel, @eval(@formula(x1 ~ $betweenfactor)), dfmean))
           tmp2 = anova(fit(LinearModel, @eval(@formula($data ~ $withinfactor)), df))
           tmp3 = anova(fit(LinearModel,
                            @eval(@formula($data ~ $betweenfactor * $withinfactor)), df))
           SSt = sum(abs2, df[data] .- mean(df[data]))
           DFwithin = tmp2.DF[1]
           tmp1.SS .*= DFwithin + 1
           tmp1.MSS .*= DFwithin + 1
           DFbetween = tmp1.DF[1]
           SSerror = SSt - sum(tmp1.SS) - tmp2.SS[1] - tmp3.SS[3]
           DFerror = (length(union(df[id])) - DFbetween - 1) * DFwithin
           MSSerror = SSerror/DFerror
           F = [tmp2.MSS[1]/MSSerror, tmp3.MSS[3]/MSSerror, 0]
           DF = [DFwithin, DFbetween*DFwithin, DFerror]
           (between = tmp1,
            within = DataFrame(Source = ["$withinfactor",
                                         "$withinfactor & $betweenfactor",
                                         "Residuals"],
                               DF = DF,
                               SS = [tmp2.SS[1], tmp3.SS[3], SSerror],
                               MSS = [tmp2.MSS[1], tmp3.MSS[3], MSSerror],
                               F = F,
                               p = [ccdf(FDist(df, DF[end]), f) for (f, df) in zip(F, DF)])
           )
       end
splitplotanova (generic function with 1 method)

julia> D = DataFrame(age = [fill("y", 7*5); fill("m", 7*5); fill("o", 7*5)],
                     day = vcat([1:5 for _ in 1:21]...),
                     id = vcat([fill(i, 5) for i in 1:21]...),
                     data = [250, 278, 442, 368, 456,
                             65, 207, 341, 382, 298,
                             251, 261, 384, 421, 342,
                             241, 314, 423, 415, 468,
                             154, 167, 257, 275, 332,
                             103, 286, 401, 291, 367,
                             230, 306, 432, 386, 423,
                             54, 172, 307, 261, 360,
                             20, 116, 425, 398, 268,
                             41, 168, 378, 317, 470,
                             200, 157, 283, 259, 273,
                             34, 86, 351, 280, 320,
                             29, 81, 193, 240, 233,
                             3, 54, 285, 216, 245,
                             118, 124, 365, 311, 331,
                             83, 266, 382, 369, 295,
                             38, 207, 289, 385, 373,
                             71, 211, 356, 380, 305,
                             123, 331, 407, 461, 445,
                             71, 285, 471, 407, 433,
                             108, 247, 317, 307, 324]);

julia> categorical!(D, [:age, :day, :id]);

julia> splitplotanova(D, :age, :day, :id, :data)
(between = 2×6 DataFrame
│ Row │ Source    │ DF        │ SS        │ MSS       │ F         │ p          │
│     │ String    │ Abstract… │ Abstract… │ Abstract… │ Abstract… │ Abstract…  │
├─────┼───────────┼───────────┼───────────┼───────────┼───────────┼────────────┤
│ 1   │ age       │ 2.0       │ 1.78358e5 │ 89179.1   │ 8.30132   │ 0.00278914 │
│ 2   │ Residuals │ 18.0      │ 1.9337e5  │ 10742.8   │ 0.0       │ 0.0        │, within = 3×6 DataFrame
│ Row │ Source    │ DF      │ SS        │ MSS      │ F       │ p           │
│     │ String    │ Float64 │ Float64   │ Float64  │ Float64 │ Float64     │
├─────┼───────────┼─────────┼───────────┼──────────┼─────────┼─────────────┤
│ 1   │ day       │ 4.0     │ 1.0258e6  │ 2.5645e5 │ 114.632 │ 1.91047e-30 │
│ 2   │ day & age │ 8.0     │ 38740.1   │ 4842.51  │ 2.16458 │ 0.0403462   │
│ 3   │ Residuals │ 72.0    │ 1.61075e5 │ 2237.15  │ 0.0     │ 1.0         │)

julia> using RCall

julia> R"summary(aov(data ~ age*day + Error(id/day), data = $D))"
RObject{VecSxp}

Error: id
          Df Sum Sq Mean Sq F value  Pr(>F)   
age        2 178358   89179   8.301 0.00279 **
Residuals 18 193370   10743                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:day
          Df  Sum Sq Mean Sq F value Pr(>F)    
day        4 1025800  256450 114.632 <2e-16 ***
age:day    8   38740    4843   2.165 0.0403 *  
Residuals 72  161075    2237                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1