Using GLM.jl for mixed design Anova

Is it possible to get with GLM.jl and ANOVA.jl something equivalent to R’s

aov(data ~ age*day + Error(id/day), data = data)

to study data with one between (age) and one within (day) factor?
I am stuck with translating the formula.

I believe Mixed Models are what you seek :slight_smile:

Thanks, I guess you are right. Unfortunately, ANOVA.jl doesn’t seem to support MixedModels…

PS. R’s aov does not seem to fit a mixed model, but my knowledge of R is too limited to understand what is really going on in that code.

Search ANOVA in discourse and there are several threads discussing it. Not sure if any of them have the answers you’re looking for, but it might be worth a look.

Thanks @kevbonham.
I didn’t find anything in discourse, but wrote an ugly solution:

julia> using GLM, ANOVA, Distributions, Statistics, DataFrames
julia> function splitplotanova(df, betweenfactor, withinfactor, id, data)
           dfmean = by(df, [betweenfactor, id], d -> mean(d[data]))
           tmp1 = anova(fit(LinearModel, @eval(@formula(x1 ~ $betweenfactor)), dfmean))
           tmp2 = anova(fit(LinearModel, @eval(@formula($data ~ $withinfactor)), df))
           tmp3 = anova(fit(LinearModel,
                            @eval(@formula($data ~ $betweenfactor * $withinfactor)), df))
           SSt = sum(abs2, df[data] .- mean(df[data]))
           DFwithin = tmp2.DF[1]
           tmp1.SS .*= DFwithin + 1
           tmp1.MSS .*= DFwithin + 1
           DFbetween = tmp1.DF[1]
           SSerror = SSt - sum(tmp1.SS) - tmp2.SS[1] - tmp3.SS[3]
           DFerror = (length(union(df[id])) - DFbetween - 1) * DFwithin
           MSSerror = SSerror/DFerror
           F = [tmp2.MSS[1]/MSSerror, tmp3.MSS[3]/MSSerror, 0]
           DF = [DFwithin, DFbetween*DFwithin, DFerror]
           (between = tmp1,
            within = DataFrame(Source = ["$withinfactor",
                                         "$withinfactor & $betweenfactor",
                                         "Residuals"],
                               DF = DF,
                               SS = [tmp2.SS[1], tmp3.SS[3], SSerror],
                               MSS = [tmp2.MSS[1], tmp3.MSS[3], MSSerror],
                               F = F,
                               p = [ccdf(FDist(df, DF[end]), f) for (f, df) in zip(F, DF)])
           )
       end
splitplotanova (generic function with 1 method)

julia> D = DataFrame(age = [fill("y", 7*5); fill("m", 7*5); fill("o", 7*5)],
                     day = vcat([1:5 for _ in 1:21]...),
                     id = vcat([fill(i, 5) for i in 1:21]...),
                     data = [250, 278, 442, 368, 456,
                             65, 207, 341, 382, 298,
                             251, 261, 384, 421, 342,
                             241, 314, 423, 415, 468,
                             154, 167, 257, 275, 332,
                             103, 286, 401, 291, 367,
                             230, 306, 432, 386, 423,
                             54, 172, 307, 261, 360,
                             20, 116, 425, 398, 268,
                             41, 168, 378, 317, 470,
                             200, 157, 283, 259, 273,
                             34, 86, 351, 280, 320,
                             29, 81, 193, 240, 233,
                             3, 54, 285, 216, 245,
                             118, 124, 365, 311, 331,
                             83, 266, 382, 369, 295,
                             38, 207, 289, 385, 373,
                             71, 211, 356, 380, 305,
                             123, 331, 407, 461, 445,
                             71, 285, 471, 407, 433,
                             108, 247, 317, 307, 324]);

julia> categorical!(D, [:age, :day, :id]);

julia> splitplotanova(D, :age, :day, :id, :data)
(between = 2×6 DataFrame
│ Row │ Source    │ DF        │ SS        │ MSS       │ F         │ p          │
│     │ String    │ Abstract… │ Abstract… │ Abstract… │ Abstract… │ Abstract…  │
├─────┼───────────┼───────────┼───────────┼───────────┼───────────┼────────────┤
│ 1   │ age       │ 2.0       │ 1.78358e5 │ 89179.1   │ 8.30132   │ 0.00278914 │
│ 2   │ Residuals │ 18.0      │ 1.9337e5  │ 10742.8   │ 0.0       │ 0.0        │, within = 3×6 DataFrame
│ Row │ Source    │ DF      │ SS        │ MSS      │ F       │ p           │
│     │ String    │ Float64 │ Float64   │ Float64  │ Float64 │ Float64     │
├─────┼───────────┼─────────┼───────────┼──────────┼─────────┼─────────────┤
│ 1   │ day       │ 4.0     │ 1.0258e6  │ 2.5645e5 │ 114.632 │ 1.91047e-30 │
│ 2   │ day & age │ 8.0     │ 38740.1   │ 4842.51  │ 2.16458 │ 0.0403462   │
│ 3   │ Residuals │ 72.0    │ 1.61075e5 │ 2237.15  │ 0.0     │ 1.0         │)

julia> using RCall

julia> R"summary(aov(data ~ age*day + Error(id/day), data = $D))"
RObject{VecSxp}

Error: id
          Df Sum Sq Mean Sq F value  Pr(>F)   
age        2 178358   89179   8.301 0.00279 **
Residuals 18 193370   10743                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:day
          Df  Sum Sq Mean Sq F value Pr(>F)    
day        4 1025800  256450 114.632 <2e-16 ***
age:day    8   38740    4843   2.165 0.0403 *  
Residuals 72  161075    2237                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1