Using GLM.jl for mixed design Anova

jbrea · January 22, 2019, 10:48am

Is it possible to get with GLM.jl and ANOVA.jl something equivalent to R’s

aov(data ~ age*day + Error(id/day), data = data)

to study data with one between (age) and one within (day) factor?
I am stuck with translating the formula.

DominiqueMakowski · January 22, 2019, 1:13pm

I believe Mixed Models are what you seek

jbrea · January 22, 2019, 5:33pm

Thanks, I guess you are right. Unfortunately, ANOVA.jl doesn’t seem to support MixedModels…

PS. R’s aov does not seem to fit a mixed model, but my knowledge of R is too limited to understand what is really going on in that code.

kevbonham · January 22, 2019, 7:06pm

Search ANOVA in discourse and there are several threads discussing it. Not sure if any of them have the answers you’re looking for, but it might be worth a look.

jbrea · January 22, 2019, 9:58pm

Thanks @kevbonham.
I didn’t find anything in discourse, but wrote an ugly solution:

julia> using GLM, ANOVA, Distributions, Statistics, DataFrames
julia> function splitplotanova(df, betweenfactor, withinfactor, id, data)
           dfmean = by(df, [betweenfactor, id], d -> mean(d[data]))
           tmp1 = anova(fit(LinearModel, @eval(@formula(x1 ~ $betweenfactor)), dfmean))
           tmp2 = anova(fit(LinearModel, @eval(@formula($data ~ $withinfactor)), df))
           tmp3 = anova(fit(LinearModel,
                            @eval(@formula($data ~ $betweenfactor * $withinfactor)), df))
           SSt = sum(abs2, df[data] .- mean(df[data]))
           DFwithin = tmp2.DF[1]
           tmp1.SS .*= DFwithin + 1
           tmp1.MSS .*= DFwithin + 1
           DFbetween = tmp1.DF[1]
           SSerror = SSt - sum(tmp1.SS) - tmp2.SS[1] - tmp3.SS[3]
           DFerror = (length(union(df[id])) - DFbetween - 1) * DFwithin
           MSSerror = SSerror/DFerror
           F = [tmp2.MSS[1]/MSSerror, tmp3.MSS[3]/MSSerror, 0]
           DF = [DFwithin, DFbetween*DFwithin, DFerror]
           (between = tmp1,
            within = DataFrame(Source = ["$withinfactor",
                                         "$withinfactor & $betweenfactor",
                                         "Residuals"],
                               DF = DF,
                               SS = [tmp2.SS[1], tmp3.SS[3], SSerror],
                               MSS = [tmp2.MSS[1], tmp3.MSS[3], MSSerror],
                               F = F,
                               p = [ccdf(FDist(df, DF[end]), f) for (f, df) in zip(F, DF)])
           )
       end
splitplotanova (generic function with 1 method)

julia> D = DataFrame(age = [fill("y", 7*5); fill("m", 7*5); fill("o", 7*5)],
                     day = vcat([1:5 for _ in 1:21]...),
                     id = vcat([fill(i, 5) for i in 1:21]...),
                     data = [250, 278, 442, 368, 456,
                             65, 207, 341, 382, 298,
                             251, 261, 384, 421, 342,
                             241, 314, 423, 415, 468,
                             154, 167, 257, 275, 332,
                             103, 286, 401, 291, 367,
                             230, 306, 432, 386, 423,
                             54, 172, 307, 261, 360,
                             20, 116, 425, 398, 268,
                             41, 168, 378, 317, 470,
                             200, 157, 283, 259, 273,
                             34, 86, 351, 280, 320,
                             29, 81, 193, 240, 233,
                             3, 54, 285, 216, 245,
                             118, 124, 365, 311, 331,
                             83, 266, 382, 369, 295,
                             38, 207, 289, 385, 373,
                             71, 211, 356, 380, 305,
                             123, 331, 407, 461, 445,
                             71, 285, 471, 407, 433,
                             108, 247, 317, 307, 324]);

julia> categorical!(D, [:age, :day, :id]);

julia> splitplotanova(D, :age, :day, :id, :data)
(between = 2×6 DataFrame
│ Row │ Source    │ DF        │ SS        │ MSS       │ F         │ p          │
│     │ String    │ Abstract… │ Abstract… │ Abstract… │ Abstract… │ Abstract…  │
├─────┼───────────┼───────────┼───────────┼───────────┼───────────┼────────────┤
│ 1   │ age       │ 2.0       │ 1.78358e5 │ 89179.1   │ 8.30132   │ 0.00278914 │
│ 2   │ Residuals │ 18.0      │ 1.9337e5  │ 10742.8   │ 0.0       │ 0.0        │, within = 3×6 DataFrame
│ Row │ Source    │ DF      │ SS        │ MSS      │ F       │ p           │
│     │ String    │ Float64 │ Float64   │ Float64  │ Float64 │ Float64     │
├─────┼───────────┼─────────┼───────────┼──────────┼─────────┼─────────────┤
│ 1   │ day       │ 4.0     │ 1.0258e6  │ 2.5645e5 │ 114.632 │ 1.91047e-30 │
│ 2   │ day & age │ 8.0     │ 38740.1   │ 4842.51  │ 2.16458 │ 0.0403462   │
│ 3   │ Residuals │ 72.0    │ 1.61075e5 │ 2237.15  │ 0.0     │ 1.0         │)

julia> using RCall

julia> R"summary(aov(data ~ age*day + Error(id/day), data = $D))"
RObject{VecSxp}

Error: id
          Df Sum Sq Mean Sq F value  Pr(>F)   
age        2 178358   89179   8.301 0.00279 **
Residuals 18 193370   10743                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:day
          Df  Sum Sq Mean Sq F value Pr(>F)    
day        4 1025800  256450 114.632 <2e-16 ***
age:day    8   38740    4843   2.165 0.0403 *  
Residuals 72  161075    2237                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Topic		Replies	Views
Poll: Do we Julians want ANOVAs? Statistics	14	3628	December 18, 2018
Equivalent of Rs `aov(Y ~ B + Error(A/B), data=d)` General Usage question , statistics	1	611	March 7, 2021
Didn't get consistent result of ANOVA from R and Julia General Usage question , package	2	546	July 4, 2019
Repeated measures ANOVA Statistics	11	1076	May 8, 2021
ANOVA Tests in Julia? Statistics	76	14020	August 11, 2022

Using GLM.jl for mixed design Anova

Related topics