Repeated measures ANOVA

rikh · May 8, 2021, 8:44am

I’m trying to do a repeated measures ANOVA. Lets say the data is as follows

julia> using CategoricalArrays, DataFrames, Random

julia> Random.seed!(1)

julia> df = DataFrame(
          id = categorical(repeat(1:2; inner=6)),
          a = categorical(repeat(1:3, inner=2, outer=2)),
          b = categorical(repeat(1:2, 6)),
          value = round.(rand(Float64, 12), digits=1)
       ) 
12×4 DataFrame
 Row │ id    a     b     value
     │ Cat…  Cat…  Cat…  Float64
─────┼───────────────────────────
   1 │ 1     1     1         0.2
   2 │ 1     1     2         0.3
   3 │ 1     2     1         0.3
   4 │ 1     2     2         0.0
   5 │ 1     3     1         0.5
   6 │ 1     3     2         0.2
   7 │ 2     1     1         1.0
   8 │ 2     1     2         1.0
   9 │ 2     2     1         0.3
  10 │ 2     2     2         1.0
  11 │ 2     3     1         0.6
  12 │ 2     3     2         0.4

and lets say I want to have the p-values for different effects such as a, b and a*b. I’ve been looking at the packages mentioned in ANOVA Tests in Julia? - #61 by BioTurboNick, but wasn’t able to reproduce work that I’m trying to reproduce.

For the sake of generality, I would prefer solving this problem by just using GLM.jl, which should be possible according to Repeated Measures and Mixed Models and Common statistical tests are linear models (or: how to teach stats). Also, there is a mention of ANOVA in the GLM documentation for ftest.

The main problem that I have is how to deal with the repeated measures parts in combination with the different effects.

EDIT: I think that the solution is to use the MixedModels package (Repeated measures 2 way ANOVA - #2 by mkborregaard).

EDIT2: No, MixedModels is not it because I need F-values and GLM.ftest doesn’t work on MixedModels.

BioTurboNick · May 8, 2021, 11:29am

Hi, can you say more about the trouble you were having with my package?

rikh · May 8, 2021, 11:44am

Hi Nicholas, yeah, it’s actually very easy to reproduce on the example I posted above. Continuing on the previous code block:

julia> using SimpleANOVA

julia> anova(df, :value, [:a, :b])
ERROR: MethodError: no method matching *(::CategoricalValue{Int64, UInt32}, ::Int64)
Closest candidates are:
[...]

Currently, I’m looking into your package again, because I realized that it’s probably more suitable to reproduce SPSS results that I got.

@BioTurboNick, how would you use your package to get results for the problem I posed in this thread?

BioTurboNick · May 8, 2021, 12:13pm

rikh:

df = DataFrame(
          id = categorical(repeat(1:2; inner=6)),
          a = categorical(repeat(1:3, inner=2, outer=2)),
          b = categorical(repeat(1:2, 6)),
          value = round.(rand(Float64, 12), digits=1)
       )

Ah, it’s possible that I just need to generalize something to work with CategoricalArrays. I’ve also found a bug in the df/vector entry point though. I’ll fix the latter shortly.

But it should be as simple as this:

anova(df, :value, [:b, :id, :a], [fixed, subject])

for the case where id is specifying your subjects, a is an among-subjects fixed factor, and b is a within-subjects fixed factor. (Assumes the rightmost factors are fixed if unspecified)

Realized my docstring doesn’t show how to work with DataFrames, so I’ll fix that too.

rikh · May 8, 2021, 1:39pm

Thank you for your example. It is helpful, because I didn’t realize how to pass the [fixed, subject] yet.

Is that the following error?

julia> df = DataFrame(
          id = repeat(1:2; inner=6),
          a = repeat(1:3, inner=2, outer=2),
          b = repeat(1:2, 6),
          value = round.(rand(Float64, 12), digits=1)
       )

julia> anova(df, :value, [:b, :id, :a], [fixed, subject])
ERROR: factornames must have an entry for each factor.
[...]

rikh · May 8, 2021, 1:59pm

And, unfortunately, when I do the same as above on the real data that I have (and cannot share), I get

InexactError: Int64(1.5)
Int64(::Float64)@float.jl:72
var"#anova#65"(::Vector{String}, ::typeof(SimpleANOVA.anova), ::Vector{Float64}, ::Vector{Vector{Int64}}, ::Vector{SimpleANOVA.FactorType})@anova.jl:86
var"#anova#156"(::Vector{String}, ::typeof(SimpleANOVA.anova), ::DataFrames.DataFrame, ::Symbol, ::Vector{Symbol}, ::Vector{SimpleANOVA.FactorType})@anova_dataframes.jl:8
anova@anova_dataframes.jl:4[inlined]

So, that is this line. Maybe, the Int conversion is an assumption that one cannot make here, but I’m not sure and a bit at a loss here.

BioTurboNick · May 8, 2021, 2:03pm

Already fixed, I think, just making sure I didn’t break anything before submitting to the registry

rikh · May 8, 2021, 2:18pm

Yes, nice. The factornames must have an entry for each factor is now fixed on master.

EDIT: The Int conversion error still occurs.

BioTurboNick · May 8, 2021, 5:06pm

The Int error is occurring because something isn’t adding up in the input data. It’s expecting a balanced design with equal numbers of all factor levels. I improved the check.

rikh · May 8, 2021, 5:19pm

Thanks a lot for looking into it! Now, I have a nice “Design is unbalanced.” error instead of the Int error, so I guess that is better indeed.

Anyway, I’m done with the ANOVA problems for today. I hope that I’ll send in a nice Documenter.jl PR for SimpleANOVA tonight so that I have been productive today (it’s 19:18 in the Netherlands now). Otherwise, I’ll send it tomorrow.

I think it’s gonna be nice, but of course whether you want to merge it is gonna be up to you

BioTurboNick · May 8, 2021, 5:20pm

That’d be great, thanks!

jbrea · May 8, 2021, 7:36pm

I implemented R’s aov function in julia some time ago. It can handle repeated measures, also when the data is unbalanced. So far the interface is ugly and I didn’t publish it, but I can do so if there is some interest.

Here is my source code, if you want to give it a try.

aov code

using StatsModels, LinearAlgebra, Distributions

toterm(s1::Term, s2::Term) = InteractionTerm((s1, s2))
toterm(s1::InteractionTerm, s2::Term) = InteractionTerm((s1.terms..., s2))
toterm(s1::Term, s2::InteractionTerm) = InteractionTerm((s1, s2.terms...))
toterm(x::Tuple, s2::Term) = map(t -> toterm(t, s2), x)
toterm(s1::Term, x::Tuple) = map(t -> toterm(s1, t), x)
function predictor_product(predictors)
    init, rest = Iterators.peel(predictors)
    reduce((x, y) -> x + y + toterm(x, y), term.(rest), init = term(init))
end
function errorterm(ef, nested)
    terms = term(1) + term(ef)
    if length(nested) > 0
        nestedterms = predictor_product(nested)
        terms += toterm(term(ef), nestedterms)
    end
    terms
end

name(t::AbstractTerm) = "$(t.sym)"
name(t::InteractionTerm) = join(name.(t.terms), " & ")
name(t::InterceptTerm) = "Intercept"
pivoted_asgn(assign, qrp) = assign[qrp.p[1:rank(qrp.R)]]
function mqr(f::FormulaTerm, data; contrasts = Dict{Symbol, Any}())
    mf = ModelFrame(f, data; contrasts)
    mm = ModelMatrix(mf)
    qrp = qr(mm.m, Val(true))
    (qrp = qrp, mf = mf, mm = mm)
end
function aov(f::FormulaTerm, data; return_locals = false)
    qrp, mf, mm = mqr(f, data)
    aov(qrp, pivoted_asgn(mm.assign, qrp), mf.f.rhs.terms,
        getproperty(data, f.lhs.sym); return_locals)
end
aov(l, y) = aov(l..., y)
function aov(qrp, asgn, terms, y; return_locals = false)
    effects = qrp.Q' * y
    r = rank(qrp.R)
    SS = Float64[]
    names = String[]
    DF = Int[]
    for i in union(asgn)
        n = name(terms[i])
        n == "Intercept" && continue
        idxs = findall(asgn .== i)
        push!(SS, sum(abs2, @view(effects[idxs])))
        push!(names, n)
        push!(DF, length(idxs))
    end
    push!(SS, sum(abs2, @view(effects[r+1:end])))
    push!(names, "Residuals")
    push!(DF, length(effects) - r)
    MSS = SS ./ DF
    F = MSS ./ MSS[end]
    F[end] = NaN
    DFres = DF[end]
    result = DataFrame(names = names, DF = DF, SS = SS, MSS = MSS, F = F,
                       p = @.(ccdf(FDist(DF, DFres), F)))
    if return_locals
        result, (qrp = qrp, asgn = asgn, terms = terms)
    else
        result
    end
end
function aoverror(f::FormulaTerm, e, nested::T, data; return_locals = false) where T
    nested = T <: Symbol ? [nested] : nested
    ef = errorterm(e, nested)
    efit = mqr(f.lhs ~ ef, data,
        contrasts = Dict(e => HelmertCoding(),
                         [t => HelmertCoding() for t in nested]...))
    efit.mm.assign = efit.mm.assign[efit.qrp.p[1:rank(efit.qrp.R)]]
    mf = ModelFrame(f, data)
    mm = ModelMatrix(mf)
    ys = efit.qrp.Q' * getproperty(data, f.lhs.sym)
    append!(efit.mm.assign,
            fill(maximum(efit.mm.assign) + 1, length(ys) - length(efit.mm.assign)))
    Xs = efit.qrp.Q' * mm.m
    rows = [findall(efit.mm.assign .== i)
            for i in Iterators.drop(sort!(union(efit.mm.assign)), 1)]
    result = [inner(Xs, ys, row, mm.assign, mf.f.rhs.terms; return_locals)
              for row in rows]
    if return_locals
        first.(result), (Q = efit.qrp.Q,
                         locals_t = [(rest..., row)
                                     for (row, rest) in zip(rows, last.(result))])
    else
        result
    end
end
function aoverror(l, y)
    ys = l.Q' * y
    [aov(qrp, asgn, terms, ys[rows]) for (qrp, asgn, terms, rows) in l.locals_t]
end
function inner(X, y, rows, asgn, terms; return_locals = false)
    T = X[rows, :]
    cols = findall(reshape(sum(abs2, T, dims = 1), :) .> 1e-5)
    qrp = qr(T[:, cols], Val(true))
    aov(qrp, pivoted_asgn(asgn[cols], qrp), terms, y[rows]; return_locals)
end

Applied to your example above the result is

julia> aoverror(@formula(value ~ a * b), :id, [], df)
2-element Vector{DataFrame}:
 1×6 DataFrame
 Row │ names      DF     SS        MSS       F        p       
     │ String     Int64  Float64   Float64   Float64  Float64 
─────┼────────────────────────────────────────────────────────
   1 │ Residuals      1  0.653333  0.653333      NaN      NaN
 4×6 DataFrame
 Row │ names      DF     SS           MSS          F              p          
     │ String     Int64  Float64      Float64      Float64        Float64    
─────┼───────────────────────────────────────────────────────────────────────
   1 │ b              1  6.93335e-33  6.93335e-33    7.93895e-32    1.0
   2 │ a              2  0.121667     0.0608333      0.696565       0.54093
   3 │ a & b          2  0.105        0.0525         0.601145       0.583505
   4 │ Residuals      5  0.436667     0.0873333    NaN            NaN

Topic		Replies	Views
Repeated measures 2 way ANOVA Statistics	4	2256	October 11, 2019
ANOVA Tests in Julia? Statistics	76	14020	August 11, 2022
SimpleANOVA.jl 0.7.0 - Added repeated measures Package Announcements	0	371	January 12, 2021
Equivalent of Rs `aov(Y ~ B + Error(A/B), data=d)` General Usage question , statistics	1	611	March 7, 2021
Using GLM.jl for mixed design Anova Statistics question , glm	4	814	January 22, 2019

Repeated measures ANOVA

Related topics