Repeated measures ANOVA

I’m trying to do a repeated measures ANOVA. Lets say the data is as follows

julia> using CategoricalArrays, DataFrames, Random

julia> Random.seed!(1)

julia> df = DataFrame(
          id = categorical(repeat(1:2; inner=6)),
          a = categorical(repeat(1:3, inner=2, outer=2)),
          b = categorical(repeat(1:2, 6)),
          value = round.(rand(Float64, 12), digits=1)
       ) 
12×4 DataFrame
 Row │ id    a     b     value
     │ Cat…  Cat…  Cat…  Float64
─────┼───────────────────────────
   1 │ 1     1     1         0.2
   2 │ 1     1     2         0.3
   3 │ 1     2     1         0.3
   4 │ 1     2     2         0.0
   5 │ 1     3     1         0.5
   6 │ 1     3     2         0.2
   7 │ 2     1     1         1.0
   8 │ 2     1     2         1.0
   9 │ 2     2     1         0.3
  10 │ 2     2     2         1.0
  11 │ 2     3     1         0.6
  12 │ 2     3     2         0.4

and lets say I want to have the p-values for different effects such as a, b and a*b. I’ve been looking at the packages mentioned in ANOVA Tests in Julia? - #61 by BioTurboNick, but wasn’t able to reproduce work that I’m trying to reproduce.

For the sake of generality, I would prefer solving this problem by just using GLM.jl, which should be possible according to Repeated Measures and Mixed Models and Common statistical tests are linear models (or: how to teach stats). Also, there is a mention of ANOVA in the GLM documentation for ftest.

The main problem that I have is how to deal with the repeated measures parts in combination with the different effects.

EDIT: I think that the solution is to use the MixedModels package (Repeated measures 2 way ANOVA - #2 by mkborregaard).

EDIT2: No, MixedModels is not it because I need F-values and GLM.ftest doesn’t work on MixedModels.

Hi, can you say more about the trouble you were having with my package?

Hi Nicholas, yeah, it’s actually very easy to reproduce on the example I posted above. Continuing on the previous code block:

julia> using SimpleANOVA

julia> anova(df, :value, [:a, :b])
ERROR: MethodError: no method matching *(::CategoricalValue{Int64, UInt32}, ::Int64)
Closest candidates are:
[...]

Currently, I’m looking into your package again, because I realized that it’s probably more suitable to reproduce SPSS results that I got.

@BioTurboNick, how would you use your package to get results for the problem I posed in this thread?

Ah, it’s possible that I just need to generalize something to work with CategoricalArrays. I’ve also found a bug in the df/vector entry point though. I’ll fix the latter shortly.

But it should be as simple as this:

anova(df, :value, [:b, :id, :a], [fixed, subject])

for the case where id is specifying your subjects, a is an among-subjects fixed factor, and b is a within-subjects fixed factor. (Assumes the rightmost factors are fixed if unspecified)

Realized my docstring doesn’t show how to work with DataFrames, so I’ll fix that too.

2 Likes

Thank you for your example. It is helpful, because I didn’t realize how to pass the [fixed, subject] yet.

Is that the following error?

julia> df = DataFrame(
          id = repeat(1:2; inner=6),
          a = repeat(1:3, inner=2, outer=2),
          b = repeat(1:2, 6),
          value = round.(rand(Float64, 12), digits=1)
       )

julia> anova(df, :value, [:b, :id, :a], [fixed, subject])
ERROR: factornames must have an entry for each factor.
[...]
1 Like

And, unfortunately, when I do the same as above on the real data that I have (and cannot share), I get

InexactError: Int64(1.5)
Int64(::Float64)@float.jl:72
var"#anova#65"(::Vector{String}, ::typeof(SimpleANOVA.anova), ::Vector{Float64}, ::Vector{Vector{Int64}}, ::Vector{SimpleANOVA.FactorType})@anova.jl:86
var"#anova#156"(::Vector{String}, ::typeof(SimpleANOVA.anova), ::DataFrames.DataFrame, ::Symbol, ::Vector{Symbol}, ::Vector{SimpleANOVA.FactorType})@anova_dataframes.jl:8
anova@anova_dataframes.jl:4[inlined]

So, that is this line. Maybe, the Int conversion is an assumption that one cannot make here, but I’m not sure and a bit at a loss here.

Already fixed, I think, just making sure I didn’t break anything before submitting to the registry :slight_smile:

1 Like

Yes, nice. The factornames must have an entry for each factor is now fixed on master.

EDIT: The Int conversion error still occurs.

The Int error is occurring because something isn’t adding up in the input data. It’s expecting a balanced design with equal numbers of all factor levels. I improved the check.

1 Like

Thanks a lot for looking into it! Now, I have a nice “Design is unbalanced.” error instead of the Int error, so I guess that is better indeed.

Anyway, I’m done with the ANOVA problems for today. I hope that I’ll send in a nice Documenter.jl PR for SimpleANOVA tonight so that I have been productive today (it’s 19:18 in the Netherlands now). Otherwise, I’ll send it tomorrow.

I think it’s gonna be nice, but of course whether you want to merge it is gonna be up to you

1 Like

That’d be great, thanks!

I implemented R’s aov function in julia some time ago. It can handle repeated measures, also when the data is unbalanced. So far the interface is ugly and I didn’t publish it, but I can do so if there is some interest.

Here is my source code, if you want to give it a try.

aov code
using StatsModels, LinearAlgebra, Distributions

toterm(s1::Term, s2::Term) = InteractionTerm((s1, s2))
toterm(s1::InteractionTerm, s2::Term) = InteractionTerm((s1.terms..., s2))
toterm(s1::Term, s2::InteractionTerm) = InteractionTerm((s1, s2.terms...))
toterm(x::Tuple, s2::Term) = map(t -> toterm(t, s2), x)
toterm(s1::Term, x::Tuple) = map(t -> toterm(s1, t), x)
function predictor_product(predictors)
    init, rest = Iterators.peel(predictors)
    reduce((x, y) -> x + y + toterm(x, y), term.(rest), init = term(init))
end
function errorterm(ef, nested)
    terms = term(1) + term(ef)
    if length(nested) > 0
        nestedterms = predictor_product(nested)
        terms += toterm(term(ef), nestedterms)
    end
    terms
end

name(t::AbstractTerm) = "$(t.sym)"
name(t::InteractionTerm) = join(name.(t.terms), " & ")
name(t::InterceptTerm) = "Intercept"
pivoted_asgn(assign, qrp) = assign[qrp.p[1:rank(qrp.R)]]
function mqr(f::FormulaTerm, data; contrasts = Dict{Symbol, Any}())
    mf = ModelFrame(f, data; contrasts)
    mm = ModelMatrix(mf)
    qrp = qr(mm.m, Val(true))
    (qrp = qrp, mf = mf, mm = mm)
end
function aov(f::FormulaTerm, data; return_locals = false)
    qrp, mf, mm = mqr(f, data)
    aov(qrp, pivoted_asgn(mm.assign, qrp), mf.f.rhs.terms,
        getproperty(data, f.lhs.sym); return_locals)
end
aov(l, y) = aov(l..., y)
function aov(qrp, asgn, terms, y; return_locals = false)
    effects = qrp.Q' * y
    r = rank(qrp.R)
    SS = Float64[]
    names = String[]
    DF = Int[]
    for i in union(asgn)
        n = name(terms[i])
        n == "Intercept" && continue
        idxs = findall(asgn .== i)
        push!(SS, sum(abs2, @view(effects[idxs])))
        push!(names, n)
        push!(DF, length(idxs))
    end
    push!(SS, sum(abs2, @view(effects[r+1:end])))
    push!(names, "Residuals")
    push!(DF, length(effects) - r)
    MSS = SS ./ DF
    F = MSS ./ MSS[end]
    F[end] = NaN
    DFres = DF[end]
    result = DataFrame(names = names, DF = DF, SS = SS, MSS = MSS, F = F,
                       p = @.(ccdf(FDist(DF, DFres), F)))
    if return_locals
        result, (qrp = qrp, asgn = asgn, terms = terms)
    else
        result
    end
end
function aoverror(f::FormulaTerm, e, nested::T, data; return_locals = false) where T
    nested = T <: Symbol ? [nested] : nested
    ef = errorterm(e, nested)
    efit = mqr(f.lhs ~ ef, data,
        contrasts = Dict(e => HelmertCoding(),
                         [t => HelmertCoding() for t in nested]...))
    efit.mm.assign = efit.mm.assign[efit.qrp.p[1:rank(efit.qrp.R)]]
    mf = ModelFrame(f, data)
    mm = ModelMatrix(mf)
    ys = efit.qrp.Q' * getproperty(data, f.lhs.sym)
    append!(efit.mm.assign,
            fill(maximum(efit.mm.assign) + 1, length(ys) - length(efit.mm.assign)))
    Xs = efit.qrp.Q' * mm.m
    rows = [findall(efit.mm.assign .== i)
            for i in Iterators.drop(sort!(union(efit.mm.assign)), 1)]
    result = [inner(Xs, ys, row, mm.assign, mf.f.rhs.terms; return_locals)
              for row in rows]
    if return_locals
        first.(result), (Q = efit.qrp.Q,
                         locals_t = [(rest..., row)
                                     for (row, rest) in zip(rows, last.(result))])
    else
        result
    end
end
function aoverror(l, y)
    ys = l.Q' * y
    [aov(qrp, asgn, terms, ys[rows]) for (qrp, asgn, terms, rows) in l.locals_t]
end
function inner(X, y, rows, asgn, terms; return_locals = false)
    T = X[rows, :]
    cols = findall(reshape(sum(abs2, T, dims = 1), :) .> 1e-5)
    qrp = qr(T[:, cols], Val(true))
    aov(qrp, pivoted_asgn(asgn[cols], qrp), terms, y[rows]; return_locals)
end

Applied to your example above the result is

julia> aoverror(@formula(value ~ a * b), :id, [], df)
2-element Vector{DataFrame}:
 1×6 DataFrame
 Row │ names      DF     SS        MSS       F        p       
     │ String     Int64  Float64   Float64   Float64  Float64 
─────┼────────────────────────────────────────────────────────
   1 │ Residuals      1  0.653333  0.653333      NaN      NaN
 4×6 DataFrame
 Row │ names      DF     SS           MSS          F              p          
     │ String     Int64  Float64      Float64      Float64        Float64    
─────┼───────────────────────────────────────────────────────────────────────
   1 │ b              1  6.93335e-33  6.93335e-33    7.93895e-32    1.0
   2 │ a              2  0.121667     0.0608333      0.696565       0.54093
   3 │ a & b          2  0.105        0.0525         0.601145       0.583505
   4 │ Residuals      5  0.436667     0.0873333    NaN            NaN
4 Likes