 # Repeated measures ANOVA

I’m trying to do a repeated measures ANOVA. Lets say the data is as follows

``````julia> using CategoricalArrays, DataFrames, Random

julia> Random.seed!(1)

julia> df = DataFrame(
id = categorical(repeat(1:2; inner=6)),
a = categorical(repeat(1:3, inner=2, outer=2)),
b = categorical(repeat(1:2, 6)),
value = round.(rand(Float64, 12), digits=1)
)
12×4 DataFrame
Row │ id    a     b     value
│ Cat…  Cat…  Cat…  Float64
─────┼───────────────────────────
1 │ 1     1     1         0.2
2 │ 1     1     2         0.3
3 │ 1     2     1         0.3
4 │ 1     2     2         0.0
5 │ 1     3     1         0.5
6 │ 1     3     2         0.2
7 │ 2     1     1         1.0
8 │ 2     1     2         1.0
9 │ 2     2     1         0.3
10 │ 2     2     2         1.0
11 │ 2     3     1         0.6
12 │ 2     3     2         0.4
``````

and lets say I want to have the p-values for different effects such as `a`, `b` and `a*b`. I’ve been looking at the packages mentioned in ANOVA Tests in Julia? - #61 by BioTurboNick, but wasn’t able to reproduce work that I’m trying to reproduce.

For the sake of generality, I would prefer solving this problem by just using GLM.jl, which should be possible according to Repeated Measures and Mixed Models and Common statistical tests are linear models (or: how to teach stats). Also, there is a mention of ANOVA in the GLM documentation for ftest.

The main problem that I have is how to deal with the repeated measures parts in combination with the different effects.

EDIT: I think that the solution is to use the MixedModels package (Repeated measures 2 way ANOVA - #2 by mkborregaard).

EDIT2: No, MixedModels is not it because I need F-values and GLM.ftest doesn’t work on MixedModels.

Hi, can you say more about the trouble you were having with my package?

Hi Nicholas, yeah, it’s actually very easy to reproduce on the example I posted above. Continuing on the previous code block:

``````julia> using SimpleANOVA

julia> anova(df, :value, [:a, :b])
ERROR: MethodError: no method matching *(::CategoricalValue{Int64, UInt32}, ::Int64)
Closest candidates are:
[...]
``````

Currently, I’m looking into your package again, because I realized that it’s probably more suitable to reproduce SPSS results that I got.

@BioTurboNick, how would you use your package to get results for the problem I posed in this thread?

Ah, it’s possible that I just need to generalize something to work with CategoricalArrays. I’ve also found a bug in the df/vector entry point though. I’ll fix the latter shortly.

But it should be as simple as this:

``````anova(df, :value, [:b, :id, :a], [fixed, subject])
``````

for the case where `id` is specifying your subjects, `a` is an among-subjects fixed factor, and `b` is a within-subjects fixed factor. (Assumes the rightmost factors are fixed if unspecified)

Realized my docstring doesn’t show how to work with DataFrames, so I’ll fix that too.

2 Likes

Thank you for your example. It is helpful, because I didn’t realize how to pass the `[fixed, subject]` yet.

Is that the following error?

``````julia> df = DataFrame(
id = repeat(1:2; inner=6),
a = repeat(1:3, inner=2, outer=2),
b = repeat(1:2, 6),
value = round.(rand(Float64, 12), digits=1)
)

julia> anova(df, :value, [:b, :id, :a], [fixed, subject])
ERROR: factornames must have an entry for each factor.
[...]
``````
1 Like

And, unfortunately, when I do the same as above on the real data that I have (and cannot share), I get

``````InexactError: Int64(1.5)
Int64(::Float64)@float.jl:72
var"#anova#65"(::Vector{String}, ::typeof(SimpleANOVA.anova), ::Vector{Float64}, ::Vector{Vector{Int64}}, ::Vector{SimpleANOVA.FactorType})@anova.jl:86
var"#anova#156"(::Vector{String}, ::typeof(SimpleANOVA.anova), ::DataFrames.DataFrame, ::Symbol, ::Vector{Symbol}, ::Vector{SimpleANOVA.FactorType})@anova_dataframes.jl:8
anova@anova_dataframes.jl:4[inlined]
``````

So, that is this line. Maybe, the `Int` conversion is an assumption that one cannot make here, but I’m not sure and a bit at a loss here.

Already fixed, I think, just making sure I didn’t break anything before submitting to the registry 1 Like

Yes, nice. The `factornames must have an entry for each factor` is now fixed on `master`.

EDIT: The `Int` conversion error still occurs.

The Int error is occurring because something isn’t adding up in the input data. It’s expecting a balanced design with equal numbers of all factor levels. I improved the check.

1 Like

Thanks a lot for looking into it! Now, I have a nice “Design is unbalanced.” error instead of the Int error, so I guess that is better indeed.

Anyway, I’m done with the ANOVA problems for today. I hope that I’ll send in a nice Documenter.jl PR for SimpleANOVA tonight so that I have been productive today (it’s 19:18 in the Netherlands now). Otherwise, I’ll send it tomorrow.

I think it’s gonna be nice, but of course whether you want to merge it is gonna be up to you

1 Like

That’d be great, thanks!

I implemented R’s `aov` function in julia some time ago. It can handle repeated measures, also when the data is unbalanced. So far the interface is ugly and I didn’t publish it, but I can do so if there is some interest.

Here is my source code, if you want to give it a try.

aov code
``````using StatsModels, LinearAlgebra, Distributions

toterm(s1::Term, s2::Term) = InteractionTerm((s1, s2))
toterm(s1::InteractionTerm, s2::Term) = InteractionTerm((s1.terms..., s2))
toterm(s1::Term, s2::InteractionTerm) = InteractionTerm((s1, s2.terms...))
toterm(x::Tuple, s2::Term) = map(t -> toterm(t, s2), x)
toterm(s1::Term, x::Tuple) = map(t -> toterm(s1, t), x)
function predictor_product(predictors)
init, rest = Iterators.peel(predictors)
reduce((x, y) -> x + y + toterm(x, y), term.(rest), init = term(init))
end
function errorterm(ef, nested)
terms = term(1) + term(ef)
if length(nested) > 0
nestedterms = predictor_product(nested)
terms += toterm(term(ef), nestedterms)
end
terms
end

name(t::AbstractTerm) = "\$(t.sym)"
name(t::InteractionTerm) = join(name.(t.terms), " & ")
name(t::InterceptTerm) = "Intercept"
pivoted_asgn(assign, qrp) = assign[qrp.p[1:rank(qrp.R)]]
function mqr(f::FormulaTerm, data; contrasts = Dict{Symbol, Any}())
mf = ModelFrame(f, data; contrasts)
mm = ModelMatrix(mf)
qrp = qr(mm.m, Val(true))
(qrp = qrp, mf = mf, mm = mm)
end
function aov(f::FormulaTerm, data; return_locals = false)
qrp, mf, mm = mqr(f, data)
aov(qrp, pivoted_asgn(mm.assign, qrp), mf.f.rhs.terms,
getproperty(data, f.lhs.sym); return_locals)
end
aov(l, y) = aov(l..., y)
function aov(qrp, asgn, terms, y; return_locals = false)
effects = qrp.Q' * y
r = rank(qrp.R)
SS = Float64[]
names = String[]
DF = Int[]
for i in union(asgn)
n = name(terms[i])
n == "Intercept" && continue
idxs = findall(asgn .== i)
push!(SS, sum(abs2, @view(effects[idxs])))
push!(names, n)
push!(DF, length(idxs))
end
push!(SS, sum(abs2, @view(effects[r+1:end])))
push!(names, "Residuals")
push!(DF, length(effects) - r)
MSS = SS ./ DF
F = MSS ./ MSS[end]
F[end] = NaN
DFres = DF[end]
result = DataFrame(names = names, DF = DF, SS = SS, MSS = MSS, F = F,
p = @.(ccdf(FDist(DF, DFres), F)))
if return_locals
result, (qrp = qrp, asgn = asgn, terms = terms)
else
result
end
end
function aoverror(f::FormulaTerm, e, nested::T, data; return_locals = false) where T
nested = T <: Symbol ? [nested] : nested
ef = errorterm(e, nested)
efit = mqr(f.lhs ~ ef, data,
contrasts = Dict(e => HelmertCoding(),
[t => HelmertCoding() for t in nested]...))
efit.mm.assign = efit.mm.assign[efit.qrp.p[1:rank(efit.qrp.R)]]
mf = ModelFrame(f, data)
mm = ModelMatrix(mf)
ys = efit.qrp.Q' * getproperty(data, f.lhs.sym)
append!(efit.mm.assign,
fill(maximum(efit.mm.assign) + 1, length(ys) - length(efit.mm.assign)))
Xs = efit.qrp.Q' * mm.m
rows = [findall(efit.mm.assign .== i)
for i in Iterators.drop(sort!(union(efit.mm.assign)), 1)]
result = [inner(Xs, ys, row, mm.assign, mf.f.rhs.terms; return_locals)
for row in rows]
if return_locals
first.(result), (Q = efit.qrp.Q,
locals_t = [(rest..., row)
for (row, rest) in zip(rows, last.(result))])
else
result
end
end
function aoverror(l, y)
ys = l.Q' * y
[aov(qrp, asgn, terms, ys[rows]) for (qrp, asgn, terms, rows) in l.locals_t]
end
function inner(X, y, rows, asgn, terms; return_locals = false)
T = X[rows, :]
cols = findall(reshape(sum(abs2, T, dims = 1), :) .> 1e-5)
qrp = qr(T[:, cols], Val(true))
aov(qrp, pivoted_asgn(asgn[cols], qrp), terms, y[rows]; return_locals)
end
``````

Applied to your example above the result is

``````julia> aoverror(@formula(value ~ a * b), :id, [], df)
2-element Vector{DataFrame}:
1×6 DataFrame
Row │ names      DF     SS        MSS       F        p
│ String     Int64  Float64   Float64   Float64  Float64
─────┼────────────────────────────────────────────────────────
1 │ Residuals      1  0.653333  0.653333      NaN      NaN
4×6 DataFrame
Row │ names      DF     SS           MSS          F              p
│ String     Int64  Float64      Float64      Float64        Float64
─────┼───────────────────────────────────────────────────────────────────────
1 │ b              1  6.93335e-33  6.93335e-33    7.93895e-32    1.0
2 │ a              2  0.121667     0.0608333      0.696565       0.54093
3 │ a & b          2  0.105        0.0525         0.601145       0.583505
4 │ Residuals      5  0.436667     0.0873333    NaN            NaN
``````
4 Likes