Hi group,

I have a dataframe of >10000 rows of repeated measurements on 25 subjects (OB), with 3 categorical factors (QF, BG, PA), 1 continuous factor (CCT) and the measured response (VAL). The first few lines look like this:

│ Row │ OB  │ QF  │ BG  │ PA  │ CCT    │ VAL │
├─────┼─────┼─────┼─────┼─────┼────────┼─────┤
│ 1   │ "1" │ "B" │ "B" │ "A" │ 3000.0 │ 6.0 │
│ 2   │ "1" │ "B" │ "B" │ "A" │ 3500.0 │ 8.0 │
│ 3   │ "1" │ "B" │ "B" │ "A" │ 4000.0 │ 8.0 │
│ 4   │ "1" │ "B" │ "B" │ "A" │ 5000.0 │ 2.0 │
│ 5   │ "1" │ "B" │ "B" │ "A" │ 6000.0 │ 1.0 │
│ 6   │ "1" │ "B" │ "B" │ "R" │ 3000.0 │ 1.0 │

dump(df) gives:

DataFrames.DataFrame 11250 observations of 6 variables
  OB: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["1", "1", "1", "1"]
  QF: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["B", "B", "B", "B"]
  BG: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["B", "B", "B", "B"]
  PA: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["A", "A", "A", "A"]
  CCT: DataArrays.DataArray{Float64,1}(11250) [3000.0, 3500.0, 4000.0, 5000.0]
  VAL: DataArrays.DataArray{Float64,1}(11250) [6.0, 8.0, 8.0, 2.0]

I want to find for each of the factors (3 categorical and 1 continuous) whether they have a significant impact on VAL.

If I run:

m = fit!(lmm(@formula(VAL ~ QF * PA * BG * CCT + (QF + PA + BG + CCT | OB)), df))

I get, for each of the fixed effects (and interactions), the p-values for the contrasts between each individual factor level and the first level of that factor:

…

Fixed-effects parameters:
                    Estimate    Std.Error    z value  P(>|z|)
(Intercept)          6.90138     0.712937    9.68021   <1e-21
QF: BR              -5.35172     0.895662   -5.97516    <1e-8
QF: CA              0.524138     0.843195   0.621609   0.5342
QF: OA              0.355862     0.847164   0.420063   0.6744
QF: V               0.164483     0.845997   0.194425   0.8458
QF: W                   6.28     0.873654     7.1882   <1e-12
PA: B                  -0.32     0.842267  -0.379927   0.7040
PA: G               -2.64345     0.848704   -3.11469   0.0018
PA: R               -2.64724     0.859032   -3.08166   0.0021
PA: Y              -0.757241     0.847848  -0.893134   0.3718
BG: G               -1.30621     0.845539   -1.54482   0.1224
BG: W               -1.27931     0.850409   -1.50435   0.1325
CCT             -0.000172414  0.000160561   -1.07382   0.2829
QF: BR & PA: B      0.222069      1.19066    0.18651   0.8520

…

Although I could conclude (after e.g. a Bonferroni correction) that a factor is significant when at least one of its contrasts is significant, it could always happen that, by chance, none of the contrasts with the first level is significant even though the factor does have an effect. This output also does not give me a p-value for the significance of the factor as a whole.
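To make the Bonferroni idea concrete, here is a minimal sketch in plain Julia of the check I have in mind for QF, using the p-values printed above. Two assumptions of mine: the entries shown as "<1e-8" and "<1e-12" are replaced by their upper bounds, and the correction is applied only over the 5 QF main-effect contrasts (correcting over all coefficients, including interactions, would use a larger multiplier). The variable names are just for illustration.

```julia
# p-values of the QF main-effect contrasts, copied from the output above.
# Assumption: "<1e-8" and "<1e-12" are replaced by their upper bounds.
qf_pvalues = Dict(
    "QF: BR" => 1e-8,
    "QF: CA" => 0.5342,
    "QF: OA" => 0.6744,
    "QF: V"  => 0.8458,
    "QF: W"  => 1e-12,
)

alpha = 0.05
ncontrasts = length(qf_pvalues)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
adjusted = Dict(name => min(1.0, p * ncontrasts) for (name, p) in qf_pvalues)

# Contrasts that stay below alpha after the correction.
significant = sort([name for (name, p) in adjusted if p < alpha])
println("Significant QF contrasts after Bonferroni: ", significant)
```

With these numbers only the BR and W contrasts stay below 0.05, but this still only compares each level with the first one and gives no single p-value for QF as a whole.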

Is there a good way to do this repeated-measures analysis? Can I actually use linear mixed models for it? What should I change in the model fit above?

Thanks!