Hi group,
I have a DataFrame of 11,250 rows of repeated measurements on 25 subjects (OB), with three categorical factors (QF, BG, PA) and one continuous factor (CCT); the response is VAL. The first few rows look like this:
│ Row │ OB  │ QF  │ BG  │ PA  │ CCT    │ VAL │
├─────┼─────┼─────┼─────┼─────┼────────┼─────┤
│ 1   │ "1" │ "B" │ "B" │ "A" │ 3000.0 │ 6.0 │
│ 2   │ "1" │ "B" │ "B" │ "A" │ 3500.0 │ 8.0 │
│ 3   │ "1" │ "B" │ "B" │ "A" │ 4000.0 │ 8.0 │
│ 4   │ "1" │ "B" │ "B" │ "A" │ 5000.0 │ 2.0 │
│ 5   │ "1" │ "B" │ "B" │ "A" │ 6000.0 │ 1.0 │
│ 6   │ "1" │ "B" │ "B" │ "R" │ 3000.0 │ 1.0 │
dump(df) gives:
DataFrames.DataFrame  11250 observations of 6 variables
  OB: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["1", "1", "1", "1"]
  QF: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["B", "B", "B", "B"]
  BG: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["B", "B", "B", "B"]
  PA: DataArrays.PooledDataArray{String,UInt8,1}(11250) String["A", "A", "A", "A"]
  CCT: DataArrays.DataArray{Float64,1}(11250) [3000.0, 3500.0, 4000.0, 5000.0]
  VAL: DataArrays.DataArray{Float64,1}(11250) [6.0, 8.0, 8.0, 2.0]
For each of the four factors (three categorical, one continuous), I want to test whether it has a significant effect on VAL.
If I run:
m = fit!(lmm(@formula(VAL ~ QF * PA * BG * CCT + (QF + PA + BG + CCT | OB)), df))
For each fixed effect (including the interactions), I only get p-values for the contrasts between each individual factor level and the first (reference) level:
…
Fixed-effects parameters:
                     Estimate   Std.Error   z value  P(>|z|)
(Intercept)           6.90138    0.712937   9.68021   <1e-21
QF: BR               -5.35172    0.895662  -5.97516    <1e-8
QF: CA               0.524138    0.843195  0.621609   0.5342
QF: OA               0.355862    0.847164  0.420063   0.6744
QF: V                0.164483    0.845997  0.194425   0.8458
QF: W                    6.28    0.873654    7.1882   <1e-12
PA: B                   -0.32    0.842267 -0.379927   0.7040
PA: G                -2.64345    0.848704  -3.11469   0.0018
PA: R                -2.64724    0.859032  -3.08166   0.0021
PA: Y               -0.757241    0.847848 -0.893134   0.3718
BG: G                -1.30621    0.845539  -1.54482   0.1224
BG: W                -1.27931    0.850409  -1.50435   0.1325
CCT              -0.000172414 0.000160561  -1.07382   0.2829
QF: BR & PA: B       0.222069     1.19066   0.18651   0.8520
…
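To make the problem concrete: under the default dummy (treatment) coding, each QF row above is one coefficient in the fixed-effects part of the model. Shown for QF alone (reference level B, since the table lists BR, CA, OA, V and W; the other factors, the interactions and the random effects are omitted for brevity):

$$
\mathbb{E}[\mathrm{VAL}] = \beta_0 + \beta_{\mathrm{BR}}\,x_{\mathrm{BR}} + \beta_{\mathrm{CA}}\,x_{\mathrm{CA}} + \beta_{\mathrm{OA}}\,x_{\mathrm{OA}} + \beta_{\mathrm{V}}\,x_{\mathrm{V}} + \beta_{\mathrm{W}}\,x_{\mathrm{W}} + \cdots,
\qquad
x_{\ell} = \begin{cases} 1 & \text{if } \mathrm{QF} = \ell \\ 0 & \text{otherwise} \end{cases}
$$

Each reported p-value tests a single hypothesis $H_0\colon \beta_\ell = 0$, i.e. level $\ell$ against level B. Whether QF matters at all is the joint hypothesis

$$
H_0\colon\ \beta_{\mathrm{BR}} = \beta_{\mathrm{CA}} = \beta_{\mathrm{OA}} = \beta_{\mathrm{V}} = \beta_{\mathrm{W}} = 0,
$$

a 5-degree-of-freedom test (more if the QF interaction terms are included) that none of the individual z tests performs. Because the model also contains interactions, each of these main-effect coefficients is the contrast at the reference levels of the other factors and at CCT = 0.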
I could conclude that a factor is significant when at least one of its contrasts is significant (after, e.g., a Bonferroni correction), but a factor could still have an effect even though, by chance, none of its contrasts with the first level is significant (for instance when the first level lies in the middle of the others). This approach also gives no single p-value for the factor as a whole.
Is there a good way to do this repeated-measures analysis? Can I actually use linear mixed models for it, and if so, what should I change in the model fit above?
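One common way to get a single p-value per factor is a likelihood-ratio test: fit the model with and without every term involving the factor, both by maximum likelihood (not REML), and compare the drop in deviance with a χ² distribution whose degrees of freedom equal the number of coefficients dropped. The sketch below is only an illustration under several assumptions: it uses the newer MixedModels.jl API (`fit(MixedModel, …)`, not the older `lmm`/`fit!` call above) plus Distributions.jl, and it runs on a small simulated data set `toy` (hypothetical, with one factor PA, the covariate CCT and a random intercept per subject OB) instead of the real `df`.

```julia
using DataFrames, MixedModels, Random
using Distributions: Chisq, ccdf

# Hypothetical toy data standing in for the real `df`: 25 subjects (OB), one
# 3-level factor (PA) and the continuous covariate CCT, with a PA effect built in.
rng = MersenneTwister(1)
subjects = string.(1:25)
toy = DataFrame([(OB = s, PA = p, CCT = c)
                 for s in subjects for p in ["A", "B", "R"] for c in [3000.0, 4000.0, 5000.0]])
subject_shift = Dict(s => randn(rng) for s in subjects)   # random intercept per subject
pa_shift = Dict("A" => 0.0, "B" => -1.0, "R" => -2.5)     # simulated PA effect
toy.VAL = [subject_shift[r.OB] + pa_shift[r.PA] - 0.0002 * r.CCT + randn(rng)
           for r in eachrow(toy)]

# Fit both models by maximum likelihood (REML = false): deviances of models
# with different fixed effects are only comparable under ML.
full    = fit(MixedModel, @formula(VAL ~ 1 + PA * CCT + (1 | OB)), toy; REML = false)
reduced = fit(MixedModel, @formula(VAL ~ 1 + CCT + (1 | OB)), toy; REML = false)

# Likelihood-ratio statistic: drop in deviance (-2 log-likelihood) when the PA
# terms are added, referred to a chi-square with k degrees of freedom.
stat = deviance(reduced) - deviance(full)
k = dof(full) - dof(reduced)   # PA main effect (2) + PA & CCT interaction (2)
p = ccdf(Chisq(k), stat)
println("LRT for PA: chi2($k) = ", round(stat; digits = 2), ", p = ", p)

# MixedModels.jl also provides the same comparison as a formatted table
println(MixedModels.likelihoodratiotest(reduced, full))
```

For the real data, the analogous comparison would drop every term containing a given factor (its main effect and all its interactions) while keeping the random-effects term identical in both fits, and be repeated for QF, BG, PA and CCT, giving one p-value per factor for the question "does this factor matter at all". The χ² reference distribution is an asymptotic approximation.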
Thanks!