I am trying to fit many models (e.g. a linear model [GLM.jl] or a mixed model [MixedModels.jl]) to a DataFrame iteratively, with a separate model fit to each level of a grouping column. For example, I want to fit one model to the data for rep1 and another model to the data for rep2 from the grp column…
# Load packages
using DataFrames, GLM, StatsBase
# Simulate fake data
data = DataFrame(
# Response variable
y = [1, 2, 3, 4, 2, 4, 7, 8],
# Predictor variable
x = ["A1", "A2", "A1", "A2", "A1", "A2", "A1", "A2"],
# Grouping variable
grp = ["rep1", "rep1", "rep1", "rep1", "rep2", "rep2", "rep2", "rep2"]
);
data
I can fit a model to the full dataset.
GLM.lm(@formula(y ~ x), data)
But I can’t seem to do this for each level of grp.
# One approach
grouped_data = groupby(data, [:grp])
for grp in grouped_data
result = GLM.lm(@formula(y ~ x), grouped_data)
end
# Another approach
results = [lm((@eval @formula(y ~ x)), grouped_data) for grp in grouped_data]
Both approaches throw the following error:
ArgumentError: expected data in a Table, got GroupedDataFrame{DataFrame}
I realise that this error is saying that the data input (grouped_data) is not in a format that the lm function call can read. I am not sure how to pass the grouped data as a table, though. Maybe storing the grouped datasets in a list and iterating over the list would be better?
I have seen this thread [How to save results for each outcome? - #10 by nilshg] on how to iterate over different columns. However, I can’t figure out how to translate this to iterating over the groups in a column.
Ultimately, I would like to produce a table with the coefficients from the fitted models, e.g.
results
2×2 DataFrame
 Row │ grp     beta1
     │ Symbol  Float64
─────┼─────────────────
   1 │ rep1    5.17577
   2 │ rep2    0.97570
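A table of this shape could perhaps be built with combine on the grouped data, returning one named value per group. This is only a sketch under assumptions: the column name beta1 is taken from the example above, coef(model)[2] is assumed to be the coefficient for x (the first entry being the intercept), and grp stays a String column here rather than a Symbol.

```julia
using DataFrames, GLM

# Same simulated data as above
data = DataFrame(
    y = [1, 2, 3, 4, 2, 4, 7, 8],
    x = ["A1", "A2", "A1", "A2", "A1", "A2", "A1", "A2"],
    grp = ["rep1", "rep1", "rep1", "rep1", "rep2", "rep2", "rep2", "rep2"],
)

# Fit one model per group and keep the coefficient for x as `beta1`
results = combine(groupby(data, :grp)) do sdf
    model = lm(@formula(y ~ x), sdf)
    (; beta1 = coef(model)[2])
end
```

Returning a NamedTuple from the do block gives one row per group, with the grouping column added automatically.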
Any advice or resources would be much appreciated.