I am trying to write a loop for a logistic regression. I have a dataframe with about 20 columns, and I would like to use the columns one after another as a predictor and print the coeftable after each fit. In the end I am aiming for a stepwise logistic regression.
I’ve got this example code, which works well:
using DataFrames, GLM
srand(42)
df_test = DataFrame(A = rand(Int, 100), B = rand(Int, 100), C = rand(0:1, 100) )
model_test = glm(@formula(C ~ A + B),
df_test, Binomial(), LogitLink())
println(coeftable(model_test))
But I would like to get all the column names (df_names) and then tell the model to use these names as input (df_names[1]). That fails with:
Non-call expression encountered
in glm at GLM\src\glmfit.jl:286
in #glm#13 at GLM\src\glmfit.jl:286
in fit at DataFrames\src\statsmodels\statsmodel.jl:52
in #fit#153 at DataFrames\src\statsmodels\statsmodel.jl:52
in at base\<missing>
in #ModelFrame#127 at DataFrames\src\statsmodels\formula.jl:333
in DataFrames.Terms at DataFrames\src\statsmodels\formula.jl:209
in dospecials at DataFrames\src\statsmodels\formula.jl:101
in map at base\abstractarray.jl:1868
in _collect at base\array.jl:488
in next at base\generator.jl:45
in dospecials at DataFrames\src\statsmodels\formula.jl:97
So I would like to make df_names[1] callable. Does anybody have a suggestion for how to do that?
And does anybody have an idea how to perform a stepwise logistic regression?
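For what it's worth, the "Non-call expression encountered" message can be illustrated without any packages. The failing call is not shown above, so the sketch below assumes it looked roughly like @formula(C ~ df_names[1]). A macro receives the code as written, not the value of df_names[1], so the right-hand side arrives as an indexing expression rather than a column name:

```julia
# Plain Julia, no packages needed.
ok  = :(C ~ A + B)        # the expression @formula receives in the working example
bad = :(C ~ df_names[1])  # ASSUMED shape of the failing call (not shown in the post)

# args[1] is the operator ~, args[2] the left-hand side, args[3] the right-hand side.
rhs_ok  = ok.args[3]
rhs_bad = bad.args[3]

println(rhs_ok.head)   # the right-hand side A + B is a function call to +
println(rhs_bad.head)  # the right-hand side df_names[1] is an indexing expression
```

So the problem is not that df_names[1] needs to be callable; the formula has to be built from the column name's value instead of from the literal text df_names[1].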
This is clearly a situation that arises due to the use of the @formula macro. Is it possible to make formulas work directly with vectors? Something like:
# Following the notation from the question
glm(df_test[3] ~ df_test[1] + df_test[2])
From what I understand, you can build your own model matrix and response vector and pass them to glm directly, but obviously that’s more complicated for some users.
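A minimal sketch of that matrix route, assuming a GLM version that accepts glm(X, y, distribution, link). The data here is made up for illustration (uniform predictors instead of rand(Int, 100)), and the intercept column has to be added by hand because no formula is involved:

```julia
using DataFrames, GLM, Random

Random.seed!(42)
n = 100
# Illustrative data only; the column names follow the question.
df_test = DataFrame(A = rand(n), B = rand(n), C = rand(0:1, n))

# Design matrix: a column of ones for the intercept, then the predictors.
X = hcat(ones(n), df_test.A, df_test.B)
# Response as a floating-point vector of 0s and 1s.
y = Float64.(df_test.C)

model = glm(X, y, Binomial(), LogitLink())
println(coeftable(model))
```

The coefficient table then labels the coefficients generically (by position) rather than by column name, which is part of why the formula interface is more convenient.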
You don’t need a new macro for it. Here is a minimal example of one approach you could use:
using DataFrames, GLM, Random
Random.seed!(0)
data = DataFrame(A = rand(0:1, 10), B = rand(10), C = rand(10), D = rand(10))
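function step_wise(vars)
    # Template formula; rhs.args[3] (here C) serves as a placeholder.
    model_formula = @formula(A ~ B + C)
    for var ∈ vars
        # Swap the placeholder for the current column name. This relies on the
        # formula storing its right-hand side as a mutable expression.
        model_formula.rhs.args[3] = var
        model = glm(model_formula, data, Binomial(), LogitLink())
        println(coeftable(model))
    end
end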
step_wise(names(data)[3:4])
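As far as I know, newer StatsModels versions build formulas from term objects, so mutating rhs.args may not work there. A sketch of the same loop using term, plus a very basic forward selection, could look like the following. The helper names (build_formula, fit_logit, my_aic, forward_select) and the simulated data are made up for illustration, and selecting greedily on AIC is just one possible criterion, not an established recipe from GLM:

```julia
using DataFrames, GLM, Random

Random.seed!(0)
n = 200
B, C, D = randn(n), randn(n), randn(n)
# Simulated response that depends on B only (illustrative data, not from the thread).
A = Int.(B .+ 0.5 .* randn(n) .> 0)
data = DataFrame(A = A, B = B, C = C, D = D)

# Build `response ~ p1 + p2 + ...` at run time; an empty list gives `response ~ 1`.
# `term` comes from StatsModels, which GLM re-exports.
function build_formula(response, predictors)
    rhs = isempty(predictors) ? term(1) : foldl(+, term.(Symbol.(predictors)))
    return term(Symbol(response)) ~ rhs
end

fit_logit(df, response, predictors) =
    glm(build_formula(response, predictors), df, Binomial(), LogitLink())

# AIC computed by hand from dof and loglikelihood.
my_aic(m) = 2 * dof(m) - 2 * loglikelihood(m)

# Hypothetical helper: greedy forward selection that adds the predictor with
# the lowest AIC and stops once no candidate improves on the current model.
function forward_select(df, response, candidates)
    selected = Symbol[]
    remaining = Symbol.(candidates)
    best = my_aic(fit_logit(df, response, selected))  # intercept-only model
    while !isempty(remaining)
        scores = [my_aic(fit_logit(df, response, vcat(selected, v))) for v in remaining]
        score, i = findmin(scores)
        score < best || break
        best = score
        push!(selected, remaining[i])
        deleteat!(remaining, i)
    end
    return selected
end

# Same loop as above: keep B and try each remaining column in turn.
for var in [:C, :D]
    println(coeftable(fit_logit(data, :A, [:B, var])))
end

selected = forward_select(data, :A, [:B, :C, :D])
println(selected)
```

Wrapping names in Symbol(...) means the helpers accept column names as either symbols or strings, so the result of names(data) can be passed in directly. Keep in mind that stepwise selection inflates the apparent significance of whatever survives, so treat the final coeftable with some caution.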