Stepwise logistic regression - GLM - non-callable --> callable (Non-call expression encountered)

Hello,

I am trying to write a loop for a logistic regression. I have a dataframe with about 20 columns, and I would like to use each column in turn as a predictor and print the coeftable afterwards. In the end I am aiming to build a stepwise logistic regression.

I’ve got this example code which performs well:

using DataFrames, GLM

srand(42)
df_test = DataFrame(A = rand(Int, 100), B = rand(Int, 100), C = rand(0:1, 100) )

model_test = glm(@formula(C ~ A + B),
 df_test, Binomial(), LogitLink())
println(coeftable(model_test))

But I would like to get all the column names (df_names) and then tell the model to use these names as input (e.g. df_names[1]):

df_names = names(df_test)

model_test = glm(@formula(C ~ df_names[1] + df_names[2]),
 df_test, Binomial(), LogitLink())

When I execute the code I get this error message:

Non-call expression encountered
in glm at GLM\src\glmfit.jl:286
in #glm#13 at GLM\src\glmfit.jl:286
in fit at DataFrames\src\statsmodels\statsmodel.jl:52
in #fit#153 at DataFrames\src\statsmodels\statsmodel.jl:52
in  at base\<missing>
in #ModelFrame#127 at DataFrames\src\statsmodels\formula.jl:333
in DataFrames.Terms at DataFrames\src\statsmodels\formula.jl:209
in dospecials at DataFrames\src\statsmodels\formula.jl:101
in map at base\abstractarray.jl:1868
in _collect at base\array.jl:488
in next at base\generator.jl:45
in dospecials at DataFrames\src\statsmodels\formula.jl:97

So I would like to make df_names[1] callable. Does anybody have a suggestion for how to make it callable?
And does anybody have an idea how to perform a stepwise logistic regression?

Cheers!
Tobi

Thank you for your response.
Unfortunately, when I do it like you said I get the same error again:

model_test = glm(@formula(C ~ df_test[:df_names[1]] + df_test[:df_names[2]]),
 df_test, Binomial(), LogitLink())

When I just execute:

df_test[:df_names[1]]

I get this error:

MethodError: no method matching getindex(::Symbol, ::Int64)

But when I delete the colon, I get the DataArray:

df_test[df_names[1]]

But deleting the colon in the model doesn’t help either.
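
For context on both errors, here is a minimal sketch in plain Julia (no packages needed). It assumes Julia 0.7 or later, where `~` parses as an ordinary call, and the variable names are only for illustration. A macro such as @formula receives the unevaluated code, so df_names[1] reaches it as an indexing expression rather than as the column name it would evaluate to; and putting a colon in front makes Julia index the Symbol :df_names itself.

```julia
df_names = [:A, :B, :C]   # stand-in for names(df_test)

# What a macro receives: the syntax tree of the code, not its values.
ex = :(C ~ df_names[1] + df_names[2])
rhs = ex.args[3]             # the `df_names[1] + df_names[2]` call
first_term = rhs.args[2]     # the `df_names[1]` piece
println(first_term.head)     # an indexing (:ref) expression, not a column name

# Interpolating the values first yields the expression the macro expects.
ex_ok = :(C ~ $(df_names[1]) + $(df_names[2]))
println(ex_ok)

# With a leading colon the Symbol :df_names itself is indexed,
# which is where the MethodError comes from.
err = try
    (:df_names)[1]
catch e
    e
end
println(typeof(err))
```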

You need to use @eval or eval(Meta.parse(...)) on a string that you build. See https://juliastats.github.io/StatsModels.jl/latest/formula.html#Constructing-a-formula-programmatically-1.
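
A minimal sketch of the string-based route, using only Base Julia. The column names here are hypothetical, and the final @eval step is left as a comment because it assumes a package providing @formula is loaded.

```julia
predictors = ["A", "B"]   # hypothetical predictor column names
response = "C"            # hypothetical response column name

# Build the formula text, then parse it into an expression.
formula_str = response * " ~ " * join(predictors, " + ")
formula_ex = Meta.parse(formula_str)
println(formula_ex)

# With a package that provides @formula loaded, the suggestion above
# would then look roughly like this (not run here):
#   f = @eval @formula($formula_ex)
```

Building strings is easy to read, but it will break on column names that are not valid Julia identifiers, so treat it as a convenience rather than a general solution.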

This is one of those areas that is ripe for a helper package. I currently do this:

function expandargs(x)
    :(+$(x...))
end
rhs = [:x1, :x2, :x3]
lhs = :y
@eval @formula($lhs ~ $(expandargs(rhs)))
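
The same idea can be sketched without quoting syntax by constructing the expression explicitly. This is not the helper above, just an illustrative variant in Base Julia; `formula_expr` is a made-up name, and it assumes at least one predictor and Julia 0.7 or later.

```julia
# Build `lhs ~ a + b + ...` as an expression from symbols.
function formula_expr(lhs::Symbol, rhs::Vector{Symbol})
    # With a single predictor there is nothing to add, so use it directly;
    # Expr(:call, :+, x) would be the unary plus `+x`.
    rhs_ex = length(rhs) == 1 ? rhs[1] : Expr(:call, :+, rhs...)
    return Expr(:call, :~, lhs, rhs_ex)
end

ex = formula_expr(:y, [:x1, :x2, :x3])
single = formula_expr(:y, [:x1])
println(ex)
println(single)

# As in the snippet above, the result could then be handed to the macro:
#   @eval @formula($ex)
```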

This is clearly a situation that arises due to the use of the @formula macro. Is it possible to make formulas work directly with vectors? Something like:

# Following the notation from the question
glm(df_test[3] ~ df_test[1] + df_test[2])

From what I understand, you can pass your own model matrix and response variable to glm, but obviously that’s more complicated for some users.

You don’t need a macro for it. Here is a minimal example of one approach you could use:

using DataFrames, GLM, Random

Random.seed!(0)
data = DataFrame(A = rand(0:1, 10), B = rand(10), C = rand(10), D = rand(10))

function step_wise(vars)
    model_formula = @formula(A ~ B + C)
    for var ∈ vars
        model_formula.rhs.args[3] = var
        model = glm(model_formula, data, Binomial(), LogitLink())
        println(coeftable(model))
    end
end
step_wise(names(data)[3:4])
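
The loop above swaps in one predictor at a time. For the stepwise part of the original question, one possible shape is greedy forward selection, sketched here in plain Julia. `forward_select`, `score`, and `toy_score` are hypothetical names: in practice `score` would fit a model on the given predictors and return a criterion where lower is better (for example an AIC), while the toy version below only exists so the sketch runs on its own.

```julia
# Greedy forward selection: repeatedly add the candidate that lowers the
# score the most, and stop when no candidate improves it.
function forward_select(candidates::Vector{Symbol}, score)
    selected = Symbol[]
    remaining = copy(candidates)
    best = Inf
    while !isempty(remaining)
        scores = [score(push!(copy(selected), c)) for c in remaining]
        s, i = findmin(scores)
        s < best || break
        best = s
        push!(selected, remaining[i])
        deleteat!(remaining, i)
    end
    return selected, best
end

# Toy score for illustration only: pretends :B and :D are useful predictors
# and charges a small penalty per variable.
toy_score(vars) = 10.0 - 3.0 * (:B in vars) - 2.0 * (:D in vars) + 0.5 * length(vars)

selected, best = forward_select([:B, :C, :D], toy_score)
println(selected, " ", best)
```

Passing the scoring function in as an argument keeps the selection logic independent of how the model is fitted, so the same loop works whichever way the formula is constructed.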

Thank you for your answers.

@Nosferican, your solution works very well.

The hint to make it callable with @eval was also very helpful.

If you make it callable with @eval, don’t forget the parentheses after the $, i.e. $(...):

model_test = @eval glm(@formula(C ~ $(df_names[1]) + B),
 df_test, Binomial(), LogitLink())

We could provide convenience functions to make this easier, and/or improve docs.
