Non-call expression encountered

New to Julia, coming from MATLAB.

I’m trying to run a regression using a dataframe, but I want to use a specific range of the dataframe's columns, as I have many covariates.

So instead of running the following

df_test = DataFrame(A = rand(Int, 100), B = rand(Int, 100), C = rand(0:1, 100) )

model_test = glm(@formula(C ~ A + B+C),
 df_test, Binomial(), LogitLink())  

I would like to do something like this (in incorrect syntax reminiscent of MATLAB):

model_test = glm(@formula(C ~ A + df_test[:,2:end]),
 df_test, Binomial(), LogitLink()) 

Macroexpanding @formula shows that it simply builds calls to ~, +, etc. on Term objects constructed from the column symbols, e.g.,

julia> @macroexpand @formula C ~ A + B
:(StatsModels.Term(:C) ~ StatsModels.Term(:A) + StatsModels.Term(:B))

Thus, we can just construct the desired expression directly using functions only

julia> using StatsModels

julia> my_formula = Term(:C) ~ +( (Term(Symbol(x)) for x ∈ names(df_test)[2:end])...)
FormulaTerm
Response:
  C(unknown)
Predictors:
  B(unknown)
  C(unknown)

julia> model_test = glm(my_formula, df_test, Binomial(), LogitLink())
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, LogitLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}, Matrix{Float64}}

C ~ 1 + B + C

Alternatively, you can just pass the data as a design matrix and a target vector directly:

julia> model_test = glm(Matrix(df_test[:, 2:end]), df_test[:, :C], Binomial(), LogitLink())
GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, LogitLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}:

In any case, you probably want 1:end-1 as otherwise C is regressed on C.
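A minimal sketch of the design-matrix variant with the response column excluded. The toy data uses `randn` covariates instead of `rand(Int, 100)`, and the names `X`, `X1` and `model_matrix` are made up for illustration. It also assumes that the matrix method fits exactly the columns it is given, so an intercept has to be added by hand as a column of ones:

```julia
using DataFrames, GLM

# Toy data (assumed): two numeric covariates, binary response in the last column.
df_test = DataFrame(A = randn(100), B = randn(100), C = rand(0:1, 100))

# Predictors are all columns except the last one, which holds the response C.
X = Matrix(df_test[:, 1:end-1])

# Prepend a column of ones so the model includes an intercept.
X1 = hcat(ones(nrow(df_test)), X)

model_matrix = glm(X1, df_test.C, Binomial(), LogitLink())

coef(model_matrix)  # one coefficient per column of X1: intercept, A, B
```

With the formula interface the intercept is added automatically (note the `1 +` in the printed formula above), so this extra step is only needed for the matrix version.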

A shorter way to build the right-hand side is sum(term.(names(df[:, 2:end]))) (or, if you want to make the column selection a bit more robust, use names(df[:, Not(:C)]) instead).
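Put together, the `term`-based approach might look like the sketch below. The toy data and the names `predictors` and `model` are placeholders; `names(df, Not(:C))` is used here as an equivalent way to list every column except the response without copying the data first:

```julia
using DataFrames, GLM, StatsModels

# Toy data (assumed): two numeric covariates and a binary response C.
df = DataFrame(A = randn(100), B = randn(100), C = rand(0:1, 100))

# Every column name except the response, selected by name rather than position.
predictors = names(df, Not(:C))

# term accepts strings or symbols; summing the terms joins them with +.
my_formula = term(:C) ~ sum(term.(predictors))

model = glm(my_formula, df, Binomial(), LogitLink())

coefnames(model)  # the intercept is added automatically by the formula interface
```

Selecting by name keeps the formula correct even if the response is not the first or last column.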