Using GLM programmatically

I would like to perform linear regression in a loop with variables. For example,

using DataFrames, GLM

data = DataFrame(X=[1,2,3], Y=[2,4,7])
var1 = :Y 
var2 = :X 
ols = lm(@formula($var1 ~ $var2), data)

I tried various approaches including the one proposed here, which no longer works.

Just add @eval in front

ols = lm(@eval(@formula($var1 ~ $var2)), data)

Thank you! I tried something similar:

ols = lm(@eval @formula($var1 ~ $var2) , data)

I didn’t realize the parentheses were necessary.

Ah, without the parentheses, @eval @formula($var1 ~ $var2) , data doesn’t make sense to @eval: the macro captures everything after it, including the , data, instead of leaving data as the second argument to lm.
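A quick way to see the difference is to parse both spellings without running them. This is only a sketch: Meta.parse does not expand any macros, so no packages are needed, and the variable names below are made up for illustration. I would expect the version without parentheses to hand lm a single argument (the macro call wrapping a tuple) and the version with parentheses to hand it two.

```julia
# Meta.parse only builds the expression tree; nothing is evaluated or expanded.
without_parens = Meta.parse(raw"lm(@eval @formula($var1 ~ $var2) , data)")
with_parens    = Meta.parse(raw"lm(@eval(@formula($var1 ~ $var2)), data)")

# args[1] of a call expression is the function name, so subtract it
# to count the arguments that lm itself would receive.
n_without = length(without_parens.args) - 1
n_with    = length(with_parens.args) - 1

println("lm arguments without parentheses: ", n_without)
println("lm arguments with parentheses:    ", n_with)
```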

Thanks for the explanation. At some point I should really learn how macros work.

Just to add that the recommended approach to programmatically generating formula terms is by constructing the Term objects directly, as referenced in the StatsModels API here: Modeling tabular data · StatsModels.jl
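For the example from the original question, that could look roughly like the sketch below. It assumes StatsModels is installed alongside GLM (it is listed explicitly so that term is in scope), and it spells out the intercept with term(1) rather than relying on it being added implicitly.

```julia
using DataFrames, GLM
using StatsModels  # listed explicitly so that term is in scope

data = DataFrame(X=[1, 2, 3], Y=[2, 4, 7])
var1 = :Y
var2 = :X

# Build the formula from Term objects; no @formula or @eval involved.
# term(1) is the explicit intercept.
f = term(var1) ~ term(1) + term(var2)
ols = lm(f, data)
```

Since var1 and var2 are ordinary Symbols here, the same two lines can sit inside a loop or a function that receives the column names as arguments.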

I have a question: I need a squared term in the formula. I got the log version working from the example here:

julia> log_term(t::AbstractTerm) = FunctionTerm(log, [t], :(log($(t))))

julia> lt = log_term(term(:a))
(a)->log(a)

julia> lt.f(9)
2.1972245773362196

but how do I use this to make a squared term?

julia> square_term(t::AbstractTerm) = FunctionTerm(^, [t], :(^($t,2)))
square_term (generic function with 2 methods)

julia> st = square_term(term(:b))
(b)->b ^ 2

julia> st.f(2)
ERROR: MethodError: no method matching ^(::Int64)

Closest candidates are:
  ^(::Integer, ::BigInt)
   @ Base gmp.jl:655
  ^(::Integer, ::Bool)
   @ Base bool.jl:170
  ^(::T, ::T) where T<:Integer
   @ Base intfuncs.jl:310
  ...

Stacktrace:
 [1] top-level scope
   @ REPL[135]:1

julia> st.f(2,3)
8

I don’t get it.

I can have a quick look later to see if I can figure it out, but isn’t that the example in the docs?

https://juliastats.org/StatsModels.jl/stable/internals/#An-example-of-custom-syntax:-poly

Yes, that looks exactly like what I need. I think we should add this sentence next to the example with the log function; given that example, I had no incentive to look any further.

This is going to work only in cases with non-ambiguous dispatch, for example, the log example below. For functions with multiple arguments you will need to extend the formula syntax as explained there.
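To spell out what went wrong with square_term above: the FunctionTerm stores ^ itself and only one argument term, so st.f is the two-argument ^ and st.f(2) has nothing to use as the exponent. One possible workaround, sketched here with the same three-argument FunctionTerm constructor used earlier in the thread, is to store a one-argument closure instead. I have only checked this against calling st.f directly, where st.f(2) should give 4; whether it behaves well inside a fitted model is not something this thread establishes, and the poly example in the docs remains the route for functions that really take several arguments.

```julia
using StatsModels

# Store a one-argument closure instead of the two-argument ^, so the stored
# function takes exactly one value per term in the args vector.
square_term(t::AbstractTerm) = FunctionTerm(x -> x^2, [t], :($(t)^2))

st = square_term(term(:b))
st.f(2)
```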
