Using GLM programmatically

I would like to perform linear regression in a loop, with the variable names held in variables. For example,

using DataFrames, GLM

data = DataFrame(X=[1,2,3], Y=[2,4,7])
var1 = :Y 
var2 = :X 
ols = lm(@formula($var1 ~ $var2), data)

I tried various approaches including the one proposed here, which no longer works.

Just add @eval in front

ols = lm(@eval(@formula($var1 ~ $var2)), data)
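Below is a minimal sketch of the same idea wrapped in a function. It assumes DataFrames and GLM are installed, and fit_pair is a hypothetical helper name. The $ interpolations are filled in by @eval before @formula ever sees the expression, so @formula receives a literal Y ~ X.

```julia
using DataFrames, GLM

# Hypothetical helper: fit `response ~ predictor` where both column names are Symbols.
function fit_pair(data, response::Symbol, predictor::Symbol)
    # @eval substitutes the two Symbols into the quoted expression, then evaluates
    # @formula(Y ~ X) at global scope. The parentheses keep `, data` out of @eval.
    f = @eval(@formula($response ~ $predictor))
    return lm(f, data)
end

data = DataFrame(X=[1, 2, 3], Y=[2, 4, 7])
ols = fit_pair(data, :Y, :X)
coef(ols)   # intercept and slope
```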

Thank you! I tried something similar:

ols = lm(@eval @formula($var1 ~ $var2) , data)

I didn’t realize the parentheses were necessary.

Ah, without the parentheses, @eval also captures the , data that follows, so it receives @formula($var1 ~ $var2), data instead of just the formula, and that doesn’t make sense to it.

Thanks for the explanation. At some point I should really learn how macros work.

Just to add: the recommended way to generate formula terms programmatically is to construct the Term objects directly, as described in the StatsModels documentation here: Modeling tabular data · StatsModels.jl
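Here is a sketch of that approach applied to the original loop. It assumes GLM re-exports StatsModels, so term is available after using GLM. The extra column Z and the Dict of fits are only for illustration. No macros are involved, so nothing has to be evaluated at runtime.

```julia
using DataFrames, GLM   # GLM re-exports StatsModels, which provides `term`

data = DataFrame(X=[1, 2, 3], Z=[3, 1, 2], Y=[2, 4, 7])

response = :Y
fits = Dict{Symbol,Any}()
for predictor in (:X, :Z)
    # Build the FormulaTerm `Y ~ predictor` directly from Term objects
    f = term(response) ~ term(predictor)
    fits[predictor] = lm(f, data)   # lm adds an intercept implicitly
end

coef(fits[:X])
```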

I have a question: I need a squared term in the formula. I got the log example here working:

julia> log_term(t::AbstractTerm) = FunctionTerm(log, [t], :(log($(t))))

julia> lt = log_term(term(:a))
(a)->log(a)

julia> lt.f(9)
2.1972245773362196

but how do I use this to make a squared term?

julia> square_term(t::AbstractTerm) = FunctionTerm(^, [t], :(^($t,2)))
square_term (generic function with 2 methods)

julia> st = square_term(term(:b))
(b)->b ^ 2

julia> st.f(2)
ERROR: MethodError: no method matching ^(::Int64)

Closest candidates are:
  ^(::Integer, ::BigInt)
   @ Base gmp.jl:655
  ^(::Integer, ::Bool)
   @ Base bool.jl:170
  ^(::T, ::T) where T<:Integer
   @ Base intfuncs.jl:310
  ...

Stacktrace:
 [1] top-level scope
   @ REPL[135]:1

julia> st.f(2,3)
8

I don’t get it.
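The MethodError points at the cause. The f stored in the FunctionTerm is exactly the function you pass, here plain ^. The expression :(^($t,2)) is only used for display, so st.f(2) calls ^(2) with a single argument. That st.f(2,3) returns 8 confirms it. When the term is evaluated on data, f is likewise applied to the values of the terms in [t] only.

A minimal sketch of one workaround closes over the exponent so that f takes a single argument. It assumes the same three-argument FunctionTerm constructor used in the log example above:

```julia
using GLM   # re-exports StatsModels: AbstractTerm, FunctionTerm, term

# The exponent lives inside the closure; the Expr is only used for printing.
square_term(t::AbstractTerm) = FunctionTerm(x -> x^2, [t], :($t ^ 2))

st = square_term(term(:b))
st.f(3)   # now a one-argument function
```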

I can have a quick look later to see if I can figure it out, but isn’t that the example in the docs?

https://juliastats.org/StatsModels.jl/stable/internals/#An-example-of-custom-syntax:-poly

Yes, that looks like exactly what I need. I think we should add a sentence about this to the log example; given that example, I had no reason to look any further.

This only works for functions with unambiguous dispatch, such as log in the example below. For functions with multiple arguments, you need to extend the formula syntax as explained there.
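When the extra arguments are fixed constants, you can also capture them in a closure. That keeps the FunctionTerm one-argument without extending the formula syntax. Below is a sketch that assumes the StatsModels FunctionTerm(f, args, exorig) constructor used above. power_term is a hypothetical helper, not part of StatsModels, and the data are made up so that Y = 1 + 0.5X + 0.5X^2 exactly.

```julia
using DataFrames, GLM   # GLM re-exports StatsModels

# Hypothetical helper: X^p for a fixed integer p, captured in the closure.
power_term(t::AbstractTerm, p::Integer) = FunctionTerm(v -> v^p, [t], :($t ^ $p))

data = DataFrame(X=1:5, Y=[2.0, 4.0, 7.0, 11.0, 16.0])

xt = term(:X)
# Y ~ X + X^2 built from terms; lm adds the intercept.
f = term(:Y) ~ xt + power_term(xt, 2)
ols = lm(f, data)
coef(ols)   # intercept, X, X^2
```

Unlike the poly extension in the docs, this does not make power_term usable inside @formula. It only helps when you build formulas from Term objects.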