Formula creation without direct name specification



Say I have a DataFrame called mydata where names(mydata) is Symbol[5001] consisting of :y, :x1, :x2, :x3, :x4, … (the other 4995 symbols) … , :x5000.

I want to fit models with the formula ModelFrame(y ~ x1, mydata), ModelFrame(y ~ x2, mydata), etc. until ModelFrame(y ~ x5000, mydata). ModelFrame can be switched to glm or lmm as I believe they use the same formula.jl code.

How can I build the formula so that something like ModelFrame(y ~ names(mydata)[i], mydata) works for any index i? This would let me loop over the variables. However, I'm not sure what type is expected after the tilde, because none of ModelFrame(y ~ names(mydata)[2], mydata), ModelFrame(y ~ Symbol(:x1), mydata), or ModelFrame(y ~ "x1", mydata) work.


You can use interpolation. Here are some minimal examples using Symbols or Strings:

julia> names = [:a,:b,:c]
3-element Array{Symbol,1}:
 :a
 :b
 :c

julia> y ~ $(names[2])
Formula: y ~ b

julia> names_str = ["a","b","c"]
3-element Array{String,1}:
 "a"
 "b"
 "c"

julia> y ~ $(Symbol(names_str[2]))
Formula: y ~ b
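
The interpolation examples above depend on the formula syntax of the package versions in use at the time. As a hedged sketch of the loop the question asks for, the following assumes a recent StatsModels.jl (0.6 or later), where `term` turns a Symbol into a formula term and `~` combines terms at run time. The small `mydata` below is a hypothetical stand-in with three predictors instead of 5000.

```julia
using DataFrames, StatsModels

# Hypothetical stand-in for the real `mydata` (assumption: numeric columns).
mydata = DataFrame(y = rand(10), x1 = rand(10), x2 = rand(10), x3 = rand(10))

# propertynames returns the column names as Symbols; skip the response :y.
# term(:y) ~ term(p) builds the formula from values, so no macro
# interpolation is needed inside the loop.
frames = Dict(p => ModelFrame(term(:y) ~ term(p), mydata)
              for p in propertynames(mydata) if p != :y)

for (p, mf) in frames
    println(p, " => ", mf.f)   # show the formula stored in each ModelFrame
end
```

The same run-time formula should be usable wherever a formula is accepted, so ModelFrame could be swapped for a model-fitting call, as in the question.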


A bit of trivia: when you open Julia without loading any package, the syntax still parses, but evaluating it throws an error:

julia> y ~ x
ERROR: UndefVarError: @~ not defined

So y ~ x + z is just syntactic sugar for the macro call @~ y x + z, which you can see by quoting the expression:

julia> :(y ~ x)
:(@~ y x)

julia> :(y ~ x + y)
:(@~ y x + y)

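This is why your attempts behaved unintuitively: the code you typed after the tilde was never evaluated. It was passed, as an unevaluated expression, to the macro implementation. The macro therefore saw the expression names(mydata)[2] itself rather than its value :x1, and interpolation with $ is what splices the value into the formula.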
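
The parse shown above is specific to the Julia versions of that era. As far as I know, in Julia 1.x the parser instead treats `~` as an ordinary binary operator call. A small check that needs no packages (only base Julia is assumed):

```julia
# Quoting only parses the code; nothing is evaluated here.
ex = :(y ~ x + z)

println(ex.head)   # kind of expression the parser produced
println(ex.args)   # the operator, then the left- and right-hand sides
```

Under that assumption, a package has to wrap the formula in an explicit macro (StatsModels uses `@formula`) to receive it unevaluated, or offer a run-time constructor that takes Symbols directly.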