Say I have a DataFrame called mydata where names(mydata) is Symbol[5001] consisting of :y, :x1, :x2, :x3, :x4, … (the other 4995 symbols) … , :x5000.
I want to fit models with the formula ModelFrame(y ~ x1, mydata), ModelFrame(y ~ x2, mydata), etc. until ModelFrame(y ~ x5000, mydata). ModelFrame can be switched to glm or lmm as I believe they use the same formula.jl code.
How do I create a model such that I can do something like ModelFrame(y ~ names(mydata)[x], mydata), where x is any number? This would allow looping over the variables. However, I’m not sure what type of object is expected after the tilde, because ModelFrame(y ~ names(mydata)[2], mydata), ModelFrame(y ~ Symbol(:x1), mydata), and ModelFrame(y ~ "x1", mydata) don’t work.
You can use interpolation. Here are some minimal examples using Symbols or Strings:
julia> names = [:a,:b,:c]
3-element Array{Symbol,1}:
:a
:b
:c
julia> y ~ $(names[2])
Formula: y ~ b
julia> names_str = ["a","b","c"]
3-element Array{String,1}:
"a"
"b"
"c"
julia> y ~ $(Symbol(names_str[2]))
Formula: y ~ b
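To loop over many predictors, the same interpolation can be used inside a comprehension. The sketch below uses only base Julia and builds quoted formula expressions rather than fitted models; `predictors` is a hypothetical stand-in for names(mydata)[2:end]. In a session where the formula macro is available, the same `$p` splice should be usable directly in the loop body, as in the examples above.

```julia
# Hypothetical predictor names standing in for names(mydata)[2:end]
predictors = [:x1, :x2, :x3]

# Quote the formula and splice each Symbol into the right-hand side.
# Nothing is fitted here; each element is just an unevaluated expression.
formulas = [:(y ~ $p) for p in predictors]

for f in formulas
    println(f)
end
```

Building the expressions first keeps the looping logic separate from the model-fitting call, which makes it easy to check what will be passed on before fitting 5000 models.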
A bit of trivia: when you open Julia without loading any package, you can see that the syntax still parses, but evaluating it throws an error:
julia> y ~ x
ERROR: UndefVarError: @~ not defined
So what is happening is that y ~ x + z is just syntactic sugar for the macro call @~ y x + z:
julia> :(y ~ x)
:(@~ y x)
julia> :(y ~ x + y)
:(@~ y x + y)
This is why things were unintuitive for you: the code you typed after the tilde was not evaluated, but was instead passed as an unevaluated expression to the macro implementation.
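A small base-Julia sketch can make this visible. The macro `@show_arg` below is hypothetical and exists only for illustration; it reports the type and text of the raw argument it receives, which shows that a macro sees the code itself, not its value.

```julia
# Hypothetical macro, for illustration only: it returns a string
# describing the unevaluated argument handed to it.
macro show_arg(ex)
    return string(typeof(ex), ": ", ex)
end

names_vec = [:a, :b, :c]

# The macro receives the expression names_vec[2], not the Symbol :b.
println(@show_arg names_vec[2])

# A bare name arrives as a Symbol.
println(@show_arg x1)
```

This is why writing names(mydata)[2] after the tilde does not behave like writing x1: the formula macro gets the indexing expression as-is, unless the value is spliced in with `$`.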