Build a formula from a string

I want to build a StatsModels formula from a string. This is the closest I’ve gotten.

text = "y ~ x"
StatsModels.terms!(StatsModels.sort_terms!(StatsModels.parse!(Meta.parse(text))))
# :(($(Expr(:escape, :~)))(Term(:y), Term(:x)))

Could you explain why it’s useful to build from a string?

Since StatsModels.jl shares some syntax with other languages, I can use the same formula file from multiple tools.

If your formulas are very basic (just multiple linear regression), you can just split your string and construct terms from that:

julia> using GLM

julia> f = "y ~ x1 + x2"
"y ~ x1 + x2"

julia> y, xs = split(f, "~")
2-element Vector{SubString{String}}:
 "y "
 " x1 + x2"

julia> term(strip(y)) ~ sum(term.(strip.(split(xs, "+"))))
FormulaTerm
Response:
  y(unknown)
Predictors:
  x1(unknown)
  x2(unknown)

Apart from that, you’re basically looking to do what the @formula macro does, I suppose, and that’s what you’ve got already (see formula.jl in JuliaStats/StatsModels.jl on GitHub).

@dave.f.kleinschmidt Is there a way I can turn this text into a FormulaTerm?

text = "y ~ x + z"
StatsModels.terms!(StatsModels.sort_terms!(StatsModels.parse!(Meta.parse(text))))
# :(($(Expr(:escape, :~)))(Term(:y), ($(Expr(:escape, :+)))(Term(:x), Term(:z))))

You can probably eval what you’ve got there, but is it absolutely necessary to work from a string? In general, if which function gets called depends on the contents of a string, you’re not going to be able to do it without eval somewhere, so you’re probably better off doing something like

@eval(@formula($(Meta.parse(text))))

(much as I hate to say it 🙂)

What’s going on here is that Meta.parse is converting your string into a Julia Expr, the $(...) is inserting that into the expression starting with @formula, and then @eval is evaluating the whole thing. It’s basically as if you’d typed @formula y ~ x + z into the REPL.
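If the formulas come from a file, it may be handy to wrap that one-liner in a function. Here is a minimal sketch, not something StatsModels provides: `formula_from_string` is a name I made up, and the check that the parsed expression is a `~` call is my own guard so that arbitrary code isn't handed straight to eval.

```julia
using StatsModels

# Hypothetical helper, not part of StatsModels.
function formula_from_string(text::AbstractString)
    # Turn the string into a Julia expression, e.g. :(y ~ x + z)
    ex = Meta.parse(text)
    # Assumed guard: only accept expressions of the form `lhs ~ rhs`
    if !(ex isa Expr && ex.head == :call && ex.args[1] == :~)
        throw(ArgumentError("not a formula: $text"))
    end
    # Equivalent to typing `@formula <text>` at the REPL;
    # @eval evaluates in the global scope of the current module
    return @eval(@formula($ex))
end

f = formula_from_string("y ~ x + z")
```

The result should be an ordinary FormulaTerm, so it can be passed to something like `lm` the same way as a formula written with the macro directly.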

It’s generally a dangerous idea to use @eval in scripts: it evaluates in global scope and can lead to performance gotchas unless you’re VERY careful. In this case, though, I don’t see a way around it. The usual advice we give to people trying to construct a formula on the fly is to wrap their term symbols in Terms and combine them with +, &, and ~, but if you have to be able to handle ANY formula that’s valid in R, that won’t work (short of basically writing your own parser).
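For reference, the "wrap symbols in Terms" approach looks roughly like this. It's only a sketch: the variable names `response` and `predictors` are placeholders I picked, and it assumes you already have the column names as symbols rather than as one formula string.

```julia
using StatsModels

response = :y
predictors = [:x, :z]

# Wrap each symbol in a Term, then combine with + (via sum) and ~
f = (term(response) ~ sum(term.(predictors)))

# Interactions are built with &, which binds tighter than +
g = (term(:y) ~ term(:x) + term(:z) + term(:x) & term(:z))
```

No eval is involved here, which is why it's the usual recommendation when the set of operators you need is small.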
