Build a formula from a string

I want to build a StatsModels formula from a string. This is the closest I’ve gotten.

```julia
using StatsModels

text = "y ~ x"
StatsModels.terms!(StatsModels.sort_terms!(StatsModels.parse!(Meta.parse(text))))
# :(($(Expr(:escape, :~)))(Term(:y), Term(:x)))
```

Could you explain why it’s useful to build from a string?

Since StatsModels.jl shares some syntax with other languages, I can use the same formula file from multiple tools.

If your formulas are very basic (just multiple linear regression), you can just split your string and construct terms from that:

```julia
julia> using GLM

julia> f = "y ~ x1 + x2"
"y ~ x1 + x2"

julia> y, xs = strip.(split(f, "~"))  # strip, or the names keep their surrounding spaces
2-element Vector{SubString{String}}:
 "y"
 "x1 + x2"

julia> term(y) ~ sum(term.(strip.(split(xs, "+"))))
FormulaTerm
Response:
  y(unknown)
Predictors:
  x1(unknown)
  x2(unknown)
```

Apart from that, you’re basically looking to do what the `@formula` macro does, I suppose, and that’s what you’ve got already: see `formula.jl` on the master branch of JuliaStats/StatsModels.jl on GitHub.
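One way to check that is to let Julia expand the macro rather than calling its internals by hand. A minimal sketch, assuming a StatsModels.jl version where `@formula` expands to plain `Term(...)` calls combined with `~` and `+` (as the output in the question suggests); `ex` and `f` are just illustrative names:

```julia
using StatsModels

# Expand the macro without evaluating it. The result is an ordinary Expr that
# builds the formula from Term objects and the ~ and + operators, i.e. what
# parse!, sort_terms! and terms! produce in the question, with the escapes
# already resolved by macro expansion.
ex = @macroexpand @formula(y ~ x + z)
println(ex)

# Evaluating the expanded Expr should give the same FormulaTerm as the macro.
f = eval(ex)
println(f == @formula(y ~ x + z))
```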


@dave.f.kleinschmidt Is there a way I can turn this `text` into a `FormulaTerm`?

```julia
using StatsModels

text = "y ~ x + z"
StatsModels.terms!(StatsModels.sort_terms!(StatsModels.parse!(Meta.parse(text))))
# :(($(Expr(:escape, :~)))(Term(:y), ($(Expr(:escape, :+)))(Term(:x), Term(:z))))
```

You can probably `eval` what you’ve got there, but is it absolutely necessary to be working from a string? In general, if you’re trying to do something with a string that involves calling some function depending on the contents of the string, you’re not going to be able to do it without `eval` somewhere, so you’re probably better off doing something like

```julia
f = @eval(@formula($(Meta.parse(text))))  # f is a FormulaTerm
```

(much as I hate to say it)

What’s going on here is that `Meta.parse` is converting your string into a Julia `Expr`, the `$(...)` is inserting that into the expression starting with `@formula`, and then `@eval` is evaluating the whole thing. It’s basically as if you’d typed `@formula y ~ x + z` into the REPL.
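Split into separate steps, that looks roughly like this (a sketch assuming StatsModels.jl is installed; `ex` and `f` are illustrative names):

```julia
using StatsModels

text = "y ~ x + z"

# Step 1: Meta.parse turns the string into an Expr, exactly as if it had been
# typed: a :call whose first argument is the function name :~
ex = Meta.parse(text)
@show ex.head ex.args[1]

# Steps 2 and 3: $ex splices that Expr into @formula(...), and @eval evaluates
# the resulting expression at the module's global scope.
f = @eval @formula($ex)
@show f isa FormulaTerm
```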

It’s generally a dangerous idea to use `@eval` in scripts since it can lead to performance gotchas unless you’re VERY careful, but in this case I don’t see a way around it. The usual advice we give to people trying to construct a formula on the fly is to wrap their term symbols in `Term`s and combine them with `+`, `&`, and `~`, but if you have to be able to handle ANY formula that’s valid in R that won’t work (short of writing your own parser basically).
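For the simple cases, that `Term`-and-operators approach can still be driven from the string without `eval`, by walking the `Expr` that `Meta.parse` returns. A minimal sketch: `to_term` and `string_to_formula` are hypothetical helpers, not StatsModels API, and they only understand variable names, numeric constants, `+`, `&` and `~`, so anything else (`*`, `log(x)`, etc.) throws an error:

```julia
using StatsModels

# Hypothetical helper (not part of StatsModels): rebuild a parsed formula
# expression from Term objects using +, & and ~, with no eval involved.
to_term(s::Symbol) = term(s)   # variable name -> Term
to_term(n::Number) = term(n)   # e.g. 1 or 0 -> ConstantTerm (intercept)
function to_term(ex::Expr)
    ex.head === :call || throw(ArgumentError("unsupported expression: $ex"))
    op = ex.args[1]
    args = map(to_term, ex.args[2:end])   # convert the operands first
    if op === :+
        return reduce(+, args)            # Term + Term -> tuple of terms
    elseif op === :&
        return reduce(&, args)            # Term & Term -> InteractionTerm
    elseif op === :~ && length(args) == 2
        return args[1] ~ args[2]          # response ~ predictors -> FormulaTerm
    else
        throw(ArgumentError("unsupported operator $op in $ex"))
    end
end

# Hypothetical entry point: parse the string, then convert the Expr.
string_to_formula(text::AbstractString) = to_term(Meta.parse(text))

f = string_to_formula("y ~ x1 + x2 & x3")
```

The catch is the one described above: every operator has to be handled explicitly, so supporting `*`, function terms or other R-specific syntax quickly turns this into writing your own parser.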
