I want to call the GLM.jl library as a black box, which means I want to use a formula of the form “y ~ .”, where y is the target column of my dataframe and . stands for all the other columns (the features).
Specifically, what I would like to write is this:
function logreg_binary_fit_glm(X, y)
    D, N = size(X)
    Xt = permutedims(X)
    df = DataFrame(Xt)
    df[:y] = y
    # regress on all inputs :x1 ... :xd
    model = glm(@formula(y ~.), df, Binomial(), LogitLink())
    return coef(model)
end
Unfortunately, the dot syntax is not supported by the latest StatsModels (v0.5), and the solution proposed here does not work.
It is easy to generate the string “y ~ x1 + x2 + … + xD”, where D is the number of features (e.g., using the code below).
function make_formula_all_features(df, target)
    col_symbols = Set(names(df))
    feature_symbols = setdiff(col_symbols, Set([target]))
    feature_strings = [string(s) for s in feature_symbols]
    all_features = join(feature_strings, " + ")
    formula = string(target) * " ~ " * all_features
    return formula
end
However, I don’t know how to convert this string into an expression that I can pass to the @formula macro. Applying Meta.parse to the output does not work:
n = 10
df = DataFrame(X1=randn(n), X2=randn(n), Y=randn(n))
ols = lm(@formula(Y ~ X1 + X2), df)          # works
formula = make_formula_all_features(df, :Y)  # "Y ~ X2 + X1"
f_expr = Meta.parse(formula)                 # :(Y ~ X2 + X1)
ols2 = lm(@formula(f_expr), df)              # ERROR: type Symbol has no field head
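The error message hints at what is probably going on: a macro receives the syntax it was written with, not the runtime value of a variable, so `@formula(f_expr)` likely sees the bare symbol `f_expr` rather than the parsed expression. The sketch below reproduces that behaviour with plain Julia and no packages. The macro `@argtype` is a hypothetical helper invented for this illustration; it is not part of StatsModels or GLM.

```julia
# Hypothetical helper macro (illustration only): it returns, as a string,
# the type of the syntax object that the macro itself received.
macro argtype(ex)
    return string(typeof(ex))
end

f_expr = Meta.parse("Y ~ X2 + X1")

println(typeof(f_expr))          # Expr: at runtime the variable holds an expression
println(@argtype(f_expr))        # Symbol: the macro only sees the name `f_expr`
println(@argtype(Y ~ X2 + X1))   # Expr: a literal formula reaches the macro as an Expr

# Interpolating the value with `$` inside `@eval` splices the parsed
# expression into the macro call before the macro is expanded.
println(@eval @argtype($f_expr)) # Expr
```

If the same reasoning applies to `@formula`, then `@eval @formula($f_expr)` may be worth trying, although that has not been checked here against StatsModels v0.5.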