I want to call the GLM.jl library as a blackbox, which means I want to use a formula of the form “y ~ .”, where y is the target column of my dataframe, and . is all the other features.
Specifically, what I would like to write is this:
function logreg_binary_fit_glm(X, y)
D, N = size(X)
Xt = permutedims(X)
df = DataFrame(Xt)
df[:y] = y
# regress on all inputs :x1 ... :xd
model = glm(@formula(y ~.), df, Binomial(), LogitLink())
return coef(model)
end
Unfortunately dot syntax is not supported by the latest StatsModels v0.5, and the solution proposed here does not work.
It is easy to generate the string “y ~ x1 + x2 + … xD” where D is the number of features (eg using the code below).
function make_formula_all_features(df, target)
col_symbols = Set(names(df))
feature_symbols = setdiff(col_symbols, Set([target]))
feature_strings = [string(s) for s in feature_symbols]
all_features = join(feature_strings, " + " )
formula = string(target) * " ~ " * all_features
return formula
end
However - I don’t know how to convert this to an expression to pass to the @formula macro, and applying Meta.parse to the output does not work:
n = 10
df = DataFrame(X1=randn(n), X2=randn(n), Y=randn(n))
ols = lm(@formula(Y ~ X1 + X2), df) # works
formula = make_formula_all_features(df, :Y) # "Y ~ X2 + X1"
f_expr = Meta.parse(formula) # :(Y ~ X2 + X1)
ols2 = lm(@formula(f_expr), df) # ERROR: type Symbol has no field head
Any ideas?