The @formula macro allows the construction of variables when creating a model matrix.
This is great! I don’t have to add various transformations of variables to a DataFrame simply because I want to try alternative regression specifications. I can write:
@formula(y/x ~ w)
or
@formula(y - z ~ w + w^2)
But what if I want to run the regression:
GLM.lm(@formula(ismissing(y) ~ x + z), df)
Uh oh. It doesn’t matter that ismissing(y) is never missing: GLM sees that y itself has missing values and drops those rows from the model matrix.
Obviously, I could construct a new variable outside the formula macro:
df[!, :y_is_missing] = ismissing.(df.y)
and run my regression using this. But I don’t want to. Is there any way to turn off the feature that drops rows with missing values from the model matrix? A long-term solution would probably have GLM.jl check whether the transformed variables are missing, rather than whether the underlying variables are. In the interim…
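For concreteness, here is a minimal sketch of the workaround I would like to avoid. The data are made up purely for illustration, and converting the indicator to Float64 is my own assumption to make sure the response is treated as a continuous variable:

```julia
using DataFrames, GLM

# Toy data, invented for illustration; y contains missing values.
df = DataFrame(
    y = [1.0, missing, 3.0, missing, 5.0, 6.0],
    x = [0.5, 1.5, 2.0, 3.5, 4.0, 5.5],
    z = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
)

# Build the indicator outside the formula. The Float64 conversion is an
# assumption on my part so that the response is handled as continuous.
df[!, :y_is_missing] = Float64.(ismissing.(df.y))

# y itself no longer appears in the formula, so its missing values
# should not cause any rows to be dropped.
model = lm(@formula(y_is_missing ~ x + z), df)

# Compare the number of observations used with the number of rows.
nobs(model) == nrow(df)
```

This works because the formula only references columns without missing values, but it means adding a new column to the DataFrame for every specification I want to try.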