Missing or NaN Data in GLM (e.g., in DataFrame, @formula)

Dealing with missing data is considerably more painful for new users. Fortunately, in the DataFrame/GLM context, completecases can fix part of the problem. Unfortunately, it still leaves the “other” missing value, NaN, to be dealt with (and completecases does not work for matrices).
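A minimal illustration, assuming a recent DataFrames.jl (the toy data are made up): completecases flags only `missing`, so a row containing NaN passes through untouched.

```julia
using DataFrames

# Toy data: row 2 has a `missing` in y, row 3 has a NaN in x
df = DataFrame(y = [1.0, missing, 3.0, 4.0],
               x = [0.5, 1.5, NaN, 2.5])

mask  = completecases(df)  # true only for rows with no `missing` in any column
clean = df[mask, :]        # drops row 2, but the NaN row (row 3) survives
```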

Even then, every new model requires creating a new data frame, because which rows have to be dropped depends on which variables are in the model: a column containing missing values may or may not be part of it. I presume one needs to write a function that takes an AbstractDataFrame, copies it, and returns a sanitized DataFrame. Otherwise, I do not see how one can use the formula interface.
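A minimal sketch of such a helper, assuming DataFrames.jl. The names `isbad` and `sanitize` are hypothetical (they are not functions from DataFrames.jl or GLM.jl), and the list of model columns has to be supplied by hand for each model:

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl or GLM.jl):
# true if a value is `missing` or a floating-point NaN
isbad(v) = ismissing(v) || (v isa AbstractFloat && isnan(v))

# Hypothetical helper: return a new DataFrame keeping only the rows where
# none of the columns in `cols` is missing or NaN; other columns are ignored.
function sanitize(df::AbstractDataFrame, cols::AbstractVector{Symbol})
    keep = [!any(isbad, (row[c] for c in cols)) for row in eachrow(df)]
    return df[keep, :]  # row-indexing with a Bool vector returns a copy
end

df = DataFrame(y = [1.0, missing, 3.0, 4.0],
               x = [0.5, 1.5, NaN, 2.5],
               z = [missing, 1.0, 2.0, 3.0])

d_yx = sanitize(df, [:y, :x])  # columns of a y ~ x model: keeps rows 1 and 4
d_yz = sanitize(df, [:y, :z])  # columns of a y ~ z model: keeps rows 3 and 4
```

The two results keep different rows from the same data, which is exactly why each model needs its own sanitized copy. With GLM.jl loaded, the copy could then be passed to the formula interface, e.g. `lm(@formula(y ~ x), d_yx)`.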

Of course, it can be handled, but it is a roadblock compared to other stats packages I know of.

If the GLM calculations were aware of missing observations, this would become much more convenient.
