This is a methodological question, not really a Julia question. Forgetting about the complications of the logistic function for a second, regression tries to solve a system of equations Ax = b by doing A\b - this of course only works if there are actual numbers in A and b, not missing values. Most statistical software will drop observations with missing values; here’s R for example:
> df <- data.frame(y = c(0, 1, 1, 0, 1, 0), x = c(0.5, 0.9, 0.8, NA, 0.82, 0.9))
> glm(y ~ x, family = binomial(link = 'logit'), data = df)
Call: glm(formula = y ~ x, family = binomial(link = "logit"), data = df)
Coefficients:
(Intercept)            x
     -5.262        7.227
Degrees of Freedom: 4 Total (i.e. Null); 3 Residual
(1 observation deleted due to missingness)
Null Deviance: 6.73
Residual Deviance: 5.615 AIC: 9.615
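For comparison, here is a rough Julia equivalent of that R call - a minimal sketch assuming DataFrames.jl and GLM.jl, with the same toy data. The explicit `dropmissing` does by hand what R’s `glm` did automatically:

```julia
using DataFrames, GLM

df = DataFrame(y = [0, 1, 1, 0, 1, 0],
               x = [0.5, 0.9, 0.8, missing, 0.82, 0.9])

# The model matrix needs actual numbers, so drop the incomplete rows first.
# This removes the fourth observation, mirroring R's
# "(1 observation deleted due to missingness)".
model = glm(@formula(y ~ x), dropmissing(df), Binomial(), LogitLink())
```

Fit on the five complete rows, this should recover essentially the same coefficients as the R output above.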
If you don’t want to drop observations with missing values, you will have to either drop the covariate that introduces the missingness or impute the missing values.
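If you go the imputation route, the crudest version is mean imputation, sketched below on the same toy column (anything more principled, e.g. multiple imputation, is a topic of its own):

```julia
using DataFrames, Statistics

df = DataFrame(y = [0, 1, 1, 0, 1, 0],
               x = [0.5, 0.9, 0.8, missing, 0.82, 0.9])

# Replace each missing x with the mean of the observed x values.
# This keeps all six rows, but note it shrinks the variance of x
# and can bias the coefficient estimates.
df.x = coalesce.(df.x, mean(skipmissing(df.x)))
```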