Logistic regression for data with missing values

This is a methodological question, not really a Julia question. Forgetting about the complications of the logistic function for a second, regression tries to solve a system of equations y = Xb for the coefficients b by doing X \ y. This of course only works if there are actual numbers in y and X, not missing values. Most statistical software will drop observations with missing values by default, here’s R for example:

> df <- data.frame(y = c(0, 1, 1, 0, 1, 0), x = c(0.5, 0.9, 0.8, NA, 0.82, 0.9))
> glm(y ~ x, family = binomial(link = 'logit'), data = df)

Call:  glm(formula = y ~ x, family = binomial(link = "logit"), data = df)

Coefficients:
(Intercept)            x  
     -5.262        7.227  

Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
  (1 observation deleted due to missingness)
Null Deviance:	    6.73 
Residual Deviance: 5.615 	AIC: 9.615
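
The same point can be seen without the logistic part. Here is a minimal Python sketch (not from the R session above; `ols_fit` is a hypothetical helper, and `math.nan` stands in for R's `NA`) that fits a straight line by least squares to the same data. With the incomplete row left in, every sum is contaminated and both coefficients come out as NaN. Dropping that row, which is what `glm` did above, gives finite estimates.

```python
import math

y = [0, 1, 1, 0, 1, 0]
x = [0.5, 0.9, 0.8, math.nan, 0.82, 0.9]  # math.nan plays the role of R's NA


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Leaving the missing value in: NaN propagates through every sum.
naive = ols_fit(x, y)

# Listwise deletion: keep only rows where x is observed.
complete = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
x_cc = [xi for xi, _ in complete]
y_cc = [yi for _, yi in complete]
dropped = ols_fit(x_cc, y_cc)

print("with missing row:", naive)
print("after dropping it:", dropped)
```

Note that Python quietly returns NaN here, whereas Julia's `missing` and R's `NA` are separate sentinel values. The arithmetic problem is the same in all three.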

If you don’t want to drop observations with missing values, you will have to either drop the covariate that introduces the missingness, or impute the missing values.
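
To make the two options concrete, here is a rough Python sketch comparing complete-case deletion with the simplest form of imputation, filling the gap with the mean of the observed values. Everything in it is an illustration rather than what `glm` does internally: `fit_logistic` is a hypothetical hand-rolled Newton-Raphson fit for one covariate, and mean imputation is just one crude choice among many.

```python
import math

y = [0, 1, 1, 0, 1, 0]
x = [0.5, 0.9, 0.8, math.nan, 0.82, 0.9]  # math.nan plays the role of R's NA


def fit_logistic(x, y, iters=50, tol=1e-10):
    """Newton-Raphson for logit(p) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 system for the step
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Option 1: complete-case analysis (drop the row with the missing x).
x_cc = [xi for xi in x if not math.isnan(xi)]
y_cc = [yi for xi, yi in zip(x, y) if not math.isnan(xi)]
b_cc = fit_logistic(x_cc, y_cc)

# Option 2: mean imputation (assumption: replace NA by the observed mean).
x_mean = sum(x_cc) / len(x_cc)
x_imp = [x_mean if math.isnan(xi) else xi for xi in x]
b_imp = fit_logistic(x_imp, y)

print("complete cases:", b_cc)   # should be close to R's (-5.262, 7.227)
print("mean-imputed:  ", b_imp)
```

The imputed fit uses all six rows, but the filled-in value is treated as if it had been observed, so the resulting standard errors would be too optimistic. Whether that trade-off is acceptable is exactly the methodological question.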
