Logistic Regression Problem

I’m trying to use a logistic regression algorithm to find a classification model, but I get stuck with the error “failure to converge after 30 iterations”. I have changed the maxIter argument to a higher value, but the error only disappears at a very high number of iterations: in the small example below, with a population of 1000 elements and a very simple implicit model, I need 2000 iterations! Am I doing something wrong?

```julia
using DataFrames, GLM

df = DataFrame()
n = 1000
df[:x] = rand(n)
df[:y] = rand(n)
df[:z] = rand(n)
# the label is a deterministic function of x and y (no noise)
df[:valid] = map((x, y) -> (x*2 - y*6) > 0 ? true : false, df[:x], df[:y])
glm(@formula(valid ~ x + y + z), df, Binomial(), LogitLink())
```

Since there is no noise added to the linear predictor, it predicts the outcomes perfectly and, as a consequence, the MLE doesn’t exist, i.e. the likelihood function doesn’t have an optimum but keeps growing as one or more of the model parameters diverge. See e.g. the FAQ “What is complete or quasi-complete separation in logistic/probit regression and how do we deal with them?”. In ML, people usually add some regularization to ensure that an optimum exists, but GLM doesn’t add any.
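To make the divergence concrete, here is a small sketch (my own illustration, not from the original post; the name `loglik` is mine) that evaluates the log-likelihood along the ray β = c · (2, −6), which matches the separating rule in the example. Because the classes are perfectly separated, the log-likelihood increases toward its supremum of 0 as c grows, so no finite maximizer exists:

```julia
# Evaluate the logistic log-likelihood along β = c * (2, -6)
# for data generated by the deterministic rule 2x - 6y > 0.
n = 1000
x, y = rand(n), rand(n)
valid = (2 .* x .- 6 .* y) .> 0

function loglik(c)
    ll = 0.0
    for (xi, yi, v) in zip(x, y, valid)
        p = 1 / (1 + exp(-c * (2xi - 6yi)))   # P(valid = true | xi, yi)
        ll += v ? log(p) : log(1 - p)
    end
    return ll
end

for c in (1, 10, 100, 1000)
    println("c = $c:  loglik = $(loglik(c))")
end
# loglik(c) keeps climbing toward 0 as c grows, so Newton-type
# iterations chase ever-larger coefficients and never converge.
```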


Thanks, that was a perfect answer!

Is there planned functionality to add some simple regularization to GLM? If, as I’m assuming, fitting a GLM is essentially a Newton-Raphson algorithm, it should be easy to add an optional L2 cost parameter. It could be a big plus for usability; otherwise it can be confusing for a new user that, as soon as your problem is a bit ill-defined/unstable, you need to switch to (I guess) GLMnet or Lasso, which is a different package with different syntax, etc.
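For what it’s worth, here is a rough sketch of the kind of thing I mean (the function `ridge_logit` and its λ keyword are hypothetical, not an existing GLM API): Newton-Raphson on the logistic log-likelihood with an L2 penalty (λ/2)‖β‖², which guarantees a finite optimum even under complete separation:

```julia
using LinearAlgebra

function ridge_logit(X, y; λ = 1.0, iters = 25)
    β = zeros(size(X, 2))
    for _ in 1:iters                     # fixed iteration count for simplicity;
        η = X * β                        # a real solver would test convergence
        μ = 1 ./ (1 .+ exp.(-η))         # fitted probabilities
        W = Diagonal(μ .* (1 .- μ))      # IRLS weights
        g = X' * (y .- μ) .- λ .* β      # gradient of penalized log-likelihood
        H = X' * W * X + λ * I           # negative Hessian (note: this also
        β += H \ g                       # penalizes the intercept column)
    end
    return β
end

# With the data from the example above (column of ones for the intercept),
# this returns finite coefficients despite the complete separation:
X = hcat(ones(n), df[:x], df[:y], df[:z])
β̂ = ridge_logit(X, Float64.(df[:valid]); λ = 0.1)
```

A real implementation would presumably leave the intercept unpenalized and expose λ through the `glm` call, but the point is that the extra terms fit naturally into the existing Newton/IRLS update.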