How to write an objective function that has both a LASSO and a Ridge regularization term in JuMP

Please ignore the previous message. This is not least squares but logistic regression, so we would not likely be able to recover the original parameter vectors as it is…
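For reference, the objective in question — logistic loss with both a LASSO (L1) and a Ridge (L2) term — can be written for labels y_i \in \{-1, 1\} as

\min_{\beta,\,\beta_0}\; \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i (x_i^\top \beta + \beta_0)}\bigr) + \lambda_1 \|\beta\|_1 + \frac{\lambda_2}{2} \|\beta\|_2^2

where \lambda_1 and \lambda_2 are the two penalty strengths (the exact scaling of the penalty terms is a library-specific convention).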

Interesting; there are two things to note here: 1. the amplitude of all the parameters you recovered is the same, which matches your setup; 2. the intercept shows that you did something wrong.

There's one thing I missed in your setup, though, that you have to fix: binary classification in MLJLM expects the labels to be -1 and 1, not 0/1; this is for performance reasons. If you do want something agnostic to labels, you can again call it via MLJ, which handles this. (This is indicated in the readme, but granted, the docs are a weak spot.)

Can you try setting y[y .== 0] .= -1, refitting, and checking?
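A minimal sketch of that relabel-and-refit step, assuming MLJLinearModels' direct fit interface (the data here is a synthetic stand-in, and the penalty strengths are hypothetical; check the LogisticRegression docstring for which positional argument maps to the L1 and which to the L2 term):

using MLJLinearModels, Random

# synthetic stand-in for the actual data used in the thread
Random.seed!(0)
X = randn(200, 7)
y = Float64.(rand(0:1, 200))

# relabel: binary classification in MLJLinearModels expects labels in {-1, 1}, not {0, 1}
y[y .== 0.0] .= -1.0

# refit an elastic-net (L1 + L2) logistic regression; 1.0, 1.0 are hypothetical penalty strengths
lr = LogisticRegression(1.0, 1.0; penalty=:en, fit_intercept=true)
θ = fit(lr, X, y)    # coefficient vector, with the intercept as the last entry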

After setting y to -1.0 wherever it was 0.0, this is what I am getting:

8-element Array{Float64,1}:
27.42314982983235
27.63592742919614
27.69635130347215
27.275059324023076
27.52210618657443
27.46357714305869
27.558931504148596
0.1147927751139273

Using this vector of parameters to make predictions, the sum of squared errors, norm(y_true - y_predicted)^2, is just 8, i.e. only 8 were mispredicted.
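If it helps, since the labels are ±1, a more direct way to count mispredictions is to threshold the linear score at zero. A short continuation of the earlier sketch (Xb is a hypothetical name for X with a column of ones appended, matching an intercept stored as the last entry of θ):

Xb = [X ones(size(X, 1))]     # design with an intercept column appended
ŷ = sign.(Xb * θ)             # predicted labels in {-1, 1}
n_wrong = count(ŷ .!= y)      # number of misclassified points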


Yes, so here you're predicting a hyperplane, so the scaling doesn't matter; you can see that the intercept is roughly zero and that everything else is on the same scale :slight_smile:

Will the same setting work as it is if y were a vector of real numbers instead of {-1, 1}?

For a non-logistic regression, yes, sure :slight_smile:

PS: just a small note, you might want to look up what adding an L1 penalty does; in your earlier example your vector was not at all sparse, so an L1 penalty is pretty useless. Maybe this is clear to you already.

Yeah, it penalizes the parameters so the extraneous ones go to zero.

Here is the thing: I am experimenting with reducing the dimensions of the matrix X (and of y correspondingly) by projecting X onto a lower-dimensional space. That means premultiplying X with a scaling matrix M (and y too). That would mean My wouldn't necessarily contain only -1 and +1. Can MLJLM then be used to optimize the logistic objective function (the one I am trying to minimize) with MX and My?

What you say doesn't make sense. Reducing the dimensions of X is fine; reducing that of y is not.

If you transform X and get X' with fewer columns, you're back in the same box and can do logistic regression on that and y. For instance, you could do PCA on the initial X and then run a logistic regression on the result.
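A minimal sketch of that PCA-then-logistic-regression route, with the PCA step done by hand via an SVD so the example stays self-contained (the data is a synthetic stand-in, k and the penalty strength are hypothetical, and MLJLinearModels' direct fit interface is assumed):

using LinearAlgebra, Statistics, MLJLinearModels, Random

# synthetic stand-in data
Random.seed!(0)
X = randn(200, 20)
y = rand([-1.0, 1.0], 200)

# PCA by hand: center X and keep its top k right singular vectors
k = 5
Xc = X .- mean(X; dims=1)
V = svd(Xc).V
Xr = Xc * V[:, 1:k]     # reduced design, 200 × k; y is untouched

# ordinary (here L2-penalised) logistic regression on the reduced design
lr = LogisticRegression(1.0; fit_intercept=true)
θ = fit(lr, Xr, y)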

What you're talking about with MX and My doesn't make sense. In the context of a linear regression you could be doing something like this, where M would be a preconditioner, but I'm pretty sure you're confusing things (preconditioning comes from a linear-algebra perspective, not really from statistical learning). Anyway, think about the logistic equation:

y \approx \sigma (X\beta + \gamma)

Left-multiply this by M: what happens?
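To see the issue concretely, a tiny numerical check (toy sizes, no intercept) showing that a left-multiplication by M does not commute with the sigmoid:

using LinearAlgebra

σ(t) = 1 / (1 + exp(-t))

X = randn(10, 3)
β = randn(3)
M = randn(5, 10)          # some sketching / mixing matrix

lhs = M * σ.(X * β)       # transforming the responses
rhs = σ.(M * X * β)       # responses generated from the sketched design MX
norm(lhs - rhs)           # generally far from zero: σ is nonlinear, so My is not a logistic response of MX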

Yeah, there are issues with the formulation. What I am trying to do is sketch the matrix to reduce its dimensions. That can be done in various ways. One of them is to select some rows of X based on some probability distribution, and then select the corresponding entries of y too. Another quick-and-dirty way is to premultiply X by a matrix M, and do the same to y to get MX and My.
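Unlike the My route, the row-sampling version keeps the labels valid, since it only selects (x_i, y_i) pairs. A minimal sketch with uniform sampling (sizes are hypothetical; a non-uniform distribution could be used instead, e.g. via StatsBase's weighted sample):

using Random

# synthetic stand-in for the full data set
Random.seed!(0)
X = randn(1_000, 7)
y = rand([-1.0, 1.0], 1_000)

# row-sampling sketch: pick s rows of X and the matching entries of y
s = 200                             # hypothetical sketch size
idx = rand(1:size(X, 1), s)         # uniform sampling with replacement
Xs, ys = X[idx, :], y[idx]          # ys still lives in {-1, 1}, so logistic regression applies as before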

I am getting a warning: “No appropriate stepsize found via backtracking; interrupting”. Is there a way to default to some small step size in this situation, rather than interrupting?