Optimize 0-1 loss in MIP

I am currently trying to solve the following optimization problem
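The objective I have in mind is the SLIM formulation from the paper cited at the end of this post:

\min_{\lambda} \; \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ y_i \, \lambda^{\top} \mathbf{x}_i \le 0 \right] \;+\; C_0 ||\lambda||_0 \;+\; C_1 ||\lambda||_1 \quad \text{s.t. } \lambda_j \in \mathbb{Z} \;\; \forall j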

where N is the number of observations on the predictors \mathbf{x}, y \in \{-1, 1\} is a binary vector of length N, \lambda is constrained to be integer, and C_0, C_1 are tuning parameters penalizing the complexity of the model. Illustrative example: suppose you would like to predict whether a mushroom is poisonous (y) from predictors like its odor and its color (x_1, x_2). I would like to obtain a solution similar to the example in the paper cited below.

However, I am new to optimization problems (I have a statistical background) and thought this would be a nice opportunity to get to know Julia better.
Now I am a bit lost - I understood how to generally formulate and solve MIP problems with JuMP, but I don't know how to optimize the 0-1 loss between a predicted category and the actual category. I also don't know how to formulate the first penalty term C_0 ||\lambda||_0, where ||\lambda||_0 counts the nonzero entries of \lambda (each entry contributes 1 if it is nonzero and 0 otherwise).

Any help, and also resources about such types of problems, would be greatly appreciated.

Pictures from
Ustun, B., Traca, S., & Rudin, C. (2013). Supersparse linear integer models for interpretable classification. arXiv preprint arXiv:1306.6677.

I believe the paper you mention actually provides a full MIP formulation.

In any case, to represent the “0-norm”, you could introduce a binary variable a_j \in \{0, 1\} for every component j of \lambda. Then you can minimize C_0 \sum_j a_j.
You also need to add constraints of the form -M a_j \le \lambda_j \le M a_j, where M is a suitable upper bound on the absolute value that \lambda_j can take; these force \lambda_j = 0 whenever a_j = 0.
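A minimal JuMP sketch of this construction (the dimension d, the bound M, the penalty weight C0, and the HiGHS solver are placeholder assumptions, not values from your problem):

```julia
using JuMP, HiGHS

d  = 5        # number of coefficients (placeholder)
M  = 10       # assumed bound on |λ_j|
C0 = 0.01     # assumed ℓ0 penalty weight

model = Model(HiGHS.Optimizer)
@variable(model, -M <= lambda[1:d] <= M, Int)   # integer coefficients λ_j
@variable(model, a[1:d], Bin)                   # a_j = 1 ⇔ λ_j may be nonzero

# Big-M link: a_j = 0 forces λ_j = 0
@constraint(model, [j = 1:d], lambda[j] <=  M * a[j])
@constraint(model, [j = 1:d], lambda[j] >= -M * a[j])

# The penalty C_0 ||λ||_0 becomes the linear term C_0 * sum(a)
@objective(model, Min, C0 * sum(a))
```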

@Maximilian did you find a solution for representing the 0-1 loss in JuMP? When I try this:

loss01 = @expression(model, (y .* (X * lambda)) .<= 0.0)

I get a MethodError:

ERROR: MethodError: no method matching isless(::GenericAffExpr{Float64,VariableRef}, ::Float64)

Any tricks to handle this?

Use @constraint instead of @expression. Have a read of the JuMP documentation under “constraints” and “expressions”.

Thanks, but this is not a constraint. It is part of a larger expression that is to go into the objective, so it is not clear that one can use a constraint instead.

But in that case, you can’t use <=. The @expression should probably only cover the left-hand side of the inequality?
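If it helps, here is one sketch of how the comparison itself can be modeled, along the lines of the big-M formulation in the paper: introduce a binary z_i per observation that is forced to 1 whenever observation i is misclassified, and sum those. It assumes X, y, and the lambda, a, C0, M, and model from the snippet above, and the bound M_loss and margin gamma are assumptions, not values from this thread:

```julia
# 0-1 loss via indicator binaries z_i
N, d   = size(X)
M_loss = 1 + M * maximum(sum(abs.(X), dims = 2))  # assumed bound on |x_i' * λ|, plus slack
gamma  = 0.1                                      # small margin standing in for the strict inequality

@variable(model, z[1:N], Bin)                     # z_i = 1 ⇔ observation i is misclassified
@constraint(model, [i = 1:N],
    y[i] * sum(X[i, j] * lambda[j] for j in 1:d) >= gamma - M_loss * z[i])

# The 0-1 loss is now an ordinary linear expression that can enter the objective
loss01 = @expression(model, sum(z) / N)
@objective(model, Min, loss01 + C0 * sum(a))      # add the ℓ1 term in the same way if needed
```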