How to "prioritize" choosing some parameters over others in Hyperopt.jl or similar packages

Hi there, I wasn’t sure whether I should have posted this in Statistics or in Optimization, but here it goes.

I have a peculiar situation. I want to use Hyperopt.jl optimize a function quality(x::Vector{Int}). The function inputs a vector of integers that are indices for the columns of a large matrix. The function uses these columns and outputs a real number q. I want to find the maximum q possible given various x. However, I want to keep length(x) within reasonable bounds, probably at most up to 4. Said differently, the problem I am trying to optimize is: which columns of the matrix when combined give the best “quality”. (for context, here quality is the ability to separate the data into as many clusters as possible)

Here’s my difficulty. My quality function takes into account the length(x) and uses it as a weight: the higher the length(x) the more the penalty to the quality. Let’s say I have an input matrix of 20 columns or so. Then, all possible input x vectors derived from this matrix are:

julia> [binomial(20, i) for i in 1:4]
4-element Vector{Int64}:

julia> sum(ans)

Initially I thought of collecting all these possible combinations into one very long 6195-element vector of Vector{Int}. But, as is well known, the number of elements in x put massively more weight into combinations of x with length (4) due to the scaling of the binomial coefficient.

Is there a way to perform such an optimization loop by putting more weight into smaller xs? My quality function itself puts more weight into smaller xs because it returns a smaller output for smaller x. But my fear is that the small x may not be sampled at all by the optimization algorithm because they are so few in comparison to the larger x.

I would appreciate advice on how to tackle this either with Hyperopt.jl or some other optimization package. I could also write my own brute force loop that picks at random at most e.g. 100 possible xs from each combinatorial combination of lengths 1, 2, 3 or 4, but I have already the code I could immediately wrap in the @hyperopt macro.

cc @baggepinnen as the author of Hyperopt.jl maybe you have encountered already a scenario where different input parameters need to be weighted differently.

(to be clear, down the line I will be adding more hyper parameters to be optimized which are categorical or reals; but my central concern is the discussion above with x)

You are solving a mixed integer problem, and could make use of a dedicated MIP solver rather than Hyperopt.jl. Model the decision variable as a Boolean vector of length equal to the number of columns, and add a penalty for each non-zero entry.

Thank you very much. I didn’t know of this term before and I’ve now started reading some resources on mixed integer problems.

The landscape of optimization packages in Julia seems to be vast; do you have perhaps a suggestion of which package I can get started with as a “dedicated MIP solver”?

I’ve found this in Jump: Sudoku · JuMP perhaps that’s a good place to start?

Edit 1: Hm, I am not sure, because it is not clear how to “penaltize” having more 1 entries over 0 (i.e., penaltize using more columns of my matrix). Edit 2: okay, I’ll get started by porting some part of the Sudoku code and mixing it with a minimization problem defined in Jump that has the objective to minimize the number of 1s in the boolean vector!

Jump is a good choice. The sum of the index vector of Boolean variables is a good penalty, it eqals the count of nonzero entries