How to find the number of non-zero weights needed to minimise the distance between a weighted sum and a population

Hello, I need to use some individual elements (x_n; in my application, agricultural farms of which I know every detail) as building blocks to reconstruct an observed aggregate V_d >= 0 over the d dimensions (in my application, the agricultural system as we know it from census data… hectares of land, number of farms, cows and so on).

I want to find the minimum number K of non-zero weights such that the distance between the reconstructed aggregate and the observed one is below a threshold.

In mathematical terms the problem can be expressed as:

min_{x_n} K
s.t.
x_n \geq 0
\frac{\sqrt{\sum_d((\sum_n x_n * a_{n,d}) - V_d)^2}}{\sum_d V_d} \leq \epsilon
\sum_n \mathbf{1}(x_n > 0) = K

where \mathbf{1}(\cdot) in the last line is the indicator function.
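
To make the two quantities in the formulation concrete, this is how I would compute them for a given candidate vector of weights in plain Julia (the helper names reldist and countK are just for this post; a is the N×D matrix of element characteristics, called coeff in the script below, and V is the observed aggregate, called POP):

using LinearAlgebra

# Relative distance between the reconstructed aggregate (sum_n x_n * a_{n,d}) and the observed V_d,
# i.e. the left-hand side of the ϵ constraint above
reldist(x, a, V) = norm(a' * x - V) / sum(V)

# K: the number of strictly positive weights
countK(x) = count(xi -> xi > 0, x)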

I can simplify the problem by minimising the distance for a given K and then simply checking which K brings the distance within my tolerance:

min_{x_n} \sum_d((\sum_n x_n * a_{n,d}) - V_d)^2
s.t.
x_n \geq 0
\sum_n \mathbf{1}(x_n > 0) \leq K
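
Conceptually the outer search over K is then just a loop, something like the sketch below (solve_for_K is only a placeholder for the fixed-K model I try to build next; reldist, coeff, POP and ϵ are as in the snippets above and in the script below):

for K in 1:N
    x = solve_for_K(K)               # placeholder: solve the fixed-K problem above and return the weights
    if reldist(x, coeff, POP) <= ϵ   # stop at the first K whose reconstruction is within tolerance
        println("K = $K is enough")
        break
    end
end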

When I try to implement it in JuMP, however, I get an error about the Hessian (AssertionError: length(hess_I) > 0) in both cases:

using DelimitedFiles

coeff = readdlm(download("https://lef-nancy.org/files/index.php/s/gkxxYxqWcGCYAEM/download"),',') # Few KB
POP   = readdlm(download("https://lef-nancy.org/files/index.php/s/4dwiDtDfyoC5mbH/download"),',') # Few Bytes

using JuMP, Ipopt

(N,D) = size(coeff)
ϵ = 0.1

# Rescaling POP to 1 on all dimensions...
scCoeff = 1 ./ POP
POP = ones(D)
coeff = coeff .* scCoeff'
sumPOP = sum(POP[d] for d in 1:D)

upscaling = Model(optimizer_with_attributes(Ipopt.Optimizer)) # would require "using Ipopt" instead of GLPK
@variables upscaling begin
    x[1:N] >= 0;
    K >= 0
end

set_start_value.(x, 1)
set_start_value(K, 100)
@NLobjective upscaling Min begin
    K
end
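# Constraint 1: relative distance between the reconstructed aggregate and POP within tolerance ϵ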
@NLconstraints upscaling begin
        sqrt(sum( (sum(coeff[n,d]*x[n] for n in 1:N) - POP[d])^2 for d in 1:D))/sumPOP <= ϵ
end
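# Constraint 2: K counts the strictly positive weights (the cardinality/indicator part of the formulation)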
@NLconstraints upscaling begin
        sum( (x[n] > 0.0) for n in 1:N)  ==  K
end
optimize!(upscaling)

upscaling2 = Model(optimizer_with_attributes(Ipopt.Optimizer)) # would require "using Ipopt" instead of GLPK
@variables upscaling2 begin
    x[1:N] >= 0;
end
set_start_value.(x, 1)
@objective upscaling2 Min begin
    sum( (sum(coeff[n,d]*x[n] for n in 1:N) - POP[d])^2 for d in 1:D)
end
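# Cardinality constraint with K fixed to 5: at most 5 strictly positive weights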
@NLconstraints upscaling2 begin
        sum( (x[n] > 0.0) for n in 1:N)  <=  5
end
optimize!(upscaling2)

Note that if I remove the constraint on K, the model finds a (local) minimum without problems…
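
For reference, the variant that does converge is just the second model with that counting constraint dropped (the names upscaling3/x3 are only to avoid clashing with the script above):

upscaling3 = Model(Ipopt.Optimizer)
@variable(upscaling3, x3[1:N] >= 0)
set_start_value.(x3, 1)
# Same least-squares objective as upscaling2, but no constraint on the number of non-zero weights
@objective(upscaling3, Min, sum((sum(coeff[n,d]*x3[n] for n in 1:N) - POP[d])^2 for d in 1:D))
optimize!(upscaling3)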

Is there a more specific way to solve this kind of problem?

I have the impression that your problem can be thought of as a case of robust fitting, with outlier elimination. Take a look at this, for example: