How to generate nonnegtive integer array with fixed row sums and column sums

I am trying to randomly generate arrays with given row sums and column sums. The elements in the array should be nonnegative integers. Originally, I wanted to enumerate all the combinations and later filter out those that have the wrong row sums and columns. However, when the array size, the row sums and column sums get larger, this approach seemed to be unrealistic. I found that there are some linear optimization tools that can generate one solution under the set constraint, e.g. JuMP. I can add the constraints iteratively, and then I can generate multiple arrays with the given row sums and column sums. But what I am concerned about is that I don’t know whether the “solutions” of the array are randomly generated. In my study, I want to do the Monte Carlo permutation test under these constraints, so I hope to generate these arrays “uniformly”. I am here asking whether there are some packages or alternative approaches to do this special array shuffling. Thank you.

1 Like

You will need to have a probabilistic statement about what you mean by uniformly for example a 2x2 matrix with row and column sums all equal to 1 could have

[2 -1;
-1 2]

or

[999999999999999999999 -999999999999999999998;
...]

In other words the entry in the upper left place could be ANYTHING and there is no such probability distribution as “uniform on the real numbers”

The post says nonnegative integers

Ah, I missed that!

Still, it seems to me the distribution in question is rather vague, OP doesn’t say whether all rows and columns have the same sum, only that these sums are given. It’s not clear to me for any given 2N numbers if it’s always possible to generate a matrix with those row and column sums… For example row sums 1,1 and column sums 1 and 10^6

Basically what’s needed is more mathematical precision about the distribution in question.

Hi dlakelan. Yes, if the elements can be any integer, the problem will be easy. We may simply generate the elements by wsample(1:nrow, uniform prob, colsum) except for one column, and then the difference between row sum constraint and the row sum of the generated array would be the last column. By this approach, we can easily generate the array satisfying the row sums and column sums constraint.
However, this turns out to be not so easy when I need all the elements to be nonnegative.

Yes. Also note, that this is a discrete distribution. It would be good to know the size of the matrix, as for 2x2 matrices there are good formulas. The problem becomes trickier for larger matrices.

The uniform distribution I mean is:
If there are N nonnegative integer arrays with the row sums V and column sums W. The probabilities to generate one of them are the same, i.e. \frac{1}{N}.

Hi Dan. The maximal size of the array is 100 x 50 and the maximal row sums and column sums are about 5000.

Without some specialty mathematical results I think your problem will be very hard. My first inclination would be to MCMC sample and down weight based on the total absolute error in the row and column sums… Then throw out all those that have error after running for a long time. My guess is this will result in zero samples very often though.

Basically the pmf would be exp(-sum(abs(err_i))) where err ranges over all the column and row errors. You will need to also come up with a proposal distribution, which might be as simple as choose a random entry and add +1 or -1 with reflection at 0

I found that there are some linear optimization tools that can generate one solution under the set constraint, e.g. JuMP. I can add the constraints iteratively, and then I can generate multiple arrays with the given row sums and column sums. But what I am concerned about is that I don’t know whether the “solutions” of the array are randomly generated.

I came across this because I searched “JuMP.” Here’s what I would do. The trick is to use a random objective function. I think you should be able to show that the random objective function will let you sample all possible solutions, but I don’t have a proof that the distribution will be uniform.

using JuMP
import HiGHS

function main(row_sums::Vector{Int}, col_sums::Vector{Int})
    m, n = length(row_sums), length(col_sums)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:m, 1:n] >= 0, Int)
    @constraint(model, [i=1:m], sum(x[i, :]) == row_sums[i])
    @constraint(model, [j=1:n], sum(x[:, j]) == col_sums[j])
    c = rand(m, n)
    @objective(model, Min, sum(c .* x))
    optimize!(model)
    if termination_status(model) != OPTIMAL
        return nothing
    end
    return value.(x)
end

row_sums = col_sums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
main(row_sums, col_sums)