How to generate nonnegtive integer array with fixed row sums and column sums

ChingLinHuang · October 1, 2022, 12:38pm

I am trying to randomly generate arrays with given row sums and column sums. The elements in the array should be nonnegative integers. Originally, I wanted to enumerate all the combinations and later filter out those that have the wrong row sums and columns. However, when the array size, the row sums and column sums get larger, this approach seemed to be unrealistic. I found that there are some linear optimization tools that can generate one solution under the set constraint, e.g. JuMP. I can add the constraints iteratively, and then I can generate multiple arrays with the given row sums and column sums. But what I am concerned about is that I don’t know whether the “solutions” of the array are randomly generated. In my study, I want to do the Monte Carlo permutation test under these constraints, so I hope to generate these arrays “uniformly”. I am here asking whether there are some packages or alternative approaches to do this special array shuffling. Thank you.

dlakelan · October 1, 2022, 1:20pm

You will need to have a probabilistic statement about what you mean by uniformly for example a 2x2 matrix with row and column sums all equal to 1 could have

[2 -1;
-1 2]

or

[999999999999999999999 -999999999999999999998;
...]

In other words the entry in the upper left place could be ANYTHING and there is no such probability distribution as “uniform on the real numbers”

Dan · October 1, 2022, 1:33pm

The post says nonnegative integers

dlakelan · October 1, 2022, 1:45pm

Ah, I missed that!

Still, it seems to me the distribution in question is rather vague, OP doesn’t say whether all rows and columns have the same sum, only that these sums are given. It’s not clear to me for any given 2N numbers if it’s always possible to generate a matrix with those row and column sums… For example row sums 1,1 and column sums 1 and 10^6

Basically what’s needed is more mathematical precision about the distribution in question.

ChingLinHuang · October 1, 2022, 1:49pm

Hi dlakelan. Yes, if the elements can be any integer, the problem will be easy. We may simply generate the elements by wsample(1:nrow, uniform prob, colsum) except for one column, and then the difference between row sum constraint and the row sum of the generated array would be the last column. By this approach, we can easily generate the array satisfying the row sums and column sums constraint.
However, this turns out to be not so easy when I need all the elements to be nonnegative.

Dan · October 1, 2022, 1:51pm

Yes. Also note, that this is a discrete distribution. It would be good to know the size of the matrix, as for 2x2 matrices there are good formulas. The problem becomes trickier for larger matrices.

ChingLinHuang · October 1, 2022, 1:57pm

The uniform distribution I mean is:
If there are N nonnegative integer arrays with the row sums V and column sums W. The probabilities to generate one of them are the same, i.e. \frac{1}{N}.

ChingLinHuang · October 1, 2022, 1:59pm

Hi Dan. The maximal size of the array is 100 x 50 and the maximal row sums and column sums are about 5000.

dlakelan · October 1, 2022, 2:04pm

Without some specialty mathematical results I think your problem will be very hard. My first inclination would be to MCMC sample and down weight based on the total absolute error in the row and column sums… Then throw out all those that have error after running for a long time. My guess is this will result in zero samples very often though.

Basically the pmf would be exp(-sum(abs(err_i))) where err ranges over all the column and row errors. You will need to also come up with a proposal distribution, which might be as simple as choose a random entry and add +1 or -1 with reflection at 0

odow · October 10, 2022, 11:21pm

I found that there are some linear optimization tools that can generate one solution under the set constraint, e.g. JuMP. I can add the constraints iteratively, and then I can generate multiple arrays with the given row sums and column sums. But what I am concerned about is that I don’t know whether the “solutions” of the array are randomly generated.

I came across this because I searched “JuMP.” Here’s what I would do. The trick is to use a random objective function. I think you should be able to show that the random objective function will let you sample all possible solutions, but I don’t have a proof that the distribution will be uniform.

using JuMP
import HiGHS

function main(row_sums::Vector{Int}, col_sums::Vector{Int})
    m, n = length(row_sums), length(col_sums)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:m, 1:n] >= 0, Int)
    @constraint(model, [i=1:m], sum(x[i, :]) == row_sums[i])
    @constraint(model, [j=1:n], sum(x[:, j]) == col_sums[j])
    c = rand(m, n)
    @objective(model, Min, sum(c .* x))
    optimize!(model)
    if termination_status(model) != OPTIMAL
        return nothing
    end
    return value.(x)
end

row_sums = col_sums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
main(row_sums, col_sums)

Topic		Replies	Views
Randomly generating an array that satisfies a condition New to Julia	13	421	May 10, 2023
Creating an array/matrix with a specified range of randomly generated numbers New to Julia	5	21170	September 25, 2018
Static array with uniform distribution Statistics distributions , staticarrays , random	2	330	November 30, 2023
Static Array With Random Numbers From a Uniform Distribution New to Julia statistics , staticarrays	9	719	December 13, 2023
How to create an array of specific length populated by random uniform values within a range New to Julia	9	2946	March 27, 2019

How to generate nonnegtive integer array with fixed row sums and column sums

Related topics