Cdf for multinomial

Frey described an alternative and included R code. Lebrun described another alternative that is more efficient though the code is only available by request. From Lebrun

It is only recently that reasonably efficient algorithms have been described (see Corrado 2011; Frey 2009), using completely different roots than the previous authors. But even with these algorithms, the only distribution for which it is possible to compute arbitrary rectangular probabilities is the multinomial one, with a polynomial space and time complexity.

We propose to change this situation by providing an algorithm which is essentially exact up to machine precision for all the multivariate discrete distributions considered in Butler and Stutton (1998) and Levin (1983), amongst which the multinomial, multivariate hypergeometric and multivariate PĆ³lya distributions. In the multinomial case, our algorithm is more efficient than the algorithm described in Frey (2009), both with respect to space and time complexity, for an equivalent or even better accuracy when implemented in double precision.

The derivations are beyond my quick understanding, but the code in Frey can very easily be rewritten in Julia without any special functions beyond log-gamma (fairly direct translation below).

using SpecialFunctions

function multinom_cdf(lb, ub, nobs, probs)
    k = length(probs)
    sum_lb = sum(lb)
    sum_ub = sum(ub)

    logc = loggamma(nobs + 1) + sum(lb .* log.(probs)) - sum(loggamma.(lb .+ 1))
    r = nobs - sum_lb
    exc = sum_ub - sum_lb
    fac = zeros(exc + 1)
    sind = zeros(exc + 1)
    loc = 1
    fac[1] = 0

    for i in 1:k
        if ub[i] > lb[i]
            mi = ub[i] - lb[i]
            for j in 1:mi
                loc += 1
                fac[loc] = probs[i] / (lb[i] + j)

                if j == 1
                    sind[loc] = 1
                end
            end
        end
    end

    cprob = zeros(exc + 1)
    cprob[1] = 1.0

    for t in 1:r
        tbelow = [0;cumsum(cprob[1:exc])]
        fac2 = tbelow .* sind + [0;cprob[1:exc]] .* (1.0 .- sind)
        cprob = fac .* fac2
        m = maximum(cprob)
        logc += log(m)
        cprob = cprob ./ m
    end

    exp(logc + log(sum(cprob)))
end

@time multinom_cdf([0,0,0,0], [4,4,4,4], 10, [0.25,0.25,0.25,0.25])
@time multinom_cdf([0,0,0,0], [20,20,20,20], 60, [0.25,0.25,0.25,0.25])
@time multinom_cdf([0,0,0,0], [200,200,200,200], 600, [0.25,0.25,0.25,0.25])

Output

  0.000030 seconds (108 allocations: 19.781 KiB)
0.6889343261718746

  0.000173 seconds (608 allocations: 433.875 KiB)
0.7849628688153565

  0.048421 seconds (6.01 k allocations: 37.629 MiB, 30.49% gc time)
0.9999922199658132

For comparison

function MultinomialCDF(D, c)
    k = length(c)
    tmp = zeros(Int, k)
    sum(s->pdf(D, foldl((r,x)->(r[x]+=1; r), s; init=((tmp .= 0;tmp)))),
      multiset_combinations(1:k,c,D.n))
end

D1 = Multinomial(10, [0.25,0.25,0.25,0.25])
D2 = Multinomial(60, [0.25,0.25,0.25,0.25])
D3 = Multinomial(600, [0.25,0.25,0.25,0.25])

@time MultinomialCDF(D1, [4,4,4,4])
@time MultinomialCDF(D2, [20,20,20,20])
@time MultinomialCDF(D3, [200,200,200,200])

Output

  0.000260 seconds (1.44 k allocations: 67.672 KiB)
0.6889343261718759

  0.038326 seconds (125.75 k allocations: 6.001 MiB, 31.74% gc time)
0.7849628688153811

187.492780 seconds (1.28 G allocations: 46.206 GiB, 4.31% gc time)
0.999992219962507
4 Likes