Binomial distribution without Distributions.jl?


#1

Hi all,

This is not intended to be a knock on the excellent Distributions.jl package - we’ve used it happily for several years in LightGraphs and it’s working wonderfully.

We currently have an initiative to make LightGraphs lighter weight in terms of its dependencies, though, and we’re running into a bit of an issue. We use Distributions for exactly one thing: a random number generator using binomial distributions for one of our random graph generators. The problem is that this single function adds at least 12 new dependencies to our dependency tree, including some heavy hitters. Again, we’re trying to lighten up LightGraphs right now, and to have this number of dependencies for a single function seems like a bit of an overkill.

So - is there any other way to get a binomial random number generator that would allow us to achieve our weight-loss goal?


#2

The code for that in Distributions is exactly

rand(d::Binomial) = convert(Int, StatsFuns.RFunctions.binomrand(d.n, d.p))

I guess you could simply copy that? StatsFuns itself requires only Rmath and SpecialFunctions.


#3

Right. Rmath requires BinDeps and a few other things, though, and I’d really like to avoid BinDeps if at all possible since it introduces cross-platform fragility and has a lot of overhead. (Again, not knocking it, just trying to put the “Light” back in “LightGraphs”.)


#4

If you want to go really lightweight and you know that your sample sizes will be sufficiently large, you can use Base.randn, since the binomial distribution converges to a standard normal as n increases (see Central Limit Theorem).


#5

We’re using it exactly once per graph, in order to get a single value that represents the number of edges to generate (m = |V|^2, and I don’t know about p) so I’m not sure randn will work – but I will defer to stats experts on this.


#6

Check out https://github.com/JuliaStats/Distributions.jl/blob/master/src/samplers/binomial.jl. There are some samplers written in Julia. We don’t use them by default but I think they work.


#7

It’s weird that there isn’t a pure-Julia version of Distributions.jl - writing efficient samplers seems to be something Julia is uniquely suited for.


#8

Yeah, but people want to “get everything done” first.


#9

So, here’s what I wound up doing:

# taken from http://stackoverflow.com/questions/23561551/a-efficient-binomial-random-number-generator-code-in-java
function _getBinomial(n::Integer, p::Real, seed::Integer=-1)
    rng = getRNG(seed)
    log_q = log(1.0 - p)
    x = 0
    sum = 0.0
    while true
        sum += log(rand(rng)) / (n - x)
        sum < log_q && break
        x += 1
    end
    return x
end

Passes all tests so far.


#10

You probably need to have an attribution in your license for this. Make sure you add that.


#11

Just wanted to say that the fact that LightGraphs has so many dependencies was a main reason for me not to use it and instead roll my own, so I very much appreciate this direction.


#12

@tkoolen - thanks. Take a look at https://github.com/JuliaGraphs/LightGraphs.jl/issues/557#issuecomment-292806096 for the current status. We will be down to 4 direct dependencies, with 8 total. I think that’s as good as we’re going to get.