Draw two distinct random integers from `1:n`

carstenbauer · November 30, 2021, 11:46am

I’m looking for a fast and elegant way to repeatedly draw two distinct random integers from the range 1:n.

A naive baseline:

julia> function draw(n, iter)                                                                                                                                                                                                         
           for i in 1:iter                                                                                                                                                                                                            
               a, b = rand(1:n), rand(1:n)                                                                                                                                                                                            
               while b == a                                                                                                                                                                                                           
                   b = rand(1:n)                                                                                                                                                                                                      
               end                                                                                                                                                                                                                    
           end                                                                                                                                                                                                                        
           return nothing
       end
draw (generic function with 1 method)

julia> @btime draw(10_000, 1000);
  6.857 μs (0 allocations: 0 bytes)

Preferably, I’d want a solution that doesn’t use external packages (since eventually I want to use this in a package and don’t want to add extra dependencies). But depending on elegance and speed I might reconsider

Let the games begin (and thanks in advance!)

stabbles · November 30, 2021, 11:57am

With n = 10_000 you hit that branch almost never, so this is probably optimal.

lmiq · November 30, 2021, 11:59am

If n is large, thus the probability of coincidence is very small, probably one cannot improve too much what you have. For small n I came up with this (though I’m not sure if the distributions don’t become correlated somehow):

julia> function draw2(n)
           a = rand(1:n)
           b = rand(1:n-1)
           if a >= b
               b += 1
           end
           a, b
       end

(thinking a bit further, I think this option is fine, the second draw is a draw on the remaining possible “positions”, which are univocally assigned to the corresponding values, thus that is probably as correlated as the first option, unless I’m missing some subtlety - as shown bellow, clearly I am )

stabbles · November 30, 2021, 12:06pm

julia> using UnicodePlots

julia> function draw2(n)
                  a = rand(1:n)
                  b = rand(1:n-1)
                  if a >= b
                      b += 1
                  end
                  a, b
              end
draw2 (generic function with 1 method)

julia> histogram(reinterpret(Int,[draw2(3) for i = 1:100000]), nbins=3)
              ┌                                        ┐ 
   [1.0, 2.0) ┤███████████▏ 33329                        
   [2.0, 3.0) ┤█████████████████████████████████  99623  
   [3.0, 4.0) ┤██████████████████████▎ 67048             
              └                                        ┘ 
                               Frequency

Sukera · November 30, 2021, 12:11pm

You’re drawing two numbers in 1:n, so basically drawing two coordinates for a square with side n. There are n*n possible combinations. You exclude pairs with equal numbers, so you’re basically excluding the diagonal. We’re in integers, there are n values on the diagonal of a total of n^2. So we have n² - n valid values with equal distribution. As such, the chance for actually hitting the diagonal is n/n² = 1/n. We can check this - for n=2, we exclude 2 elements on the diagonal out of a total n² = 4. For n=3, we exclude 3 of 9, meaning our chance was 1/3. n=4 → 4/16 → 1/4 etc. This is the same chance as we get for having to draw any of the two a second time (since we have to draw at least one more time only if we hit the diagonal). As n grows, this quickly becomes very small. I’d be very surprised if any method would be faster than just picking again (though of course you can optimize this for e.g. n=2 directly by returning the only other option left) without acidentally biasing/correlating the distribution of the second number to with the distribution of the first more than necessary (i.e., more than only having n-1 possible values with equal chance).

You’re biasing b to be larger when a is larger than b, meaning the distribution of b is correlated to the distribution of a.

DNF · November 30, 2021, 12:25pm

Almost

function draw2_(n)
    a = rand(1:n)
    b = rand(1:n-1)
    b += (b >= a)
    return a, b
end

1.7.0-rc3> histogram(reinterpret(Int,[draw2_(3) for i = 1:100000]), nbins=3)   
              ┌                                        ┐ 
   [1.0, 2.0) ┤█████████████████████████████████  66720 
   [2.0, 3.0) ┤████████████████████████████████▉ 66607  
   [3.0, 4.0) ┤████████████████████████████████▉ 66673  
              └                                        ┘
                               Frequency

carstenbauer · November 30, 2021, 12:27pm

Gotta love those UnicodePlots.jl-powered histograms

Sukera · November 30, 2021, 12:30pm

In case anyone doesn’t catch the subtlety: the code by @DNF does b += (b >= a), while the code by @lmiq does b += (a >= b). The former correctly shifts only the distribution of values of b greater than a upwards by one, avoiding a, while the latter incorrectly shifts all values lower than a, meaning another value one less than b can potentially hit a again.

DNF · November 30, 2021, 12:31pm

It’s probably also useful to use explicit rng:

using Random
function draw2_(n, rng=MersenneTwister())
    a = rand(rng, 1:n)
    b = rand(rng, 1:n-1)
    b += (b >= a)
    return a, b
end

1.7.0-rc3> rng = MersenneTwister();

1.7.0-rc3> @btime draw2_(10, $rng)
  9.400 ns (0 allocations: 0 bytes)
(5, 10)

Edit: Oops, pasted wrong function definition.

Sukera · November 30, 2021, 12:33pm

I much prefer Xoshiro

julia> @benchmark draw2(10_000, rng) setup=(rng=Xoshiro())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  7.970 ns … 33.074 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.002 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.575 ns ±  1.485 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █  ▃ ▁▃  ▂▂  ▄   ▁   ▆▃                                    ▁
  █▅▃████▅▄██▅▆█▄▄▄█▅▄▄██▆▄▄▆▆▇▆▆▁▃▇▆▅▆▅▅▄▃▄▄▃▁▁▅▅▃▆▄▅▄▄▄▄▃▅ █
  7.97 ns      Histogram: log(frequency) by time     12.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark draw2(10_000, rng) setup=(rng=MersenneTwister())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  11.104 ns … 49.395 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.607 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.770 ns ±  2.931 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▇▅▄▃▂▃▂▂▄▃▂▂▁▂▂▃▂▂▃▁▁▂▂▂▁▁ ▁                              ▂
  ███████████████████████████████▇█▇▅▅▅▅▅▄▄▃▃▃▄▁▁▃▁▄▅▆▆▆▅▆▆▇▇ █
  11.1 ns      Histogram: log(frequency) by time      26.1 ns <

 Memory estimate: 17 bytes, allocs estimate: 0.

lmiq · November 30, 2021, 12:33pm

Brilliant and very cool. My interpretation was correct, the implementation was wrong

DNF · November 30, 2021, 12:35pm

Oh, yes, I knew there was a better one, but forgot which it was.

Sukera · November 30, 2021, 12:36pm

Xoshiro will be the default in the (very soon) to be released 1.7

lmiq · November 30, 2021, 12:37pm

Still, from this matricial picture, it should be possible to call rand(1:n^2-n) and do an index transformation to obtain the two numbers. I don’t know if in any case the cost of those transformations pay off relative to a call to rand().

Sukera · November 30, 2021, 12:46pm

You’re more or less asking about converting a linear index to a cartesian one - and if I remember correctly, that involves a division + modulo, which is surely more expensive than a single rand and an addition. Additionally, simply excluding n values from the range is not sufficient - it matters which values you exclude.

lmiq · November 30, 2021, 12:54pm

Yes, that. It is not completely evident that those index transformations are more expensive than a call to rand() though. (unfortunately to get the indexes right I would spend all day…). But a simple guess suggests that it might be something of the same order:

julia> function iter(n,draw)
           for i in 1:1000
               draw(n)
           end
           nothing
       end
iter (generic function with 1 method)

julia> function draw2(n)
           a = rand(1:n^2 - n)
           b = mod(a,n) + div(n^3,n) # <--- some random stuff
           if b >= a
               b += 1
           end
           a, b
       end
draw2 (generic function with 1 method)

julia> @btime iter(3,$draw2)
  13.623 μs (0 allocations: 0 bytes)

julia> function draw3(n)
           a = rand(1:n)
           b = rand(1:n-1)
           if b >= a
               b += 1
           end
           a, b
       end
draw2 (generic function with 1 method)

julia> @btime iter(3,$draw3)
  14.473 μs (0 allocations: 0 bytes)

DNF · November 30, 2021, 1:03pm

If you provide the rng, it looks to me like divrem (which should be more or less the best case here) is definitely more expensive than rand:

function draw2_(n, rng=MersenneTwister())
    a = rand(rng, 1:n)
    b = rand(rng, 1:n-1)
    b += (b >= a)
    return a, b
end

function draw1(n, rng=Xoshiro())
    ab = rand(rng, 0:(n^2-n-1))
    (a, b) = divrem(ab, n)  # best case tranformation
    return a, b  # wrong answer!!
end

1.7.0-rc3> @btime draw2_(10_000, rng) setup=(rng=Xoshiro())
  8.000 ns (0 allocations: 0 bytes)
(7278, 3967)

1.7.0-rc3> @btime draw1(10_000, rng) setup=(rng=Xoshiro())
  12.412 ns (0 allocations: 0 bytes)
(2233, 4742)

Without the rng it is practically a ‘draw’:

function draw2b(n)
    a = rand(1:n)
    b = rand(1:n-1)
    b += (b >= a)
    return a, b
end

1.7.0-rc3> @btime draw2b(10_000)
  12.412 ns (0 allocations: 0 bytes)
(2873, 7609)

lmiq · November 30, 2021, 1:09pm

Here is a working one (it is more expensive - but I need to call rand(Bool) once, I don’t know if that is less expensive than a rand(Int), if not it does not make sense):

# get cartesian coordinates from linear index of upper triangular
julia> @inline function iuppert(k::Integer,n::Integer) 
         i = n - 1 - floor(Int,sqrt(-8*k + 4*n*(n-1) + 1)/2 - 0.5)
         j = k + i + ( (n-i+1)*(n-i) - n*(n-1) )÷2
         return i, j
       end
iuppert (generic function with 1 method)

julia> function draw4(n)
           ind = rand(1:(n^2-n)÷2)
           a, b = iuppert(ind,n)
           if rand(Bool) # need to allow lower triangular as well
               a, b = b, a
           end
           a, b
       end
draw4 (generic function with 1 method)

# seems correct:
julia> histogram(reinterpret(Int,[draw4(3)[2] for i = 1:100000]), nbins=3)
              ┌                                        ┐ 
   [1.0, 2.0) ┤████████████████████████████████▌ 33053   
   [2.0, 3.0) ┤█████████████████████████████████  33503  
   [3.0, 4.0) ┤████████████████████████████████▉ 33444   
  
# but slower 
julia> function iter(n,draw)
           for i in 1:1000
               draw(n)
           end
           nothing
       end
iter (generic function with 1 method)

julia> @btime iter(3,$draw4)
  19.474 μs (0 allocations: 0 bytes)

julia> function draw2(n)
           a = rand(1:n)
           b = rand(1:n-1)
           if b >= a
               b += 1
           end
           a, b
       end
draw2 (generic function with 1 method)


julia> @btime iter(3,$draw2)
  14.576 μs (0 allocations: 0 bytes)

GunnarFarneback · November 30, 2021, 2:36pm

lmiq:

# get cartesian coordinates from linear index of upper triangular
julia> @inline function iuppert(k::Integer,n::Integer) 
         i = n - 1 - floor(Int,sqrt(-8*k + 4*n*(n-1) + 1)/2 - 0.5)
         j = k + i + ( (n-i+1)*(n-i) - n*(n-1) )÷2
         return i, j
       end

Much as I love this relation after having derived it in a course ages ago, it is really just making a relatively simple problem way more complicated here. Try this implementation instead:

function draw5(n)
    ind = rand(0:(n * (n - 1) - 1))
    a, b = divrem(ind, n)
    a += (a >= b)
    return a + 1, b + 1
end

lmiq · November 30, 2021, 2:44pm

Great! And that is faster than calling rand() twice:

julia> function iter(n,draw)
           for i in 1:1000
               draw(n)
           end
           nothing
       end
iter (generic function with 1 method)

julia> function draw5(n)
           ind = rand(0:(n * (n - 1) - 1))
           a, b = divrem(ind, n)
           a += (a >= b)
           return a + 1, b + 1
       end
draw5 (generic function with 1 method)

julia> @btime iter(3,$draw5)
  11.083 μs (0 allocations: 0 bytes)

julia> using UnicodePlots

julia> histogram(reinterpret(Int,[draw5(3) for i = 1:100000]), nbins=3)
              ┌                                        ┐ 
   [1.0, 2.0) ┤████████████████████████████████▊ 66522   
   [2.0, 3.0) ┤████████████████████████████████▊ 66651   
   [3.0, 4.0) ┤█████████████████████████████████  66827  
              └                                        ┘ 
                               Frequency

Topic		Replies	Views
Random with no duplicate New to Julia	4	1735	December 21, 2021
Random specific in julia New to Julia	3	631	January 16, 2018
How do I create an array with random unique numbers in a specific range? New to Julia question , random	20	1274	June 24, 2024
Sampling from a list of integers without repetition New to Julia	3	6842	June 15, 2020
Customize a random function to sample 3 out of a list of 4097 real numbers Performance question	5	249	July 7, 2023

Draw two distinct random integers from `1:n`

Related topics