# Fastest way to simulate the birthday paradox

Trying to write a script for a video to explain some Julia programming concepts. Was trying to use the birthday “paradox” as example. It states that in a room of 23 ppl, there is a 50% chance that at least one pair of person will have the same birthday.

I am trying to write code to simulate this to confirm the results.

This is the fastest code I can come up with. It’s about 0.25s for one million simulations

I have used these tricks `@inbounds`, pre-allocation to avoid repeated allocation. Customised, one-by-one randomly bday generation, to avoid having to generate all `n` birthdays at once.

I wonder if there are other tricks? Interested in interesting tricks I have missed.

Interestingly I tried AutoPreAllocation.jl but it gave error, so that didn’t work.

``````# Birtyday paradox
function atleast2_v3!(d, npersons)
d .= false
@inbounds d[rand(1:365)] = true
@inbounds for _ in 2:npersons
bday = rand(1:365)
d[bday] && return true
d[bday] = true
end
false
end

function simulate_bday_v3(npersons, ntimes=1_000_000)
cnt = 0
d = falses(365)
for _ in 1:ntimes
cnt += atleast2_v3!(d, npersons)
end
cnt/ntimes
end

using BenchmarkTools
@time simulate_bday_v3(23, 1_000_000)
``````

Just came up with another way which is not to keep an array of birthday at all

``````# Birtyday paradox
@inline function atleast2_v6(npersons)
@inbounds for i in 1:npersons-1
rand() < i/365 && return true
end
false
end

function simulate_bday_v6(npersons, ntimes=1_000_000)
cnt = 0
for _ in 1:ntimes
cnt += atleast2_v6(npersons)
end
cnt/ntimes
end

using BenchmarkTools
@time simulate_bday_v6(23, 1_000_000)

@benchmark simulate_bday_v6(23, 1_000_000)
``````

It’s now down to 78ms per 1 million simulations.

Multithreading will help for sure, but I think optimising single-threaded-ness is a good start. I can cut it down to 32 using 6 threads on my i7

``````# Birtyday paradox
@inline function atleast2_v8(npersons)
@inbounds for i in 1:npersons-1
365rand() < i && return true
end
false
end

function _simulate_bday_v8(npersons, ntimes=1_000_000)
cnt = 0
for _ in 1:ntimes
cnt += atleast2_v8(npersons)
end
cnt
end

function simulate_bday_v8(npersons, ntimes=1_000_000)
end
end

using BenchmarkTools
@time simulate_bday_v8(23, 1_000_000)

@benchmark simulate_bday_v8(23, 1_000_000)

# BenchmarkTools.Trial:
#   memory estimate:  5.23 KiB
#   allocs estimate:  32
#   --------------
#   minimum time:     17.380 ms (0.00% GC)
#   median time:      18.441 ms (0.00% GC)
#   mean time:        18.734 ms (0.00% GC)
#   maximum time:     26.517 ms (0.00% GC)
#   --------------
#   samples:          267
#   evals/sample:     1
``````
1 Like

The GPU version can reach a scary 2ms on my RTX2080 (no Ti)

``````BenchmarkTools.Trial:
memory estimate:  1.28 KiB
allocs estimate:  42
--------------
minimum time:     1.813 ms (0.00% GC)
median time:      1.930 ms (0.00% GC)
mean time:        76.165 ms (0.04% GC)
maximum time:     304.536 ms (0.00% GC)
--------------
samples:          66
evals/sample:     1
``````

Code

``````# Birtyday paradox

using Pkg

Pkg.activate("c:/scratch/cuda-test")

using CUDA

CUDA.allowscalar(false)

N = 1_000_000

# The winner of bencharmks/bencharmks-countmap

function atleast2_gpu_v1!(b, e)

i = (blockIdx().x - 1) * blockDim().x + threadIdx().x

stride = gridDim().x * blockDim().x

j = Float32(0.0)

for idx in i:stride:size(e, 2)

local_j = Float32(0.0)

for k in 1:23-1

if e[k, idx] < k/365

local_j = max(local_j, Float32(1.0))

end

end

j += local_j

end

@atomic b[t] = b[t] + j

return

end

function simulate_bday_gpu_v1(; threads = 256, blocks = N ÷ 256)

b = CUDA.zeros(Float32, threads)

r = CUDA.rand(Float32, 23-1, N)

CUDA.@sync @cuda threads = threads blocks = blocks atleast2_gpu_v1!(b, r)

sum(b)/1_000_000

end

using BenchmarkTools

@benchmark CUDA.@sync blocking = false simulate_bday_gpu_v1()

``````
6 Likes