The birthday paradox: help to be faster than Fortran

Introduction

I am not able to beat Fortran using Julia. This is the time spent by the algorithm using these two languages:

Fortran 0m 42s
Julia   4m 33s

As you can see, Julia takes 6.5 times the time Fortran takes to solve the problem.

What is this algorithm about?

There is a planet whose years have n_days days. In this planet there are n_people
people, so, calculates in extent all possible cases. What is the probability that two people
were born on the same day?

As you may know, in probability theory, the birthday problem or birthday paradox
concerns the probability that, in a set of n randomly chosen people, some
pair of them will have the same birthday. Wikipedia

The probability problem has a well known solution but this algorithm gets all
possibilities where two or more people have been born the same day
.

Data structure

This is the vector v. Its size is equal to the number of people in the planet.
Each item in the vector represents the day each person was born. So an example:

 Person:          1     2     3     4     5    ...  n_people
               ---------------------------------------------
 Vector:: v    |  1  |  4  |  1  |  7  |  6  | ... |   5   |
               ---------------------------------------------
                  ^     ^     ^     ^     ^     ^     ^
                               Day of birthday

 size(v) == n_people
 v[1] == Day of the year in which person 1 was born
 v[2] == Day of the year in which person 2 was born
 ...
 v[n] == Day of the year in which person 'n' was born

 The numbers in the vector are in the range `1...n_days`

The process

It consists in testing all combinations in the vector v and test if there are two or more
people who were born the same day.

Help

  • Any idea what can do to increase dramatically the Julia speed?
  • I have been recommended to use static arrays, but I am not sure if this is a real solution, as I understand, Julia detects the size of the array passed as argument. I am not sure.

Source code

1 Like

Setup variables can be written like this, to avoid converting the variables after initialization:

	# Setup variables
	v = ones(Int, n_people)
	n_coincidences = zero(Int)
	percent = 0

But this part is very puzzling to me:

# Process: Test all combination in the vector v
    for j::Int64 = 1:div(total,100):total
        for i::Int64 = j:j+div(total,100)-1
            if same_day(v, n_people)
                n_coincidences += 1
            end

            add1_recursive(v, n_people, n_people, n_days)
            #add1_iterative(v, n_people, n_days)
        end

        percent += 1
        println(percent, "%")
    end

Why do you run same_day(v, n_people) and add1_recursive(v, n_people, n_people, n_days), which are unrelated to the loop, inside the loop?

add1_recursive increments v, treating it as an n_people-digit integer in base n_days. The two loops are just iterating over all possible values of that n_people-digit integer. It looks like the division into an inner and outer loop is just to allow the code to print a progress indicator every time it completes 1% of the total iteration. By the way, @profesor_s the GitHub - timholy/ProgressMeter.jl: Progress meter for long-running computations package provides a really nice way to do that sort of thing in Julia.

3 Likes

In addition to checking the algorithm, did you read Performance Tips · The Julia Language? If you want to write high-performance code in Julia you must know this page by heart. Your add1_recursive is type-unstable:

julia> @code_warntype add1_recursive(v, n_people, n_people, n_days)
Variables
  #self#::Core.Compiler.Const(add1_recursive, false)
  v::Array{Int64,1}
  p::Int64
  n_people::Int64
  n_days::Int64

Body::Union{Nothing, Int64}
1 ─ %1  = (p < 1)::Bool
└──       goto #3 if not %1
2 ─       return Main.nothing
3 ─ %4  = Base.getindex(v, p)::Int64
│   %5  = (%4 != n_days)::Bool
└──       goto #5 if not %5
4 ─ %7  = Base.getindex(v, p)::Int64
│   %8  = (%7 + 1)::Int64
│         Base.setindex!(v, %8, p)
└──       return %8
5 ─       Base.setindex!(v, 1, p)
│   %12 = (p - 1)::Int64
│   %13 = Main.add1_recursive(v, %12, n_people, n_days)::Union{Nothing, Int64}
└──       return %13

Just by adding a return nothing at the end of the function to make the function always return nothing my runtime goes from 140 to 110 seconds.

8 Likes

I tried a couple of basic tricks to see what kind of improvement I could get. For testing, I set n_days=20 and removed all the printouts so that we can run benchmarks quickly. Here’s a baseline:

julia> using BenchmarkTools

julia> @btime main()
  39.061 ms (1 allocation: 128 bytes)

Moving the bounds checks in same_day out of the loop helps a little bit:

function same_day(v, n_people)
	@boundscheck(n_people >= firstindex(v) && n_people <= lastindex(v) || throw(BoundsError()))
	@inbounds for i = 1:n_people-1
		for j = i+1:n_people
			if v[i] == v[j]		# If there are not more combinations
				return true
			end
		end
	end
	return false
end
julia> @btime main()
  37.543 ms (1 allocation: 128 bytes)

Also, from a stylistic perspective, you are passing v and n_people into all of your functions, even though we know that n_people is defined to be the length of v. There’s no reason to do this in Julia, and it doesn’t help performance at all. The only thing it does is create an opportunity for mistakes if you accidentally pass in a n_people which is not the length of v.

Instead, just pass v and then use length(v) inside the function as necessary. For example:

# Are there two people who where born the same day?
function same_day(v)
	n_people = length(v)
	@boundscheck(n_people >= firstindex(v) && n_people <= lastindex(v) || throw(BoundsError()))
	@inbounds for i = 1:n_people-1
		for j = i+1:n_people
			if v[i] == v[j]		# If there are not more combinations
				return true
			end
		end
	end
	return false
end
2 Likes

Getting the inner loop to unroll seemed to help a bunch; with v as a MVector, using this definition

using Unrolled
@unroll function same_day(v)
	for i=1:length(v)-1
		vi = v[i]
		@unroll for j = 1:length(v)
			if (i < j) && vi == v[j]		# If there are not more combinations
				return true
			end
		end
	end
	return false
end

got it down to 35 seconds for me.

Full code
const n_days = 100
const n_people = 5
const total = n_days^n_people
using ProgressMeter, StaticArrays

function main()
	# Setup variables
	v = @MArray ones(Int, n_people)
	n_coincidences::Int64 = 0

    # Process: Test all combination in the vector v
	@showprogress for j::Int64 = 1:div(total,100):total
		for i::Int64 = j:j+div(total,100)-1
			if same_day(v)
				n_coincidences += 1
			end
            # add1_recursive(v, n_people, n_days)
			add1_iterative(v, n_days)
		end
	end

    # Show results 
	println("Coincidences ", n_coincidences, "/", total, "=", n_coincidences/total*100,"%")
end

# 
function add1_iterative(v, n_days)
	n_people = length(v)
	# v(p). Increment position p
	# v is the vector with birthdays
	p = n_people
	for i = n_people:-1:1
		if v[i] < n_days
            p = i
			break
        else
			v[i] = 1
        end
    end
    
    # If there are not more combinations
	if p == 1 && v[1] == n_days
		return
	end
    
    # Next combination in the vector
	v[p] += 1

	return nothing
end

function add1_recursive(v, p, n_days)
	n_people = length(v)

	# v(p). Increment position p
	# v is the vector with birthdays
    # Look for the first figure that can be incremented
    # `p` is the index position that must be increased v(p)
    if p < 1
        return
	elseif v[p] != n_days
		v[p] += 1
    else
		v[p] = 1
        add1_recursive(v, p-1, n_days)
	end
	return nothing
end

using Unrolled
@unroll function same_day(v)
	for i=1:length(v)-1
		vi = v[i]
		@unroll for j = 1:length(v)
			if (i < j) && vi == v[j]		# If there are not more combinations
				return true
			end
		end
	end
	return false
end

main()
2 Likes

The main area where we don’t optimize as well as gcc is recursive inlining. So you have to do that yourself.

const n_days = 100
const n_people = 5
const total = n_days^n_people

function main()
    # Setup variables
    v = ones(Int, n_people)
    n_coincidences = 0
    percent = 0

    # Process: Test all combination in the vector v
    for j = 1:div(total,100):total
        for i = j:j + div(total,100)-1
            if same_day(v, n_people)
                n_coincidences += 1
            end
            add1_recursive(v, n_people, n_people, n_days)
        end
        percent += 1
        println(percent, "%")
    end

    # Show results
    println("Coincidences ", n_coincidences, "/", total, "=", n_coincidences/total*100,"%")
end


# Compute next iteration in the vector `v` of birthdays
@inline function add1_recursive(v, p, n_people, n_days)
    # `v` is the vector with birthdays
    # `p` is the index position that must be increased v[p]+=1
    @label start
    @inbounds if p < 1
        return
    elseif v[p] != n_days
        v[p] += 1
    else
        v[p] = 1
        p = p - 1
        @goto start
    end
    return
end

# Are there two people who where born the same day?
@inline function same_day(v, n_people)
    @inbounds for i = 1:n_people-1
        for j = i+1:n_people
            if v[i] == v[j]                # There are two people who were born same day
                return true
            end
        end
    end
    return false
end

This get me from 120s to 37s.

The inline is also something a C/fortran compiler can easily do since it sees all the caller whereas in julia it’s much less clear if it’ll be benefical, hence the @inline.

19 Likes

Is it already faster after these changes?

First of all I want to thank you for your help and recommendations. The Julia code is much faster than before. Now these are the results:

Next iteration

New results

- Fortran 0m 42s
- Julia  1m 32s !! Much better

Changes in code

  • Renamed function add1_recursive() to next_iteration()
  • Vector v now is a static array
  • No bounds checked for loops of same_day() function
  • No bounds checked for loops of next_iteration() function
  • Added a goto statement to take a shortcut
  • Added return nothing at the end of next_iteration()
  • Removed old progress system and added new @showprogress
  • Removed parameter n_people in function same_day() to avoid mistakes

The new source code

Here you have the new Julia code.

Information about my system

  • Julia 1.4.1
  • CPU Model: 6.78.3 “Intel(R) Core™ i3-6006U CPU @ 2.00GHz”
  • Memory Size: 3 GB + 768 MB
  • Ubuntu 20.04.1 LTS (Focal Fossa)
  • Linux kernel 5.4.0-42-generic
3 Likes

Awkward, but I do get better performance out of this:

julia> @generated function same_day(v, ::Val{n_people}) where {n_people}
           q = Expr(:block); for i = 1:n_people-1
               for j = i+1:n_people
                   push!(q.args, :(v[$i] == v[$j] && return true))
               end
           end
           quote
               $(Expr(:meta,:inline))
               @inbounds begin
                   $q
               end
               false
           end
       end
same_day (generic function with 3 methods)

julia> @inline function add1_recursive(v, p, n_people, n_days)
           # `v` is the vector with birthdays
           # `p` is the index position that must be increased v[p]+=1
           @inbounds while true;
           if p < 1
             return
           elseif v[p] != n_days
               v[p] += 1; return
           else
               v[p] = 1
               p -= 1
           end
           end
       end
add1_recursive (generic function with 1 method)

You then need to call same_day(v, Val(n_people)) inside main.

With this, I got:

Coincidences 965497600/10000000000=9.654976%
 20.213654 seconds (1.15 k allocations: 35.359 KiB)

Versus gfortran:

Coincidences  965497600 / 10000000000 =  9.65%

________________________________________________________
Executed in   16.86 secs   fish           external
   usr time   16.85 secs  229.00 micros   16.85 secs
   sys time    0.00 secs   76.00 micros    0.00 secs
1 Like

It looks like this is measuring compile time in Julia but not in Fortran. You might want to measure it in both or measure it in neither, but measuring it in one but not the other is a bit odd.

https://github.com/profesors/Fortran/blob/master/01_benchmarks/02_birthday.md#results

2 Likes

There is generally no need for these if it isn’t needed in fortran.

For a ~1min runtime, the compile time here is fairly neglegible.

2 Likes

Is that because total is wrong?

julia> Int(n_days)^n_people
10000000000

julia> n_days^n_people
1410065408

Note that the Fortran program correctly converts to an 8-byte integer before exponentiation:

integer(kind=8), parameter:: total = int(n_days,8) ** n_people

so this does not explain the performance difference.

1 Like

Yes you are right.

I may have missed an explanation but is there a reason that you are trying to calculate the probability of a repeated birthday by enumeration? This is the classic example in an elementary probability course of showing that sometimes it is easier to evaluate the probability of the complement (i.e. no repeated birthdays) than trying to evaluate the probability of one or more repetitions.

In R there are functions pbirthday and qbirthday for evaluating the cumulative probability and its inverse, the quantile function. In Julia the probability of a repeated birthday in n_people in a year of n_days can be evaluated as

julia> pbirthday(n_people, n_days) = 1 - prod((n_days:-1:(n_days - n_people + 1)) ./ n_days)
pbirthday (generic function with 1 method)

julia> pbirthday(20, 365)  # prob of rep. birthday in random group of 20 people
0.41143838358058027

which is a transcription of the R function,

3 Likes

I think the idea is to benchmark the same code (i.e. regardless if the algorithm is optimal or not)

4 Likes

Hi again. When I began with it I was conscious it is a benchmark not a utility app.

The formula to obtain this probability is well known but it is not the goal of this code to get the probability (the number) but all the sequences that fulfill the goal.

1 Like