# Introduction

I am not able to beat Fortran using Julia. This is the time spent by the algorithm using these two languages:

``````Fortran 0m 42s
Julia   4m 33s
``````

As you can see, Julia takes 6.5 times the time Fortran takes to solve the problem.

## What is this algorithm about?

There is a planet whose years have `n_days` days. In this planet there are `n_people`
people, so, calculates in extent all possible cases. What is the probability that two people
were born on the same day?

As you may know, in probability theory, the birthday problem or birthday paradox
concerns the probability that, in a set of n randomly chosen people, some
pair of them will have the same birthday. Wikipedia

The probability problem has a well known solution but this algorithm gets all
possibilities where two or more people have been born the same day
.

## Data structure

This is the vector `v`. Its size is equal to the number of people in the planet.
Each item in the vector represents the day each person was born. So an example:

`````` Person:          1     2     3     4     5    ...  n_people
---------------------------------------------
Vector:: v    |  1  |  4  |  1  |  7  |  6  | ... |   5   |
---------------------------------------------
^     ^     ^     ^     ^     ^     ^
Day of birthday

size(v) == n_people
v == Day of the year in which person 1 was born
v == Day of the year in which person 2 was born
...
v[n] == Day of the year in which person 'n' was born

The numbers in the vector are in the range `1...n_days`

``````

## The process

It consists in testing all combinations in the vector `v` and test if there are two or more
people who were born the same day.

# Help

• Any idea what can do to increase dramatically the Julia speed?
• I have been recommended to use static arrays, but I am not sure if this is a real solution, as I understand, Julia detects the size of the array passed as argument. I am not sure.

# Source code

1 Like

Setup variables can be written like this, to avoid converting the variables after initialization:

``````	# Setup variables
v = ones(Int, n_people)
n_coincidences = zero(Int)
percent = 0
``````

But this part is very puzzling to me:

``````# Process: Test all combination in the vector v
for j::Int64 = 1:div(total,100):total
for i::Int64 = j:j+div(total,100)-1
if same_day(v, n_people)
n_coincidences += 1
end

end

percent += 1
println(percent, "%")
end
``````

Why do you run `same_day(v, n_people)` and `add1_recursive(v, n_people, n_people, n_days)`, which are unrelated to the loop, inside the loop?

`add1_recursive` increments `v`, treating it as an `n_people`-digit integer in base `n_days`. The two loops are just iterating over all possible values of that `n_people`-digit integer. It looks like the division into an inner and outer loop is just to allow the code to print a progress indicator every time it completes 1% of the total iteration. By the way, @profesor_s the https://github.com/timholy/ProgressMeter.jl package provides a really nice way to do that sort of thing in Julia.

3 Likes

In addition to checking the algorithm, did you read https://docs.julialang.org/en/v1/manual/performance-tips/? If you want to write high-performance code in Julia you must know this page by heart. Your `add1_recursive` is type-unstable:

``````julia> @code_warntype add1_recursive(v, n_people, n_people, n_days)
Variables
v::Array{Int64,1}
p::Int64
n_people::Int64
n_days::Int64

Body::Union{Nothing, Int64}
1 ─ %1  = (p < 1)::Bool
└──       goto #3 if not %1
2 ─       return Main.nothing
3 ─ %4  = Base.getindex(v, p)::Int64
│   %5  = (%4 != n_days)::Bool
└──       goto #5 if not %5
4 ─ %7  = Base.getindex(v, p)::Int64
│   %8  = (%7 + 1)::Int64
│         Base.setindex!(v, %8, p)
└──       return %8
5 ─       Base.setindex!(v, 1, p)
│   %12 = (p - 1)::Int64
│   %13 = Main.add1_recursive(v, %12, n_people, n_days)::Union{Nothing, Int64}
└──       return %13
``````

Just by adding a `return nothing` at the end of the function to make the function always return `nothing` my runtime goes from 140 to 110 seconds.

8 Likes

I tried a couple of basic tricks to see what kind of improvement I could get. For testing, I set `n_days=20` and removed all the printouts so that we can run benchmarks quickly. Here’s a baseline:

``````julia> using BenchmarkTools

julia> @btime main()
39.061 ms (1 allocation: 128 bytes)
``````

Moving the bounds checks in `same_day` out of the loop helps a little bit:

``````function same_day(v, n_people)
@boundscheck(n_people >= firstindex(v) && n_people <= lastindex(v) || throw(BoundsError()))
@inbounds for i = 1:n_people-1
for j = i+1:n_people
if v[i] == v[j]		# If there are not more combinations
return true
end
end
end
return false
end
``````
``````julia> @btime main()
37.543 ms (1 allocation: 128 bytes)
``````

Also, from a stylistic perspective, you are passing `v` and `n_people` into all of your functions, even though we know that `n_people` is defined to be the length of `v`. There’s no reason to do this in Julia, and it doesn’t help performance at all. The only thing it does is create an opportunity for mistakes if you accidentally pass in a `n_people` which is not the length of `v`.

Instead, just pass `v` and then use `length(v)` inside the function as necessary. For example:

``````# Are there two people who where born the same day?
function same_day(v)
n_people = length(v)
@boundscheck(n_people >= firstindex(v) && n_people <= lastindex(v) || throw(BoundsError()))
@inbounds for i = 1:n_people-1
for j = i+1:n_people
if v[i] == v[j]		# If there are not more combinations
return true
end
end
end
return false
end
``````
2 Likes

Getting the inner loop to unroll seemed to help a bunch; with `v` as a `MVector`, using this definition

``````using Unrolled
@unroll function same_day(v)
for i=1:length(v)-1
vi = v[i]
@unroll for j = 1:length(v)
if (i < j) && vi == v[j]		# If there are not more combinations
return true
end
end
end
return false
end
``````

got it down to 35 seconds for me.

Full code
``````const n_days = 100
const n_people = 5
const total = n_days^n_people
using ProgressMeter, StaticArrays

function main()
# Setup variables
v = @MArray ones(Int, n_people)
n_coincidences::Int64 = 0

# Process: Test all combination in the vector v
@showprogress for j::Int64 = 1:div(total,100):total
for i::Int64 = j:j+div(total,100)-1
if same_day(v)
n_coincidences += 1
end
end
end

# Show results
println("Coincidences ", n_coincidences, "/", total, "=", n_coincidences/total*100,"%")
end

#
n_people = length(v)
# v(p). Increment position p
# v is the vector with birthdays
p = n_people
for i = n_people:-1:1
if v[i] < n_days
p = i
break
else
v[i] = 1
end
end

# If there are not more combinations
if p == 1 && v == n_days
return
end

# Next combination in the vector
v[p] += 1

return nothing
end

n_people = length(v)

# v(p). Increment position p
# v is the vector with birthdays
# Look for the first figure that can be incremented
# `p` is the index position that must be increased v(p)
if p < 1
return
elseif v[p] != n_days
v[p] += 1
else
v[p] = 1
end
return nothing
end

using Unrolled
@unroll function same_day(v)
for i=1:length(v)-1
vi = v[i]
@unroll for j = 1:length(v)
if (i < j) && vi == v[j]		# If there are not more combinations
return true
end
end
end
return false
end

main()
``````
2 Likes

The main area where we don’t optimize as well as gcc is recursive inlining. So you have to do that yourself.

``````const n_days = 100
const n_people = 5
const total = n_days^n_people

function main()
# Setup variables
v = ones(Int, n_people)
n_coincidences = 0
percent = 0

# Process: Test all combination in the vector v
for j = 1:div(total,100):total
for i = j:j + div(total,100)-1
if same_day(v, n_people)
n_coincidences += 1
end
end
percent += 1
println(percent, "%")
end

# Show results
println("Coincidences ", n_coincidences, "/", total, "=", n_coincidences/total*100,"%")
end

# Compute next iteration in the vector `v` of birthdays
@inline function add1_recursive(v, p, n_people, n_days)
# `v` is the vector with birthdays
# `p` is the index position that must be increased v[p]+=1
@label start
@inbounds if p < 1
return
elseif v[p] != n_days
v[p] += 1
else
v[p] = 1
p = p - 1
@goto start
end
return
end

# Are there two people who where born the same day?
@inline function same_day(v, n_people)
@inbounds for i = 1:n_people-1
for j = i+1:n_people
if v[i] == v[j]                # There are two people who were born same day
return true
end
end
end
return false
end
``````

This get me from 120s to 37s.

The inline is also something a C/fortran compiler can easily do since it sees all the caller whereas in julia it’s much less clear if it’ll be benefical, hence the `@inline`.

19 Likes

Is it already faster after these changes?

First of all I want to thank you for your help and recommendations. The Julia code is much faster than before. Now these are the results:

# Next iteration

## New results

``````- Fortran 0m 42s
- Julia  1m 32s !! Much better
``````

## Changes in code

• Renamed function `add1_recursive()` to `next_iteration()`
• Vector `v` now is a static array
• No bounds checked for loops of `same_day()` function
• No bounds checked for loops of `next_iteration()` function
• Added a `goto` statement to take a shortcut
• Added `return nothing` at the end of `next_iteration()`
• Removed old progress system and added new @showprogress
• Removed parameter `n_people` in function `same_day()` to avoid mistakes

## The new source code

Here you have the new Julia code.

• Julia 1.4.1
• CPU Model: 6.78.3 “Intel® Core™ i3-6006U CPU @ 2.00GHz”
• Memory Size: 3 GB + 768 MB
• Ubuntu 20.04.1 LTS (Focal Fossa)
• Linux kernel 5.4.0-42-generic
3 Likes

Awkward, but I do get better performance out of this:

``````julia> @generated function same_day(v, ::Val{n_people}) where {n_people}
q = Expr(:block); for i = 1:n_people-1
for j = i+1:n_people
push!(q.args, :(v[\$i] == v[\$j] && return true))
end
end
quote
\$(Expr(:meta,:inline))
@inbounds begin
\$q
end
false
end
end
same_day (generic function with 3 methods)

julia> @inline function add1_recursive(v, p, n_people, n_days)
# `v` is the vector with birthdays
# `p` is the index position that must be increased v[p]+=1
@inbounds while true;
if p < 1
return
elseif v[p] != n_days
v[p] += 1; return
else
v[p] = 1
p -= 1
end
end
end
add1_recursive (generic function with 1 method)
``````

You then need to call `same_day(v, Val(n_people))` inside `main`.

With this, I got:

``````Coincidences 965497600/10000000000=9.654976%
20.213654 seconds (1.15 k allocations: 35.359 KiB)
``````

Versus `gfortran`:

``````Coincidences  965497600 / 10000000000 =  9.65%

________________________________________________________
Executed in   16.86 secs   fish           external
usr time   16.85 secs  229.00 micros   16.85 secs
sys time    0.00 secs   76.00 micros    0.00 secs
``````
1 Like

It looks like this is measuring compile time in Julia but not in Fortran. You might want to measure it in both or measure it in neither, but measuring it in one but not the other is a bit odd.

2 Likes

There is generally no need for these if it isn’t needed in fortran.

For a ~1min runtime, the compile time here is fairly neglegible.

2 Likes

Is that because `total` is wrong?

``````julia> Int(n_days)^n_people
10000000000

julia> n_days^n_people
1410065408
``````

Note that the Fortran program correctly converts to an 8-byte integer before exponentiation:

``````integer(kind=8), parameter:: total = int(n_days,8) ** n_people
``````

so this does not explain the performance difference.

1 Like

Yes you are right.

I may have missed an explanation but is there a reason that you are trying to calculate the probability of a repeated birthday by enumeration? This is the classic example in an elementary probability course of showing that sometimes it is easier to evaluate the probability of the complement (i.e. no repeated birthdays) than trying to evaluate the probability of one or more repetitions.

In R there are functions `pbirthday` and `qbirthday` for evaluating the cumulative probability and its inverse, the quantile function. In Julia the probability of a repeated birthday in `n_people` in a year of `n_days` can be evaluated as

``````julia> pbirthday(n_people, n_days) = 1 - prod((n_days:-1:(n_days - n_people + 1)) ./ n_days)
pbirthday (generic function with 1 method)

julia> pbirthday(20, 365)  # prob of rep. birthday in random group of 20 people
0.41143838358058027
``````

which is a transcription of the R function,

3 Likes

I think the idea is to benchmark the same code (i.e. regardless if the algorithm is optimal or not)

4 Likes

Hi again. When I began with it I was conscious it is a benchmark not a utility app.

The formula to obtain this probability is well known but it is not the goal of this code to get the probability (the number) but all the sequences that fulfill the goal.

1 Like