Hello, I’m benchmarking Julia against some other langauges, mainly against Cython, Numba etc (in this case mainly Numba). I’m running a simple Monte Carlo estimate of Pi. Trying to recreate Python+Numba vs. Julia, and expand it to parallel execution.
For non-parallel Julia is slightly faster, not as much as the post, but reasonably quicker. However the threading for Julia seems to slow it down (thus Numba can be faster when in parallel). Any ideas why?
I’m still trying to wrap my head around all the different optimisations, @simd, @avx, @distributed, Threads.@threads, etc
I get the following running Julia (with export JULIA_NUM_THREADS = 4
):
3.092
115.656 ms (1 allocation: 16 bytes)
3.142268
3.1
398.524 ms (7851017 allocations: 119.80 MiB)
3.1422492
and running Numba (parallel is speeding it up):
3.196
147 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
3.16
71.8 ms ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The Julia code
using BenchmarkTools
function estimate_pi(nMC)
radius = 1.
diameter = 2. * radius
n_circle = 0
for i in 1:nMC
x = (rand() - 0.5) * diameter
y = (rand() - 0.5) * diameter
r = sqrt(x^2 + y^2)
if r <= radius
n_circle += 1
end
end
return (n_circle / nMC) * 4.
end
# threaded version (think this is the right approach)
function estimate_pi_thread(nMC)
radius = 1.
diameter = 2. * radius
n_circle = 0
Threads.@threads for i in 1:nMC
x = (rand() - 0.5) * diameter
y = (rand() - 0.5) * diameter
r = sqrt(x^2 + y^2)
if r <= radius
n_circle += 1
end
end
return (n_circle / nMC) * 4.
end
nMC2 = 10000000
println(estimate_pi(1000)) # to warm up fn
@btime pi_est =estimate_pi(nMC2)
println(estimate_pi_thread(1000)) # to warm up fn
@btime pi_est = estimate_pi_thread(nMC2)
Python/Numba code
from time import time
import numba
import numpy as np
@numba.njit
def estimate_pi_numba(nMC):
radius = 1.
diameter = 2. * radius
n_circle = 0
for i in range(nMC):
x = (np.random.random() - 0.5) * diameter
y = (np.random.random() - 0.5) * diameter
r = np.sqrt(x ** 2 + y ** 2)
if r <= radius:
n_circle += 1
return 4. * n_circle / nMC
# parallel version
@numba.njit(parallel=True)
def estimate_pi_numba_prange(nMC):
radius = 1.
diameter = 2. * radius
n_circle = 0
for i in numba.prange(nMC):
x = (np.random.random() - 0.5) * diameter
y = (np.random.random() - 0.5) * diameter
r = np.sqrt(x ** 2 + y ** 2)
if r <= radius:
n_circle += 1
return 4. * n_circle / nMC
nMC = 10000000
print(estimate_pi_numba(1000)) # to warm up fn
%timeit -n 10 pi_est = estimate_pi_numba(nMC)
print(estimate_pi_numba_prange(1000)) # to warm up fn
%timeit -n 10 pi_est = estimate_pi_numba_prange(nMC)