Benchmarking Julia vs. Python vs. R with PyCall and RCall

I’m putting together an Intro to Julia Jupyter notebook and in my section on Why you should learn Julia I’m emphasizing Julia’s performance. This notebook is for colleagues of mine who primarily use R and a few that use Python. I’m including a few lines of simple benchmarks that look like this:

using BenchmarkTools
using PyCall
using RCall

a = rand(10^7)

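# Interpolate the global `a` with `$` so BenchmarkTools does not time global-variable access.
@btime pybuiltin("sum")($a)   # Python's built-in sum, called through PyCall
@btime R"sum($a)"             # R's sum, called through RCall (here `$a` is RCall's own interpolation)
@btime sum($a)                # native Julia sum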

When you execute this code, Julia absolutely obliterates Python and dramatically outperforms R as well. However, the question that I will undoubtedly get is, “How can I be sure that it doesn’t take longer to execute Python/R code via the PyCall/RCall packages?” or something along those lines. People will obviously want to know if this is a fair way to compare speeds and I simply don’t know enough about the way these packages work to answer those questions.

Does anyone here know if this is a fair comparison or if there are indeed additional processes taking place given that a was instantiated in Julia but is being operated on in the other language (or for some other reason)? Aside from telling them to measure the speeds themselves in their normal working environments, is there a good way to convince a skeptical crowd that these are legitimate comparisons?

You are passing data back and forth between Julia and Python/R.

For a fair comparison, do the sum natively in R (and in Python), not via Julia.

On my machine:

Julia

julia> using BenchmarkTools

julia> a = rand(10^7);

julia> @benchmark sum($a)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.706 ms (0.00% GC)
  median time:      4.215 ms (0.00% GC)
  mean time:        4.229 ms (0.00% GC)
  maximum time:     5.408 ms (0.00% GC)
  --------------
  samples:          1180
  evals/sample:     1

R

> library(microbenchmark)
> a <- runif(1e7)
> microbenchmark(sum(a))
Unit: milliseconds
   expr      min      lq     mean   median       uq      max neval
 sum(a) 8.633446 8.64609 8.781826 8.700741 8.792563 10.75872   100

Python

In [10]: import numpy as np

In [11]: np_a = np.random.rand(10**7)

In [12]: a = np_a.tolist()

In [13]: %timeit np.sum(np_a)
4.08 ms ± 23.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [14]: %timeit sum(a)
35.8 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Performance is nice, but it’s not the primary reason I use Julia. Multiple dispatch, the design of the type system, the support for functional programming, and the ecosystem of numerical and scientific computing packages are what draw me to the language.

Converting Julia Arrays to Python lists takes some time because the two have completely different memory layouts. Passing Julia Arrays as NumPy arrays, however, is usually very fast.
In my experience the PyCall overhead is well under 1 ms if no significant amount of data is transferred. To be on the safe side, I suggest cross-checking the Python and R benchmarks in native Python/R notebooks.
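If you want to see the memory-layout point for yourself, here is a rough sketch in plain Python using only the standard library. It is an illustration under assumptions, not a measurement of PyCall itself: `array.array("d", ...)` stands in for a contiguous buffer of doubles (the kind of layout a Julia Array or NumPy array uses), `n` is smaller than the 10^7 used above just to keep the run short, and the variable names are made up for this example.

```python
import sys
import timeit
from array import array
from random import random

n = 10**6  # assumption: smaller than the 10^7 in the thread, to keep the run short

# Contiguous buffer of raw 8-byte doubles (stand-in for a Julia/NumPy-style array).
buf = array("d", (random() for _ in range(n)))

# A Python list stores pointers to separately allocated float objects.
lst = buf.tolist()

# Approximate memory footprint of each representation.
buf_bytes = buf.itemsize * n
list_bytes = sys.getsizeof(lst) + n * sys.getsizeof(lst[0])

# Best-of-5 wall time for a single call, in milliseconds.
convert_ms = min(timeit.repeat(buf.tolist, repeat=5, number=1)) * 1e3
sum_ms = min(timeit.repeat(lambda: sum(lst), repeat=5, number=1)) * 1e3

print(f"raw buffer:      {buf_bytes / 1e6:.1f} MB")
print(f"list + floats:   {list_bytes / 1e6:.1f} MB")
print(f"buffer -> list:  {convert_ms:.2f} ms")
print(f"sum(list):       {sum_ms:.2f} ms")
```

Comparing the conversion time with the time of the sum itself gives a feel for how much of a cross-language benchmark can be data conversion rather than computation. The actual numbers depend on your machine and Python version, so treat them as a sanity check only.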
I did a comparison of Julia to Python for DataFrames, maybe this is useful for you:
