I’m putting together an Intro to Julia Jupyter notebook, and in my section on why you should learn Julia I’m emphasizing Julia’s performance. The notebook is for colleagues of mine who primarily use R, plus a few who use Python. I’m including a few simple benchmarks that look like this:
using BenchmarkTools
using PyCall
using RCall
a = rand(10^7)
@btime pybuiltin("sum")($a)   # Python's built-in sum, called via PyCall
@btime R"sum($a)"             # R's sum, called via RCall
@btime sum($a)                # Julia's native sum
When you execute this code, Julia absolutely obliterates Python and dramatically outperforms R as well. However, the question I will undoubtedly get is, “How can I be sure that it doesn’t take longer to execute Python/R code via the PyCall/RCall packages?” or something along those lines. People will obviously want to know whether this is a fair way to compare speeds, and I simply don’t know enough about how these packages work to answer those questions.
Does anyone here know whether this is a fair comparison, or whether there is indeed additional work taking place given that a was instantiated in Julia but is being operated on in the other language (or for some other reason)? Aside from telling them to measure the speeds themselves in their normal working environments, is there a good way to convince a skeptical crowd that these are legitimate comparisons?
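The best idea I’ve had so far is to convert the data ahead of time and benchmark only the foreign call itself, so any Julia-to-Python/R conversion cost is excluded from the timing. A rough sketch of what I mean, assuming PyCall (with NumPy available) and RCall are set up; npsum, np_a, and r_a are just illustrative names:

using BenchmarkTools, PyCall, RCall

a = rand(10^7)

# Convert the data once, outside the benchmark, so the timings below
# measure only the call itself and not any Julia-to-Python/R conversion.
npsum = pyimport("numpy").sum   # NumPy's sum as a Python function object
np_a  = PyObject(a)             # a converted to a Python object (a NumPy array when NumPy is available)
r_a   = RObject(a)              # a copied into R once

@btime $npsum($np_a)            # NumPy sum, no per-call conversion
@btime R"sum($r_a)"             # R's sum, no per-call conversion
@btime sum($a)                  # Julia's native sum, for reference

If those numbers line up with the original benchmark, the conversion cost is presumably negligible for an array of this size, but I’d still like confirmation from people who know these packages.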
Julia
julia> using BenchmarkTools
julia> a = rand(10^7);
julia> @benchmark sum($a)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.706 ms (0.00% GC)
  median time:      4.215 ms (0.00% GC)
  mean time:        4.229 ms (0.00% GC)
  maximum time:     5.408 ms (0.00% GC)
  --------------
  samples:          1180
  evals/sample:     1
R
> library(microbenchmark)
> a <- runif(1e7)
> microbenchmark(sum(a))
Unit: milliseconds
    expr      min      lq     mean   median       uq      max neval
  sum(a) 8.633446 8.64609 8.781826 8.700741 8.792563 10.75872   100
Python
In [10]: import numpy as np
In [11]: np_a = np.random.rand(10**7)
In [12]: a = np_a.tolist()
In [13]: %timeit np.sum(np_a)
4.08 ms ± 23.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [14]: %timeit sum(a)
35.8 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Performance is nice, but it’s not the primary reason I use Julia. Multiple dispatch, the design of the type system, the support for functional programming, and the ecosystem of numerical and scientific computing packages are what draw me to the language.
Converting Julia arrays to Python lists takes some time because they have completely different memory layouts. Passing Julia arrays to Python as NumPy arrays, however, is usually very fast.
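As a rough illustration (PyCall with NumPy available is assumed; pylist is just an illustrative name), you can time the two paths directly:

using BenchmarkTools, PyCall

a = rand(10^7)
pylist = pybuiltin("list")        # Python's built-in list constructor

# Passing the Julia Vector{Float64} to Python yields a NumPy array,
# which is far cheaper than building a Python list element by element.
@btime PyObject($a)               # Julia array -> Python/NumPy array
@btime $pylist($a)                # Julia array -> Python list

The list constructor has to walk all 10^7 elements and box each one as a Python object, which is where the time goes.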
In my experience, the PyCall overhead is well under 1 ms when no significant amount of data is transferred. To be on the safe side, I suggest cross-checking the Python and R benchmarks in native Python/R notebooks.
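To get a feel for that fixed per-call overhead, you can time a Python call that does essentially nothing, so only the round-trip cost is measured. A minimal sketch; the no-op lambda is purely illustrative:

using BenchmarkTools, PyCall

# A Python function that does no work: timing it isolates the
# Julia -> Python -> Julia round-trip cost of a PyCall call.
py_noop = py"lambda: None"
@btime $py_noop()

Whatever the result, it can be compared directly against the millisecond-scale sum timings above.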
I did a comparison of Julia to Python for DataFrames; maybe it is useful for you: