I was trying to benchmark some operations in Python, and since I wanted to do the meta-analysis in a Julia notebook, I found it convenient to use PyCall.
I know that PyCall adds some overhead for object conversion between Julia and Python, which is why I decided to do the benchmarking inside the Python call itself, using timeit.
This way the data exchange wouldn’t be part of the benchmarking.
And then I would just trigger the benchmarking with PyCall and get the benchmarking data.
To my surprise, I found that running the same benchmark directly in Python was faster.
Examples
Example 1
So let’s say I have defined this function in a Python file “benches.py”:
# append to a list
def bench(times):
    a = []
    for i in range(1, times + 1):
        a.append(i)
    return a
I also define a function to benchmark my code of interest:
# imports needed by the benchmarking helpers
import random
import timeit

def get_bench_results(n):
    bs = []
    random.seed(1)
    # get time per call in nanoseconds
    bs = timeit.timeit(f"bench({n})", globals=globals(), number=10_000) / 10_000 * 1_000_000_000
    return bs
When I call get_bench_results(100_000) directly from Python, I get 5.686130 msec.
Doing the same with PyCall:
using PyCall
scriptdir = @__DIR__
pushfirst!(PyVector(pyimport("sys")."path"), scriptdir)
mymodule = pyimport("benches")
mymodule.get_bench_results(100_000)
I get 7.318859 msec, so around 1.3x slower.
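To make the two numbers easier to compare, one option is to repeat the measurement several times and keep the fastest round, which tends to be less sensitive to background noise than a single run. The following is only a sketch: `best_ns_per_call` and its `number` and `repeat` defaults are hypothetical and not part of the original benches.py.

```python
import timeit


def bench(times):
    # append to a list, as in Example 1
    a = []
    for i in range(1, times + 1):
        a.append(i)
    return a


def best_ns_per_call(n, number=100, repeat=5):
    # hypothetical helper: run `repeat` timing rounds of `number` calls each
    # and return the fastest round, in nanoseconds per call
    rounds = timeit.repeat(lambda: bench(n), number=number, repeat=repeat)
    return min(rounds) / number * 1_000_000_000
```

Calling `best_ns_per_call(100_000)` once from plain Python and once through PyCall would give two minima that can be compared directly.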
Example 2
Benchmarking the sum function still yields different results for Python and PyCall, but the gap is much smaller:
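def bench2(ar):
    return sum(ar)

def get_bench2_results(n):
    bs = []
    random.seed(1)
    # note: range(1, n) yields n - 1 elements
    ar = [random.random() for _ in range(1, n)]
    # get time per call in nanoseconds
    # note: the f-string puts the whole list literal into the timed statement
    bs = timeit.timeit(f"bench2({ar})", globals=globals(), number=10_000) / 10_000 * 1_000_000_000
    return bs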
For Python, get_bench2_results(100_000) gives 1.714234 msec, and for PyCall, mymodule.get_bench2_results(100_000) gives 1.811019 msec, so around 1.1x slower for PyCall.
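One detail of Example 2 is that the f-string embeds the whole list as a literal in the timed statement, so every run rebuilds the list before summing it. If the intent is to time only sum, a possible variant passes the list by name through the globals argument of timeit. This is a sketch under that assumption; `sum_only_ns_per_call` and the namespace dictionary are hypothetical names.

```python
import random
import timeit


def bench2(ar):
    return sum(ar)


def sum_only_ns_per_call(n, number=100):
    # hypothetical variant of get_bench2_results
    random.seed(1)
    ar = [random.random() for _ in range(1, n)]
    # pass the list by name so the timed statement is just the call
    namespace = {"bench2": bench2, "ar": ar}
    total = timeit.timeit("bench2(ar)", globals=namespace, number=number)
    return total / number * 1_000_000_000
```

With the list construction taken out of the timed statement, any remaining difference between Python and PyCall would be attributable to the sum call alone.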
It looks like the first example does a lot of dynamic memory allocation, so maybe this creates some overhead when the Python interpreter is running inside Julia?
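One cheap way to probe the allocation idea is to check whether Python's cyclic garbage collector matters at all here. By default, timeit turns the collector off while timing, and calling gc.enable() in the setup statement turns it back on. The sketch below compares both settings; `append_loop` and `compare_gc` are hypothetical names, and this covers only the collector, not the underlying memory allocator.

```python
import timeit


def append_loop(times):
    # same allocation-heavy loop as in Example 1
    a = []
    for i in range(1, times + 1):
        a.append(i)
    return a


def compare_gc(n, number=50):
    # hypothetical helper: seconds per call with the collector off and on
    ns = {"append_loop": append_loop, "n": n}
    # default behaviour: timeit disables the collector while timing
    gc_off = timeit.timeit("append_loop(n)", globals=ns, number=number)
    # re-enable the collector in setup so it runs during the timed loop
    gc_on = timeit.timeit(
        "append_loop(n)", setup="import gc; gc.enable()", globals=ns, number=number
    )
    return gc_off / number, gc_on / number
```

If both values are close under plain Python and under PyCall, the collector is probably not the source of the gap.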
Of course, I don’t think it’s even close to a serious problem, but it made me curious whether using PyCall carries some inherent performance bottleneck beyond the conversion overhead.
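Before attributing the gap to PyCall itself, it may be worth confirming that both measurements come from the same Python build. That they might not is an assumption; the post does not say which interpreter PyCall is configured to use. A minimal sketch, with `describe_runtime` as a hypothetical helper that could be added to benches.py and called from both sides:

```python
import platform
import sys


def describe_runtime():
    # hypothetical helper: collect details that identify the Python build
    return {
        "executable": sys.executable,
        "version": sys.version,
        "implementation": platform.python_implementation(),
        "compiler": platform.python_compiler(),
    }
```

If the version or compiler strings differ between the plain Python run and the PyCall run, the timings are comparing two different interpreters rather than measuring PyCall overhead.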