Is PyCall slower than directly using Python?

I was trying to benchmark some operations in Python, and since I wanted to do the meta-analysis in a Julia notebook, I found it convenient to use PyCall.

I know that PyCall puts some extra overhead on the object conversion between Julia & Python, and that’s why I decided to put the benchmarking inside the python call itself using timeit.
This way the data exchange wouldn’t be part of the benchmarking.
And then I would just trigger the benchmarking with PyCall and get the benchmarking data.

To my surprise, I found that using Python directly was faster.

Examples
Example 1
So let’s say I have defined this function in a python file “benches.py”:

# append to a list
def bench(times):
    a = []
    for i in range(1, times + 1):
        a.append(i)
    return a

I also define a function to benchmark my code of interest:

import random
import timeit

def get_bench_results(n):
    random.seed(1)
    # average time per call, in nanoseconds
    bs = timeit.timeit(f"bench({n})", globals=globals(), number=10_000) / 10_000 * 1_000_000_000
    return bs

When I call get_bench_results(100_000) directly from Python, I get 5.686130 msec.
Doing the same with PyCall:

using PyCall
scriptdir = @__DIR__
pushfirst!(PyVector(pyimport("sys")."path"), scriptdir)
mymodule= pyimport("benches")
mymodule.get_bench_results(100_000)

I get 7.318859 msec, so around 1.3x slower.

Example 2
Benchmarking the sum function still yields different results for Python and PyCall, but the gap is much smaller:

def bench2(ar):
    return sum(ar)

def get_bench2_results(n):
    random.seed(1)
    ar = [random.random() for _ in range(1, n)]
    # average time per call, in nanoseconds
    bs = timeit.timeit(f"bench2({ar})", globals=globals(), number=10_000) / 10_000 * 1_000_000_000
    return bs

For Python I get get_bench2_results(100_000) = 1.714234 msec,
and for PyCall mymodule.get_bench2_results(100_000) = 1.811019 msec,
so around 1.1x slower for PyCall.
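A side note on Example 2: f"bench2({ar})" pastes the whole list into the timed statement as a literal, so each timed call also rebuilds the list before summing it. Below is a minimal sketch of a variant that times only the call to sum; get_bench2_results_by_name and its number parameter are hypothetical names, not part of the original benches.py.

```python
import random
import timeit


def bench2(ar):
    return sum(ar)


def get_bench2_results_by_name(n, number=1_000):
    """Hypothetical variant: time sum() on a pre-built list, in ns per call."""
    random.seed(1)
    ar = [random.random() for _ in range(1, n)]
    # "bench2(ar)" looks the existing list up by name, so the timed
    # statement does not rebuild the list from a literal on every call.
    total = timeit.timeit(
        "bench2(ar)",
        globals={"bench2": bench2, "ar": ar},
        number=number,
    )
    return total / number * 1_000_000_000
```

The numbers from this variant are not directly comparable to the ones reported above, since the timed work is different.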

It looks like the first benchmark involves heavy dynamic memory allocation, and maybe this creates some overhead when Python is running inside Julia?
Of course, I don’t think it’s even close to a serious problem, but it made me curious whether PyCall has some inherent performance bottleneck beyond the conversion overhead.

All of that is happening in Python either way, so I’d be surprised if there were any difference at all.

Did you run the benchmarks multiple times to see if it’s just random noise? Running the same benchmark twice can give very different answers if your OS decides to do some background task (like updates or AV scanning).

Also check that PyCall is using the same Python interpreter as you are testing with.

Well, I specified number=10_000, so it’s an average over 10,000 iterations.
I did run the first benchmark in particular (appending to a list) multiple times (nothing sophisticated, just manually), and it was always slightly slower than running it directly with Python.
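A slightly more systematic way to repeat the measurement is timeit.repeat, which returns one total per run so the spread between runs is visible. This is only a sketch: repeat_bench is a hypothetical helper, and the repeat and number values are arbitrary choices, not the ones used above.

```python
import timeit


def bench(times):
    a = []
    for i in range(1, times + 1):
        a.append(i)
    return a


def repeat_bench(n, repeat=5, number=20):
    """Hypothetical helper: ns per call for each of `repeat` independent runs."""
    totals = timeit.repeat(
        "bench(n)",
        globals={"bench": bench, "n": n},
        repeat=repeat,
        number=number,
    )
    # Each entry of `totals` is the time for `number` calls, in seconds.
    return [t / number * 1e9 for t in totals]


if __name__ == "__main__":
    runs = repeat_bench(100_000)
    print("min ns/call:", min(runs))
    print("max ns/call:", max(runs))
```

If the minimum under PyCall is still consistently higher than the minimum in plain Python, background noise is an unlikely explanation.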

In particular, PyCall.python will tell you which Python it corresponds to.
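From the Python side, a small sketch like the following could be added to benches.py and called both from a plain python3 session and through the module imported with PyCall, to compare what each one reports. interpreter_info is a hypothetical helper, and sys.executable may be less meaningful when Python is embedded in another process.

```python
import platform
import sys


def interpreter_info():
    """Hypothetical helper: basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,          # path reported by the interpreter
        "version": platform.python_version(),  # e.g. "3.8.10"
        "prefix": sys.prefix,                  # installation prefix in use
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```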

Yes, both use /bin/python3.
Can someone else reproduce my example results?

With Python 3.7.11 on my laptop, I get in python:

>>> benches.get_bench_results(100_000)
8814138.673000002

and in julia:

julia> mymodule.get_bench_results(100_000)
8.997317473499993e6

which is within 2%.


Hmm. I just checked on another machine with Python 3.8.10, Julia 1.7.0-beta3, and PyCall v1.92.5.

For python

>>> benches.get_bench_results(100_000)
7073032.0084000025

and julia

julia> mymodule.get_bench_results(100_000)
9.398170638599994e6

which is around 33% slower.

I also checked for Julia 1.6.1 and the results are the same.
Tests done on ubuntu 20 and debian 11.
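For reference, the relative slowdowns implied by the numbers posted in this thread can be computed as in the sketch below. slowdown_percent is a hypothetical helper; the inputs are the nanosecond values copied from the posts above.

```python
def slowdown_percent(python_ns, pycall_ns):
    """Percent by which the PyCall timing exceeds the plain Python timing."""
    return (pycall_ns / python_ns - 1.0) * 100.0


# (plain Python ns, PyCall ns) as reported in the thread
reported = {
    "Python 3.7.11 laptop": (8814138.673000002, 8.997317473499993e6),
    "Python 3.8.10 machine": (7073032.0084000025, 9.398170638599994e6),
}

if __name__ == "__main__":
    for label, (py_ns, jl_ns) in reported.items():
        print(f"{label}: {slowdown_percent(py_ns, jl_ns):.1f}% slower under PyCall")
```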

This is disturbingly weird. :confused:

I wonder if python is linked with a different libc or something than julia, so that e.g. julia has a different memory allocator (malloc etcetera) that is affecting libpython when you are running it within Julia? (Your benchmark is mostly testing memory allocation and garbage collection.)

Do you know a way to check something like this?

Perhaps @giordano's DependencyWalker.jl may help?

You can just run ldd on julia and python3 to see what they link.
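As a complement to ldd, which shows what the binaries link against on disk, one could also look at what is actually loaded in the running process. The sketch below assumes Linux, where /proc/self/maps lists the current memory mappings; shared_objects and loaded_shared_objects are hypothetical helper names.

```python
import os


def shared_objects(maps_text):
    """Return the sorted set of .so paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if ".so" in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


def loaded_shared_objects():
    # Assumption: Linux, where /proc/self/maps describes this process.
    with open("/proc/self/maps") as f:
        return shared_objects(f.read())
```

Calling loaded_shared_objects() once from plain python3 and once through the module imported with PyCall, then comparing the two lists, would show whether a different libc or allocator library is present in the Julia process.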