PyCall minimal overhead

Hi,
I work on a Julia benchmark database project aiming to compare different implementations of basic computational kernels across various languages.

I wonder about the most efficient way to call a Python snippet from Julia. The following MWE

using PyCall
using BenchmarkTools

#I would like to be able to read the python snippet from a file like
# py_snippet=read("axpy.py")
#but for now I follow the PyCall example
py"""
def pyaxpy(y,x,a):
    y+=a*x
"""
function measure(n)
    @show n
    x,y,a=(rand(n),rand(n),1/3)

    @btime py"pyaxpy"($y,$x,$a)
    px,py,pa=map(PyObject,(x,y,a))
    @btime py"pyaxpy"($py,$px,$pa)
end

foreach(measure,(10^i for i in (1:4)))

returns the following results on my machine:

n = 10
  14.097 μs (23 allocations: 1.05 KiB)
  12.548 μs (11 allocations: 400 bytes)
n = 100
  14.210 μs (23 allocations: 1.05 KiB)
  12.696 μs (11 allocations: 400 bytes)
n = 1000
  15.854 μs (23 allocations: 1.05 KiB)
  14.438 μs (11 allocations: 400 bytes)
n = 10000
  23.135 μs (23 allocations: 1.05 KiB)
  22.240 μs (11 allocations: 400 bytes)
  • It appears that there is a minimal overhead of about 12 microseconds per Python call.
  • It also appears that I can save about 1 microsecond by converting the arguments to PyObjects beforehand.
  • When I benchmark with Python's perfplot, the minimal timings seem to be around 1 microsecond (not 10).
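To put that ~1 µs figure in context, here is a pure-Python sketch that times a direct call with timeit, entirely on the Python side. It is an assumption-laden stand-in: plain lists replace the NumPy arrays, and the loop body only mimics the axpy above.

```python
import timeit

def pyaxpy(y, x, a):
    # in-place axpy on plain lists; stands in for the NumPy version above
    for i in range(len(y)):
        y[i] += a * x[i]

n = 10
x = [0.5] * n
y = [0.25] * n

# per-call time in seconds, measured entirely inside Python
t = timeit.timeit(lambda: pyaxpy(y, x, 1/3), number=100_000) / 100_000
print(f"{t * 1e6:.3f} µs per call")
```

Whatever this prints on a given machine is the Python-only baseline; the difference from the @btime numbers above is the PyCall boundary cost.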

My questions are:

  1. Is there a way to reduce this overhead from Julia?
  2. Could I store my Python snippet in a separate file?

Thank you for your help :wink:


I saved a lot of time by assigning the Python function to a Julia name:

@btime py"pyaxpy"($y,$x,$a) # n=10

11.699 μs (23 allocations: 1.05 KiB)

pyaxpy = py"pyaxpy"
@btime pyaxpy($y,$x,$a)

4.614 μs (18 allocations: 896 bytes)

Combining this with converting the inputs to PyObjects beforehand:

px,py,pa=map(PyObject,(x,y,a))
@btime pyaxpy($py,$px,$pa)

3.563 μs (6 allocations: 224 bytes)

For comparison, in Python:

%timeit pyaxpy(a,b,c)

1.5 µs ± 9.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Thus, the PyCall function call overhead is approximately 2 μs.


You can also save time by suppressing the output conversion: pycall(myfunc, PyObject, args...).


Including @stevengj's suggestion:

@btime pycall(pyaxpy, PyObject, $py,$px,$pa)

2.244 μs (4 allocations: 112 bytes)
Only <1 µs overhead remains.

Thanks a lot,
I modified the MWE accordingly (with the arguments converted to PyObjects beforehand):

using PyCall
using BenchmarkTools

#I would like to be able to read the python snippet from a file like
# py_snippet=read("axpy.py")
#but for now I follow the PyCall example
py"""
def pyaxpy(y,x,a):
    y+=a*x
"""

pyaxpy = py"pyaxpy"

function measure(n)
    @show n
    x,y,a=(rand(n),rand(n),1/3)

    @btime py"pyaxpy"($y,$x,$a)
    px,py,pa=map(PyObject,(x,y,a))
    @btime py"pyaxpy"($py,$px,$pa)
    @btime $pyaxpy($py,$px,$pa)
end

foreach(measure,(10^i for i in (1:4)))

And now I get:

n = 10
  14.030 μs (23 allocations: 1.05 KiB)
  12.486 μs (11 allocations: 400 bytes)
  3.150 μs (4 allocations: 176 bytes)
n = 100
  13.967 μs (23 allocations: 1.05 KiB)
  12.635 μs (11 allocations: 400 bytes)
  3.298 μs (4 allocations: 176 bytes)
n = 1000
  15.655 μs (23 allocations: 1.05 KiB)
  14.202 μs (11 allocations: 400 bytes)
  4.419 μs (4 allocations: 176 bytes)
n = 10000
  25.840 μs (23 allocations: 1.05 KiB)
  22.458 μs (11 allocations: 400 bytes)
  12.384 μs (4 allocations: 176 bytes)

Wow, thank you very much @stevengj and @lungben !!

I must confess that I was not able to use @stevengj's suggestion (pycall(myfunc, PyObject, args...)) without an example :relaxed:

A minimal working example is in my 2nd post above.
Hope this helps!

@btime pycall(pyaxpy, PyObject, $py,$px,$pa)

Yes, your example was necessary (for me). The final MWE

using PyCall
using BenchmarkTools

#I would like to be able to read the python snippet from a file like
# py_snippet=read("axpy.py")
#but for now I follow the PyCall example
py"""
def pyaxpy(y,x,a):
    y+=a*x
"""

pyaxpy = py"pyaxpy"

function measure(n)
    @show n
    x,y,a=(rand(n),rand(n),1/3)

    @btime py"pyaxpy"($y,$x,$a)
    px,py,pa=map(PyObject,(x,y,a))
    @btime py"pyaxpy"($py,$px,$pa)
    @btime $pyaxpy($py,$px,$pa)
    @btime pycall($pyaxpy, PyObject, $py,$px,$pa)
end

foreach(measure,(10^i for i in (1:4)))

and results:

n = 10
  14.007 μs (23 allocations: 1.05 KiB)
  12.474 μs (11 allocations: 400 bytes)
  3.172 μs (4 allocations: 176 bytes)
  1.526 μs (2 allocations: 48 bytes)

Thank you very much.
Do you know how to include the Python snippet from a file?

Assuming your Python code is in my_module.py in the current directory, the following should work:

ENV["PYTHONPATH"] = "." # to enforce Python looking in current directory for imports
my_module = pyimport("my_module")
my_module.pyaxpy(y,x,a)

Edit: it is better to replace the ENV command with the following line, to preserve any existing entries in PYTHONPATH:

pushfirst!(PyVector(pyimport("sys")."path"), "")

Source: GitHub - JuliaPy/PyCall.jl: Package to call Python functions from the Julia language
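The import mechanics behind both variants can be sketched in pure Python: anything whose directory is on sys.path can be imported by module name, which is all the PYTHONPATH / sys.path manipulation achieves. The module name and contents below are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# create a throwaway directory containing my_module.py
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "my_module.py"), "w") as f:
    f.write("def pyaxpy(y, x, a):\n"
            "    for i in range(len(y)):\n"
            "        y[i] += a * x[i]\n")

# putting the directory on sys.path is what PYTHONPATH="." does
sys.path.insert(0, tmpdir)
my_module = importlib.import_module("my_module")

y = [0.0, 0.0]
my_module.pyaxpy(y, [2.0, 4.0], 0.5)
print(y)  # → [1.0, 2.0]
```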


Thank you again !

You would think this would be easy, even in Python (but it differs by version; this is for Python 3).

It was surprisingly obscure. At work I ended up doing it this way (it may not be the best way, and since I'm rewriting in Julia anyway I won't investigate further, but please tell me if you find a better one):

using PyCall

py"""
filename="app.py"
with open(filename, "rb") as source_file:
    code = compile(source_file.read(), filename, "exec")
exec(code)

app.run_server(host='localhost', port=8050)
"""

[I was calling a Python program using Dash, i.e. JavaScript in a web browser, but will be using GitHub - plotly/Dash.jl: Dash for Julia - A Julia interface to the Dash ecosystem for creating analytic web applications in Julia. No JavaScript required.]
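For reference, that compile/exec pattern also works as standalone Python. A minimal sketch, with the file name and its contents invented here:

```python
import os
import tempfile

# write a small script to disk, then compile and exec it, as in the snippet above
fd, filename = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("result = sum(range(10))\n")

with open(filename, "rb") as source_file:
    code = compile(source_file.read(), filename, "exec")

ns = {}
exec(code, ns)
print(ns["result"])  # → 45
os.remove(filename)
```

Passing the file name to compile() keeps tracebacks pointing at the real file, which a bare exec(open(...).read()) would not.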


For this, how about using the $$ interpolation mechanism built into @py_str?

help?> @py_str
  py".....python code....."

  Evaluate the given Python code string in the main Python module.

  If the string is a single line (no newlines), then the Python
  expression is evaluated and the result is returned. If the string
  is multiple lines (contains a newline), then the Python code is
  compiled and evaluated in the __main__ Python module and nothing
  is returned.

  If the o option is appended to the string, as in py"..."o, then
  the return value is an unconverted PyObject; otherwise, it is
  automatically converted to a native Julia type if possible.

  Any $var or $(expr) expressions that appear in the Python code
  (except in comments or string literals) are evaluated in Julia
  and passed to Python via auto-generated global variables.
  This allows you to "interpolate" Julia values into Python code.

  Similarly, any $$var or $$(expr) expressions in the Python code
  are evaluated in Julia, converted to strings via string, and are
  pasted into the Python code. This allows you to evaluate code
  where the code itself is generated by a Julia expression.

For example:

julia> using PyCall

       # could as well be:
       #   pycode = read("myfile.py", String)
julia> pycode = """
       def hello():
           print("Hello from python!")
       """
"def hello():\n    print(\"Hello from python!\")\n"

julia> py"""
       $$pycode
       """

julia> py"hello()"
Hello from python!

Thank you both !

Would it be possible to call timeit directly from inside Julia? That way the evaluation would be done entirely in Python, without the object-passing overhead.
Something like:

timeit = pyimport("timeit")
timeit.timeit("benchmark_function()", globals=locals())

At the moment this doesn't run, because locals() doesn't exist in Julia the way I'm using it here.

Is there a workaround?

Edit:

Okay, so I tested it in a case where the namespace is irrelevant (appending 1000 elements to a list):

timeit.timeit("""
times=1000
a = []
for i in range(1,times+1):
    a.append(i)
""", number=10000)/10000*1000000000 #nanoseconds

But running it directly in Python is faster by about 10 µs.
I don't know why, though, since I thought timeit would only start measuring once the transition to the Python environment is complete. :confused:
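As for the locals() problem: in plain Python, timeit's globals argument is just a dict, so you can pass one explicitly instead of locals(). A sketch under that assumption (benchmark_function is a stand-in name):

```python
import timeit

def benchmark_function():
    # the list-append loop from the snippet above
    a = []
    for i in range(1, 1001):
        a.append(i)
    return a

# pass an explicit namespace dict instead of locals()
t = timeit.timeit("benchmark_function()",
                  globals={"benchmark_function": benchmark_function},
                  number=1000)
print(t / 1000 * 1e9, "ns per call")
```

From PyCall, the same dict could be built on the Julia side, so no Python-only locals() is needed.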


Sorry for unearthing this thread, but I have a related question for PythonCall.

@filchristou, do you have an answer to your question by now?

In the end I created a temporary Python module with all the benchmarks as functions, plus a Python wrapper function that used timeit to benchmark the functions of interest.
Later I called that wrapper from a Pluto notebook, after importing my local Python module using PyCall.jl.
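A minimal sketch of what such a wrapper could look like, assuming the benchmarks are top-level functions of the module; bench_all is a made-up helper name, not the actual code used:

```python
import timeit
import types

def bench_all(module, number=1000):
    # time every function defined at the top level of the module itself,
    # returning a name -> seconds-per-call dict
    results = {}
    for name in dir(module):
        fn = getattr(module, name)
        if isinstance(fn, types.FunctionType) and fn.__module__ == module.__name__:
            results[name] = timeit.timeit(fn, number=number) / number
    return results
```

Calling bench_all(my_module) through PyCall then keeps all timing on the Python side; only the finished dict crosses the boundary.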

1 Like