Minimizing PythonCall overhead with BenchmarkTools

Of the four `@btime` variants below, option 4 (calling `pycall` on the interpolated bound method) is the fastest when benchmarking from Julia. The rule of thumb: always interpolate the bound method object (`$(obj.meth)`) rather than the object itself, so the Python attribute lookup happens outside the timed region.

Julia benchmark
julia> using BenchmarkTools, PythonCall

julia> @pyexec """
       class Fib:
           def __init__(self, n):
               self.n = n
       
           def meth(self):
               a, b = 0, 1
               for i in range(self.n):
                   a, b = b, a+b
               return a
       """ => Fib
Python: <class 'Fib'>

julia> obj_jl = Fib(10)
Python: <Fib object at 0x7fcb883f1f50>

julia> @btime $(obj_jl).meth();        # option 1: attribute lookup inside the benchmark
  395.060 ns (4 allocations: 72 bytes)

julia> @btime $(obj_jl.meth)();        # option 2: bound method interpolated
  279.845 ns (1 allocation: 16 bytes)

julia> @btime pycall($(obj_jl).meth);  # option 3: pycall, lookup still inside
  391.574 ns (4 allocations: 72 bytes)

julia> @btime pycall($(obj_jl.meth));  # option 4: pycall on the interpolated bound method
  278.197 ns (1 allocation: 16 bytes)
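The gap between the variants comes from where the attribute lookup `obj.meth` happens: interpolating `obj_jl.meth` hoists the lookup out of the timed call. The same effect can be demonstrated in pure Python, independent of PythonCall (a minimal sketch; the variable names `obj`, `bound`, `t_lookup`, `t_bound` are illustrative):

```python
import timeit

class Fib:
    def __init__(self, n):
        self.n = n

    def meth(self):
        a, b = 0, 1
        for _ in range(self.n):
            a, b = b, a + b
        return a

obj = Fib(10)
bound = obj.meth  # bind once, outside the timed region

# Per-call time with the attribute lookup repeated inside the timed statement...
t_lookup = min(timeit.repeat("obj.meth()", globals=globals(),
                             number=1000, repeat=100)) / 1000
# ...versus with the lookup hoisted out, timing only the call itself.
t_bound = min(timeit.repeat("bound()", globals=globals(),
                            number=1000, repeat=100)) / 1000
print(f"lookup inside: {t_lookup * 1e9:.0f} ns, prebound: {t_bound * 1e9:.0f} ns")
```

The prebound form is typically a little faster, mirroring the difference between options 1/3 and options 2/4 above.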

For reference, benchmarking the same call from Python itself with `timeit` reports a slightly lower per-call time (~269 ns vs ~278 ns), so a few nanoseconds of the Julia-side figure are cross-language call overhead:

Python benchmark
julia> using PythonCall

julia> @pyexec """
       import timeit
       import numpy as np
       
       class Fib:
           def __init__(self, n):
               self.n = n
       
           def meth(self):
               a, b = 0, 1
               for i in range(self.n):
                   a, b = b, a+b
               return a
       
       obj_py = Fib(10)
       times_py = np.array(timeit.repeat("obj_py.meth()", globals=locals(), number=100, repeat=1000)) / 100
       """ => times_py;

julia> minimum(times_py)  # result in seconds
Python: 2.6893000040217883e-07
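For a direct comparison with the `@btime` figures, the per-call minimum can be converted to nanoseconds (a trivial sketch; the value below is the one reported by `minimum(times_py)` above):

```python
# Minimum per-call time from the timeit run above, in seconds.
min_seconds = 2.6893000040217883e-07
min_ns = min_seconds * 1e9  # convert seconds to nanoseconds
print(f"{min_ns:.1f} ns")   # ~268.9 ns, vs ~278 ns for the best Julia-side variant
```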