Comparing Python, Julia, and C++

Actually, your Python functions look like:

def f(r, x1, x2, x3, x4, x5, x6, x7, x8):
    r = x1 + x   

which very probably doesn’t do what is intended: it just rebinds the local variable r inside the function, so the array passed in by the caller is never modified.
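A minimal sketch of the difference, assuming NumPy arrays as in the benchmark. The names f_rebind and f_inplace are hypothetical, and only two inputs are used for brevity. Plain assignment to r rebinds the local name, while slice assignment r[:] = ... writes into the array the caller passed in.

```python
import numpy as np

def f_rebind(r, x1, x2):
    # Binds the local name r to a brand-new array; the caller's array is untouched
    r = x1 + x2

def f_inplace(r, x1, x2):
    # Slice assignment copies the computed values into the existing buffer
    r[:] = x1 + x2

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 20.0, 30.0])

result = np.zeros(3)
f_rebind(result, x1, x2)
print(result)  # [0. 0. 0.]

f_inplace(result, x1, x2)
print(result)  # [11. 22. 33.]
```

Note that r[:] = x1 + x2 still builds a temporary array before copying it; np.add(x1, x2, out=r) writes the sum directly into r.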

But it seems you could fix that, and the fix has no big impact on the benchmarks:

%timeit l(result, *x)
11.5 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

def ll(x1, x2, x3, x4, x5, x6, x7, x8):
    return x1 + x2 - x3 + x4 - x5 + x6 - x7 + x8

%timeit r = ll(*x)
11.5 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For comparison, the Numba version:

import numba

@numba.jit
def l2(x1, x2, x3, x4, x5, x6, x7, x8):
    return x1 + x2 - x3 + x4 - x5 + x6 - x7 + x8

%timeit r = l2(*x)
5.52 ms ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
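One plausible reason for the roughly 2x gap between 11.5 ms and 5.52 ms (an assumption, not something measured here): NumPy evaluates x1 + x2 - x3 + ... as seven separate binary operations, each allocating a temporary array and making a full pass over memory, whereas Numba can compile the whole expression into a single loop. Below is a NumPy-only sketch that at least avoids the temporaries by reusing one output buffer through the ufuncs' out= argument. The name ll_out is hypothetical, and its speed was not benchmarked here.

```python
import numpy as np

def ll_out(x1, x2, x3, x4, x5, x6, x7, x8, out=None):
    # Allocate the result once, or reuse a buffer supplied by the caller
    if out is None:
        out = np.empty_like(x1)
    # Each ufunc writes into `out` instead of creating a new temporary array
    np.add(x1, x2, out=out)
    np.subtract(out, x3, out=out)
    np.add(out, x4, out=out)
    np.subtract(out, x5, out=out)
    np.add(out, x6, out=out)
    np.subtract(out, x7, out=out)
    np.add(out, x8, out=out)
    return out

# Usage with random inputs shaped like the benchmark's arrays
rng = np.random.default_rng(0)
x = [rng.random(1000) for _ in range(8)]
r = ll_out(*x)
```

This removes the allocations but still makes seven passes over memory, so it may close only part of the gap to a truly fused loop.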

I was trying to add parallelism, but maybe I have too few cores (only 2 😉) at the moment:

import os

os.environ["NUMBA_DEBUG_ARRAY_OPT_STATS"] = '1'

@numba.jit('double[:](double[:],double[:],double[:],double[:],double[:],double[:],double[:],double[:],)', nopython=True, parallel=True)
def l2p(x1, x2, x3, x4, x5, x6, x7, x8):
    return x1 + x2 - x3 + x4 - x5 + x6 - x7 + x8
Parallel for-loop #23 is produced from pattern '('arrayexpr (((((((_+_)-_)+_)-_)+_)-_)+_)',)' at <ipython-input-125-c699c2d45b59> (3)
After fusion, function l2p has 1 parallel for-loop(s) #{23}.

%timeit r = l2p(*x)
5.51 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
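The optimization log above reports that the whole expression was fused into one parallel for-loop. Conceptually, that loop computes each output element in a single pass over the eight inputs. Below is a plain-Python sketch of the serial equivalent; fused_loop is a hypothetical name, it is far slower than the compiled version, and it only illustrates the shape of the loop, not the code Numba actually generates.

```python
import numpy as np

def fused_loop(x1, x2, x3, x4, x5, x6, x7, x8):
    out = np.empty_like(x1)
    # One pass over the index range: every element is computed directly from
    # the eight inputs, with no intermediate arrays.
    # Inside an @numba.njit(parallel=True) function, replacing range with
    # numba.prange would let Numba split these iterations across threads.
    for i in range(x1.shape[0]):
        out[i] = x1[i] + x2[i] - x3[i] + x4[i] - x5[i] + x6[i] - x7[i] + x8[i]
    return out

xs = [np.arange(5, dtype=float) * (k + 1) for k in range(8)]
print(fused_loop(*xs))
```

With only two cores, splitting such a loop can help at most about 2x, and possibly less if the loop is limited by memory bandwidth rather than arithmetic; that is a guess, not something the timings above establish.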