NumPy 10x faster than Julia?! What am I doing wrong?! [solved - Julia faster now]

My Python is rusty, but I thought r = m1*m0*np.array([1., 0.]) was scalar (elementwise) multiplication, identical to r = m1 .* m0 .* [1,0] in Julia. It calculates a 2x2 array r, and surely it does all the calculations, even the ones which give zero and are discarded. (And of course elementwise .* gives the same results as matrix multiplication * on diagonal matrices.)
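
For instance, here is a quick check of that last claim with made-up values (using the standard library's Diagonal type; not part of the original benchmark):

julia> using LinearAlgebra

julia> Diagonal([2.0, 3.0]) * Diagonal([5.0, 7.0]) == Diagonal([2.0, 3.0] .* [5.0, 7.0])
true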

It’s a little hard to tell from this example which of these calculations you actually need done in the real case. If it’s just m1[1,1] * m0[1,1] then just do that, don’t make matrices.

Some morals to keep in mind:

  1. If you find yourself creating and operating on zillions of tiny arrays in a loop in Julia, consider using StaticArrays (see the sketch after this list).

  2. If you find yourself creating zillions of big arrays in a loop in Julia, consider working in-place.

  3. If you find yourself writing “vector” style code that applies simple functions (e.g. your f(p)) one by one to a sequence of big arrays in Julia, consider rewriting in a different style. A “scalar” style that does lots of operations on a single input is fast in Julia (unlike Python), and may be faster than “vector” style code because of memory locality and similar considerations.
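
To make morals 1 and 3 concrete, here is a minimal sketch; the loop body (accumulating the first component of m1*m0*[1, 0] over many angles) is an assumption modeled on the example in this thread, not the original benchmark:

using StaticArrays

function sum_r1(t0s, t1s)
    acc = 0.0 + 0.0im
    for (t0, t1) in zip(t0s, t1s)
        # Moral 1: tiny fixed-size SMatrix values are stack-allocated,
        # so the loop body does no heap allocation at all.
        m0 = @SMatrix [cis(-t0) 0; 0 cis(t0)]   # cis(-t) == cos(t) - im*sin(t)
        m1 = @SMatrix [cis(-t1) 0; 0 cis(t1)]
        # Moral 3: "scalar" style -- do all the work for one input per iteration.
        acc += (m1 * m0 * SVector(1.0, 0.0))[1]
    end
    return acc
end

t0s, t1s = 2π .* rand(10^6), 2π .* rand(10^6)
sum_r1(t0s, t1s)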


Yes, it's scalar, but since the elements at positions (0,0) and (1,1) are arrays, and the whole thing is not a proper materialized diagonal matrix (not even in type), the upper-left arrays of m0 and m1 end up being multiplied. I tried this on my laptop, but I'm currently on my phone, so I can only show you later. Basically, m1[0][0]*m0[0][0] is the same as multiplying t1 and t0 after the lambdas from above are applied.

That doesn't mean that NumPy isn't calculating it, though; just that, because of the later r[0], everything else is discarded anyway and is thus not needed.

Wow!
This is my first contact with the Julia community.
I love it.
So, problem solved: I now have something faster than NumPy (thanks to @davidbp).
And probably even faster with StaticArrays (haven't tried it yet), thanks to @stevengj.

  • other improvements (thanks to all)

And even more: a bug was detected in my Python/NumPy code.

I was trying Julia; now I am moving to Julia.

Thanks!


Maybe ask in the NumPy community and see if the NumPy side can be improved, for a fair comparison?


Yes, that's what I said, mostly. Except that if the question is about speed, then calculating one multiplication vs. scalar multiplication of two arrays vs. matrix multiplication of two arrays: these are very different things! Even if what you keep from them is the same number.

This is why it would be useful to have a sufficiently nontrivial example, one where the answer depends on such details. I still think the Python example skips matrix multiplication entirely, so it's not a fair comparison:

>>> np.array([[1,2],[3,4]]) * np.array([[5,6],[7,8]])
array([[ 5, 12],
       [21, 32]])

julia> [1 2; 3 4] .* [5 6; 7 8]  # N^2 operations
2×2 Array{Int64,2}:
  5  12
 21  32

julia> [1 2; 3 4] * [5 6; 7 8]  # N^3 operations
2×2 Array{Int64,2}:
 19  22
 43  50

You mean the elements, surely. m0 is a 2x2 array of complex numbers.

There are many other reasons to move to Julia. Being within the same order of magnitude as NumPy is enough to choose Julia for the project.

How do you know that NumPy cannot be even faster?


Maybe NumPy can be slightly faster for some operations, but not for all, because:

  1. Julia has a JIT compiler
  2. Julia can handle multicore parallelism (see the sketch after this list)
  3. (and GPUs)
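
For item 2, here is a minimal multicore sketch (assuming Julia was started with several threads, e.g. JULIA_NUM_THREADS=4; the workload itself is made up):

using Base.Threads

function threaded_sum(xs)
    acc = Atomic{Float64}(0.0)        # atomic accumulator avoids a data race
    @threads for i in eachindex(xs)   # iterations are split across threads
        atomic_add!(acc, sin(xs[i]))
    end
    return acc[]
end

threaded_sum(rand(10^6))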

Numba


Numba is very limited.
PyPy is great if you want Python and do not need NumPy.
Python 3 is good for notebooks (a huge collection of libraries is available).
For heavy computations, Julia seems to be the best.
Confirmed here.

I would not state “confirmed here”. I actually don’t want to start another “why not numba/cython/shedskin/pypy/tensorflow” argument in this thread.

You can make Python fast with Cython, but… you have to rewrite your code and type everything (even loop indices), or the performance will degrade a lot. If you are in a situation where you only care about speeding up functions that take NumPy arrays as input, then Cython is fine. Once you start working with dataframes, text, or Python lists that contain different types… then it’s a whole other story. You will have to learn C++ and how to use C++ data structures in Cython. And obviously… you have to compile your code, link external libraries, etc.

If Python users end up in this discussion, I will leave a link to some cython_experiments.


Yes, the elements, but they’re arrays!

>>> import numpy as np
>>> from numpy import cos, sin
>>> n = 10**6
>>> p = 2 * np.pi * np.random.rand(2, n)
>>> t0, t1 = p
>>> type(t0)
<class 'numpy.ndarray'>
>>> m0 = np.array([[cos(t0) - 1j*sin(t0), 0], [0, cos(t0) + 1j*sin(t0)]])
>>> type(m0)
<class 'numpy.ndarray'>
>>> m0
array([[array([ 0.89155991+0.45290277j, -0.89750168-0.44101105j,
       -0.90581293+0.42367786j, ...,  0.50766908+0.86155215j,
        0.93237733+0.36148652j,  0.97745514-0.2111432j ]),
        0],
       [0,
        array([ 0.89155991-0.45290277j, -0.89750168+0.44101105j,
       -0.90581293-0.42367786j, ...,  0.50766908-0.86155215j,
        0.93237733-0.36148652j,  0.97745514+0.2111432j ])]], dtype=object)

Notice the dtype being the generic object, indicating that m0 holds a mixture of types (here, two arrays and two scalar zeros).

And thus:

>>> m1 = np.array([[cos(t1) - 1j*sin(t1), 0], [0, cos(t1) + 1j*sin(t1)]])
>>> def z(x):
...     return cos(x) - 1j*sin(x)
...
>>> (m1[0][0]*m0[0][0]==z(t1)*z(t0)).all()
True
>>> ((m1*m0)[0][0]==z(t1)*z(t0)).all()
True

To be clear here, I agree with you! It’s scalar multiplication all the way down. I’m just saying that, because of that, and because the later r[0] basically ignores everything but the upper-left cell, there’s room for not doing any multiplications beyond those needed for the upper-left cell.
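
In Julia terms, the shortcut being described is essentially a one-liner; a sketch with made-up sizes, where z plays the role of the Python lambda above:

t0, t1 = 2π .* rand(10^6), 2π .* rand(10^6)
z(x) = cos(x) - im*sin(x)    # same as the Python z above, i.e. cis(-x)
r1 = z.(t1) .* z.(t0)        # only the upper-left cell; nothing computed and discarded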

You might want to take a look at Yao.jl (GitHub - QuantumBFS/Yao.jl: Extensible, Efficient Quantum Algorithm Design for Humans).

And I would definitely use static arrays here as Steven suggested.

1 Like

OK, my bad, I misread the Python.

But I still think there is no matrix multiplication at all on the Python side here. Or am I still misreading things? This is confusing, since everybody is comparing to Julia versions with matrix multiplications.

There is Pythran too. It seems to have threaded loop fusion, which we don’t have (yet)…
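
For context, the (single-threaded) loop fusion Julia does have is the dot syntax; a small illustration with a made-up workload:

x = rand(10^6)
y = @. sin(x) + cos(x)^2   # the dotted calls fuse into one loop over x, with no temporaries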

Yes, you’re right: there are no matmuls or even vecmuls here, just scalar multiplication. I misread that in my first post up top as well, hence the misplaced mul! there.

What is that?