Optimized Python is as good as Julia

At the risk of piling onto a thread where plenty of people have already weighed in, I think it’s helpful to explain when you can expect Python to match Julia’s performance.

Broadly speaking, NumPy is implemented in C, which has roughly the same performance as Julia. NumPy functions called from Python can therefore be expected to run about as fast as the equivalent Julia code.
The run time of the code in your OP is dominated by a few function calls: shuffle, pairwise equality of two arrays, and a sum over a boolean array. That’s why you see little difference in speed.
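
For concreteness, here is a minimal sketch of what such a workload might look like. It is not the code from the OP: the function name count_matches, the seed and the array size are assumptions for illustration, and only the names hats and ideal and the three operations come from the discussion above.

```python
import numpy as np

def count_matches(n, rng):
    """Shuffle 0..n-1 and count the positions that kept their original value."""
    ideal = np.arange(n)          # reference ordering
    hats = ideal.copy()
    rng.shuffle(hats)             # in-place shuffle, runs in compiled code
    matches = hats == ideal       # elementwise comparison, allocates a bool array
    return int(np.sum(matches))   # reduction over that bool array

rng = np.random.default_rng(0)    # arbitrary seed, chosen for reproducibility
print(count_matches(1_000_000, rng))
```

Almost all of the work here happens inside the three NumPy calls, so the Python interpreter contributes very little to the total run time.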

However, there are two reasons why Julia usually is faster than Python anyway:

First, you often don’t have the luxury of a problem that can be expressed using only a few function calls into NumPy, and you need to write loops in Python instead. Try, for example, to find the most common sub-sequence of length 4 in a long array. You are then no longer simply calling into C code where all the computation happens: a large part of the computation happens in Python itself, and you begin to see Julia being 10 or 100 times faster.
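
A rough sketch of the sub-sequence problem in plain Python, to show where the interpreter gets involved. It assumes that "sub-sequence" means a run of 4 consecutive elements; the function name most_common_window and the test data are made up for illustration.

```python
from collections import Counter

import numpy as np

def most_common_window(arr, k=4):
    """Return (window, count) for the most frequent run of k consecutive elements.

    Assumes arr has at least k elements.
    """
    counts = Counter()
    values = arr.tolist()  # plain Python ints are cheaper to hash than NumPy scalars
    # This loop runs in the interpreter: one tuple build and one dict update per position.
    for i in range(len(values) - k + 1):
        counts[tuple(values[i:i + k])] += 1
    return counts.most_common(1)[0]

rng = np.random.default_rng(0)           # arbitrary seed
data = rng.integers(0, 4, size=100_000)  # small alphabet so that windows repeat
window, count = most_common_window(data)
print(window, count)
```

Every iteration of the loop is executed by Python itself rather than by compiled NumPy code, which is exactly the situation where the gap to Julia tends to open up.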

Second, for most truly performance-sensitive problems, there are many small optimisations that can improve the performance of your code. Since Python relies on calling a small set of fixed NumPy functions for speed, this is not generally possible in Python. In Julia, however, the code is “at your fingertips”, i.e. the heavy lifting happens right at the level of the code you write yourself, so you can do whatever optimisations you want.

For example, in your OP, the expression np.sum(hats == ideal) first allocates an array of bools and then sums it, in two passes. This could be more efficient. In Julia, you could for example write count(splat(==), zip(hats, ideal)) to do this in a single pass without allocations.
Also, in Julia there is no need to collect ideal, since 1:n is a perfectly fine vector that can be used directly in broadcasting operations. NumPy is less generic here: it has to convert a Python range object into an array before it can operate on it.
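
To make the trade-off concrete on the Python side, here is a rough sketch comparing the two-pass NumPy expression with a single-pass loop. The array size and seed are arbitrary, and this is not the OP's code.

```python
import numpy as np

n = 100_000
rng = np.random.default_rng(0)  # arbitrary seed
ideal = np.arange(n)
hats = rng.permutation(n)       # shuffled copy of 0..n-1

# Two passes: build a temporary bool array of length n, then reduce it.
two_pass = int(np.sum(hats == ideal))

# One pass with no temporary bool array, but every comparison
# is executed by the Python interpreter.
one_pass = sum(1 for h, i in zip(hats, range(n)) if h == i)

assert two_pass == one_pass
print(two_pass)
```

Both versions give the same count, but in Python the single-pass version is typically far slower than the NumPy one, because the loop no longer runs in compiled code. In Julia the single-pass version is compiled like any other code, so the same rewrite is an actual optimisation.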

With those changes, about 90% of the time is spent in shuffle!, which makes it harder to optimise further. For me, the Julia code now runs 7x faster than Python.
