NumPy is faster than Julia 1.11 for maximum function

mbauman · July 3, 2025, 3:09pm

Nope! C ain’t an insurmountable barrier like the speed of light is. It’s not necessarily able to natively express the fastest possible implementation for some algorithms. But, similarly, Julia might not be able to, either.

They’re all just languages that are trying to give you the ability to express (and then compile to) the fastest set of instructions for a given architecture with varying levels of success.

jling · July 3, 2025, 3:12pm

In [2]: a = np.random.rand(512000)
In [3]: %timeit np.min(a)
44.6 µs ± 853 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

julia> @be rand(512000) fast_minimum samples=100 evals=50
 mean   38.949 μs

Julia is actually faster if you enable fastmath, like NumPy does.
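To illustrate why fast-math can matter for a min/max reduction, here is a minimal pure-Python sketch. Julia's default `min` propagates NaN and orders -0.0 below 0.0, which costs extra checks per element, whereas a fast-math style `min` is a bare compare-and-select. The helper names `strict_min` and `fast_min` are hypothetical, and neither is the actual Julia or NumPy implementation.

```python
import math


def fast_min(x, y):
    # Comparison-only minimum: a single compare and select.
    # Any comparison with NaN is False, so the result depends on argument order.
    return y if y < x else x


def strict_min(x, y):
    # NaN-propagating minimum that also orders -0.0 below 0.0.
    # The extra branches are the per-element cost a fast-math variant avoids.
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == y:
        # Covers the (0.0, -0.0) case: prefer the operand with the negative sign.
        return x if math.copysign(1.0, x) < 0 else y
    return x if x < y else y
```

With these definitions, `fast_min(1.0, math.nan)` and `fast_min(math.nan, 1.0)` disagree, while `strict_min` returns NaN for both. Giving up that guarantee is what makes the simpler version easier to vectorize.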

giordano · July 3, 2025, 3:19pm

For fun, this is the kernel internally used by Julia on v1.13 (and soon v1.12 too):

github.com/JuliaLang/julia, base/reduce.jl at 7a5c4b521:

@noinline function mapreduce_impl(f, op, A::AbstractArrayOrBroadcasted,
                                  ifirst::Integer, ilast::Integer, blksize::Int)
    if ifirst == ilast
        @inbounds a1 = A[ifirst]
        return mapreduce_first(f, op, a1)
    elseif ilast - ifirst < blksize
        # sequential portion
        @inbounds a1 = A[ifirst]
        @inbounds a2 = A[ifirst+1]
        v = op(f(a1), f(a2))
        @simd for i = ifirst + 2 : ilast
            @inbounds ai = A[i]
            v = op(v, f(ai))
        end
        return v
    else
        # pairwise portion
        imid = ifirst + (ilast - ifirst) >> 1
        v1 = mapreduce_impl(f, op, A, ifirst, imid, blksize)
        v2 = mapreduce_impl(f, op, A, imid+1, ilast, blksize)

(The embedded excerpt is truncated here.)

This is used for all reductions, not just min/max. BTW, @mbauman, JuliaLang/julia pull request #58267 ("Remove bugged and typically slower `minimum`/`maximum` method") should have deleted this comment in base/reduce.jl (7a5c4b521):

# certain `op` (e.g. `min` and `max`) may have their own specialized versions.

right?
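For readers following along from the NumPy side, the blocked pairwise structure of the kernel quoted above can be sketched in pure Python. This is an illustration, not a port: indices are 0-based and inclusive, the single-element case simply returns `f(A[ifirst])` instead of calling `mapreduce_first`, and the final combining step `op(v1, v2)` is an assumption, since the quoted excerpt is truncated before it.

```python
def mapreduce_impl(f, op, A, ifirst, ilast, blksize):
    """Reduce f(A[i]) for i in ifirst..ilast (inclusive) with op."""
    if ifirst == ilast:
        # Single element: nothing to combine (simplified vs. mapreduce_first).
        return f(A[ifirst])
    elif ilast - ifirst < blksize:
        # Sequential portion: a plain left-to-right loop over a small block.
        v = op(f(A[ifirst]), f(A[ifirst + 1]))
        for i in range(ifirst + 2, ilast + 1):
            v = op(v, f(A[i]))
        return v
    else:
        # Pairwise portion: split in the middle and recurse on both halves.
        # Explicit parentheses are needed here because Python's >> binds
        # more loosely than +, unlike in Julia.
        imid = ifirst + ((ilast - ifirst) >> 1)
        v1 = mapreduce_impl(f, op, A, ifirst, imid, blksize)
        v2 = mapreduce_impl(f, op, A, imid + 1, ilast, blksize)
        # Assumed combining step; not shown in the truncated excerpt.
        return op(v1, v2)


data = [3.0, -1.5, 7.25, 0.0, 2.0, 9.5, -4.0, 8.0, 1.0]
largest = mapreduce_impl(lambda x: x, max, data, 0, len(data) - 1, 4)
```

The same function handles any associative `op`, which matches the point that the kernel serves all reductions and not only min/max.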

mbauman · July 3, 2025, 3:24pm

They still may!

I’m still trying to figure out how to better express this implementation in a way that’s both more generic and more easily (and perhaps even documentedly?) specializable. This may yet change in bigger ways.

jling · July 3, 2025, 3:31pm

testing against Julia nightly:

In [2]: import numpy as np
   ...:
   ...: A = np.random.rand(2026,51,251)
   ...: %timeit np.max(A)
3.71 ms ± 32.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

julia> @btime maximum(A)
  4.789 ms (0 allocations: 0 bytes)
0.9999999129587941

julia> @btime @fastmath maximum(A)
  3.531 ms (0 allocations: 0 bytes)
0.9999999129587941

Joris_Pinkse · July 4, 2025, 2:02am

On my architecture (AMD), I see no improvement going from 1.11.5 to today’s nightly, nor from any of the alternatives suggested above. I wonder why.
