Slow matrix multiplication in Julia compared to Python numpy

Unless I’m mistaken, x*y is elementwise multiplication for NumPy arrays in Python, not matrix multiplication, so the two languages weren’t being compared on the same operation. Also, timing things in global scope in Julia is not a good way to get an accurate estimate of realistic runtimes. The right way is to use BenchmarkTools, interpolating the global arguments with $:

julia> using BenchmarkTools

julia> a = rand(1000, 1000);

julia> @benchmark $a .* $a # elementwise
BenchmarkTools.Trial:
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     2.545 ms (0.00% GC)
  median time:      3.824 ms (0.00% GC)
  mean time:        4.986 ms (26.48% GC)
  maximum time:     78.045 ms (95.63% GC)
  --------------
  samples:          999
  evals/sample:     1

julia> @benchmark $a * $a # matmul
BenchmarkTools.Trial:
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     51.575 ms (0.00% GC)
  median time:      57.945 ms (0.00% GC)
  mean time:        60.352 ms (3.93% GC)
  maximum time:     142.708 ms (58.50% GC)
  --------------
  samples:          83
  evals/sample:     1
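Equivalently, one can wrap the work in functions so Julia compiles code specialized for the argument types; here is a minimal sketch of the same measurements (the names elementwise and matmul are just illustrative):

using BenchmarkTools

# Putting the work in a function avoids the untyped-global penalty;
# $-interpolation in @btime does the same for the benchmark itself.
elementwise(x) = x .* x
matmul(x) = x * x

a = rand(1000, 1000)
@btime elementwise($a);
@btime matmul($a);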

On the same system, this is what I see for the same operations in Python:

>>> import numpy
>>> import cProfile
>>> a = numpy.random.random((1000, 1000))
>>> cProfile.run("a * a")
         2 function calls in 0.010 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.010    0.010    0.010    0.010 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


>>> cProfile.run("a.dot(a)")
         3 function calls in 0.055 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.055    0.055 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.054    0.054    0.054    0.054 {method 'dot' of 'numpy.ndarray' objects}
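Note that cProfile runs the statement once and adds profiling overhead, so for a closer analogue of BenchmarkTools’ repeated sampling one could use timeit; a sketch, assuming the same array a:

import timeit

import numpy

a = numpy.random.random((1000, 1000))

# Run each statement 10 times per trial, 5 trials, and report the
# best per-run time, roughly mirroring BenchmarkTools' minimum.
for stmt in ("a * a", "a.dot(a)"):
    best = min(timeit.repeat(stmt, globals=globals(), number=10, repeat=5)) / 10
    print(stmt, best)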

So matrix multiplication seems to take about the same time in Julia and NumPy (both typically dispatch to an optimized BLAS), whereas Julia is significantly faster (about 4x) at elementwise multiplication. I’m not sure why; perhaps the operation is cheap enough that the overhead of calling out to C is significant in Python (and doesn’t exist in Julia). Edit: no, I tried a 10,000 x 10,000 matrix and the relative performance didn’t improve much: 3x instead of 4x.
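To probe the overhead hypothesis a bit further, one could also take allocation of the temporary result out of the picture by writing into a preallocated buffer; a sketch I haven’t run here:

using BenchmarkTools

a = rand(1000, 1000)
out = similar(a)   # preallocated destination

# Fused in-place broadcast: no temporary array, no GC pressure,
# so any remaining gap is compute or call overhead.
@benchmark $out .= $a .* $a

The NumPy counterpart would be numpy.multiply(a, a, out=out).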
