Why is this simple function twice as slow as its Python version

I’m in the planning stages of writing a piece of scientific software and I would like to use Julia for it, but I’m surprised how much slower Julia is than Python when tested on this simple code:

function test(n::Int64,k::Int64,tmp2::Array{Float64,2})
    for i in 1:k
        t = rand(n, n) 
        for j in 1:k
            tmp1 = rand(n, n)
            tmp2[(i-1)*n+1:i*n,(j-1)*n+1:j*n] = t*tmp1
        end
    end
end;

The corresponding Python code is

def test(n,k,tmp2):
    for i in range(k):
        t = rand(n, n) 
        for j in range(k):
            tmp1 = rand(n, n)
            tmp2[i*n:(i+1)*n,j*n:(j+1)*n] = t@tmp1

When tested with, for example,

n = 300
k = 30
tmp2=zeros((n*k,n*k))

Julia needs more than a second

@time test(n,k,tmp2)
1.122870 seconds (3.67 k allocations: 1.227 GiB, 15.44% gc time)
while Python is about 30% faster:
%%timeit
test(n,k,tmp2)
853 ms ± 7.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

When testing the same function but with
tmp2[(i-1)*n+1:i*n,(j-1)*n+1:j*n] = t*tmp1
changed to some element-wise operation on the array, Julia is much faster than Python, so I suppose this has to do with matrix operations.

I’m using Julia 1.4 on a MacBook Pro. I tried switching from OpenBLAS to MKL, but this did not improve the speed of Julia.
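For reference, a quick way to check which BLAS Julia is actually linked against (API as of Julia ≤ 1.6; note that the BLAS thread count can matter as much as the vendor):

julia> using LinearAlgebra

julia> BLAS.vendor()            # reports :openblas64, :mkl, etc.
:openblas64

julia> BLAS.set_num_threads(4)  # the BLAS thread count also affects matmul speed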

I would be thankful for any hint about what I’m doing wrong, or whether Julia is indeed this slow when dealing with lots of matrix operations. :slightly_smiling_face:

1 Like

Hey there!

Have you tried running the Julia code more than once? The first time Julia runs a function it compiles a fast version of it, so you might have timed the compilation rather than just the function!
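For example (illustrative REPL session; the exact numbers will differ on your machine):

julia> @time test(n, k, tmp2)        # first call: includes JIT compilation
julia> @time test(n, k, tmp2)        # second call: measures only the run time

julia> using BenchmarkTools          # or use @btime, which runs the function
julia> @btime test($n, $k, $tmp2)    # repeatedly and excludes compilation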

t*tmp1

In Python, it’s an elementwise multiplication, while in Julia it’s matrix-by-matrix multiplication.
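A quick sketch of the difference on a small example (output as printed by a recent Julia):

julia> A = [1.0 2.0; 3.0 4.0];

julia> A * A     # matrix product; NumPy spells this A @ A
2×2 Matrix{Float64}:
  7.0  10.0
 15.0  22.0

julia> A .* A    # elementwise product; NumPy spells this A * A
2×2 Matrix{Float64}:
 1.0   4.0
 9.0  16.0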

9 Likes

Good catch!

But with t@tmp1, Python is still quite a bit faster:
853 ms ± 7.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Yes, I did that.

Your Python code doesn’t run for me. Can you add the necessary imports and prefixes so I can check it by copying and pasting?

Hi, try switching the order of the loops over i and j (NumPy is row-major, while Julia is column-major).
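As a generic illustration of the rule (colsum is just an illustrative name, not your benchmark): in column-major Julia, the first index should vary fastest, so the inner loop should run over rows.

function colsum(A)
    s = 0.0
    for j in axes(A, 2)        # columns in the outer loop
        for i in axes(A, 1)    # rows in the inner loop: consecutive memory
            s += A[i, j]
        end
    end
    return s
end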

5 Likes

Sure,

import numpy
from numpy import zeros
from numpy.random import rand

This did not make any difference.

And please change t*tmp1 to t@tmp1, so that the comparison makes sense.

Stepping back a bit, I don’t think the thing that is slow here is “matrix operations”. You are creating lots and lots of somewhat large random matrices, as evidenced by the allocation numbers. If you just had 2 temporary arrays that you reused, you would see big speedups.
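For instance, a minimal sketch of that idea (the name test_prealloc is just for illustration), reusing preallocated buffers via Random.rand! and LinearAlgebra.mul!:

using LinearAlgebra, Random

function test_prealloc(n::Int, k::Int, tmp2::Matrix{Float64})
    # Allocate the three work arrays once, then overwrite them in place.
    t    = Matrix{Float64}(undef, n, n)
    tmp1 = Matrix{Float64}(undef, n, n)
    buf  = Matrix{Float64}(undef, n, n)
    for i in 1:k
        rand!(t)                  # refill t in place instead of rand(n, n)
        for j in 1:k
            rand!(tmp1)           # refill tmp1 in place
            mul!(buf, t, tmp1)    # buf = t * tmp1 without allocating
            tmp2[(i-1)*n+1:i*n, (j-1)*n+1:j*n] .= buf   # copy into the block in place
        end
    end
    return tmp2
end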

1 Like

Thanks!

Interestingly, I don’t see the performance gain you mention.


In [8]: %timeit test(n,k,tmp2)
1.26 s ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

vs.

julia> @btime test($n,$k,$tmp2)
  1.178 s (3660 allocations: 1.23 GiB)

It seems this is not the case.
For example

function test(n)
    k = 100
    t = rand(n, n)
    tmp1 = rand(n, n) 
    for i in 1:k
        tmp1 = t + t;
    end
    for j in 1:k
        tmp1 = tmp1*t;
    end
end

gives me

@btime test(100)
8.380 ms (404 allocations: 15.43 MiB)

while

import numpy
from numpy.random import rand
def test(n):
    k = 100
    t = rand(n, n)
    tmp1 = rand(n, n) 
    for i in range(k):
        tmp1 = t + t;
    for j in range(k):
        tmp1 = tmp1@t;

gives me

%timeit test(100)
2.38 ms ± 40.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That’s very interesting!
What are your specs?
My colleague also tried this on a Windows machine and obtained results similar to mine. I don’t know which version of Julia he used.

Here:

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PKG_DEVDIR = /home/peterwd/Documents/Development
  JULIA_EDITOR = subl

Can you check:

using LinearAlgebra
function test(n::Int64, k::Int64, tmp2::Array{Float64,2})
    buf = zeros(n, n)
    for i in 1:k
        t = rand(n, n)
        for j in 1:k
            tmp1 = rand(n, n)
            tmp2[(i-1)*n+1:i*n, (j-1)*n+1:j*n] = mul!(buf, t, tmp1)
        end
    end
end
...

Julia has a different memory layout than Python; you’re doing the for-loops in the unfavourable order for Julia: Performance Tips · The Julia Language

6 Likes

This gives for me:

julia> @btime test($n,$k,$tmp2)
  866.969 ms (1862 allocations: 639.34 MiB)

But of course we have different systems, and I refuse to install Python now :wink:

Switching off bounds checking shaves off a few more milliseconds:

function test(n::Int64, k::Int64, tmp2::Array{Float64,2})
    buf = zeros(n, n)
    @inbounds for i in 1:k
        t = rand(n, n)
        for j in 1:k
            tmp1 = rand(n, n)
            tmp2[(i-1)*n+1:i*n, (j-1)*n+1:j*n] = mul!(buf, t, tmp1)
        end
    end
end

julia> @btime test($n,$k,$tmp2)
  809.509 ms (1862 allocations: 639.34 MiB)

It’s very easy to find ways to speed up this function in Julia. But it remains an interesting, and unanswered, question why Python is faster with the “vanilla” code.

3 Likes