Why is this code so slow in julia compared to a numpy implementation?

ksmet1977 · October 24, 2017, 4:20pm

I have implemented some mathematical models in Python that require basic operation like scalar multiplication, summing, raising power, etc. of arrays (upto 3d) and scalars.

I find that my Julia code is substantially slower than my numpy code:
As an example:

Python(3.5)/Numpy:

import numpy as np
a=np.random.rand(3,3,50)
b=np.random.rand(3,3)
%timeit for i in range(a.shape[-1]):(5*a[0,:,i]+a[1,:,i]/2.0+4.0)

1000 loops, best of 3: 194 µs per loop

Julia:

using TimeIt
a=rand(3,3,50)
b=rand(3,3)
function test();@timeit for i in (1:size(a,3));(5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0);end;end;test();test();

100 loops, best of 3: 1.21 ms per loop
100 loops, best of 3: 1.12 ms per loop

So Julia is about 6 times slower.

I also often need to raise Float64 or Arrays{Float64} to a power:

Numpy:

%timeit for i in range(a.shape[-1]):(5*a[0,:,i]+a[1,:,i]/2.0+4.0)**3.3
1000 loops, best of 3: 263 µs per loop

Julia:

function test();@timeit for i in (1:size(a,3));(5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0).^3.3;end;end;test();test();
100 loops, best of 3: 2.07 ms per loop
100 loops, best of 3: 1.81 ms per loop

Now Julia is almost 7 times slower.

Even using custom power function as defined below doesn’t help:

pw(a,n) = @. exp(n*log(a))
function test();@timeit for i in (1:size(a,3));pw((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();

100 loops, best of 3: 1.95 ms per loop
100 loops, best of 3: 1.82 ms per loop

Defining a more complicated function like below does:

function pw2(x::Union{Array{Int64},Array{Float64}},n::Union{Int64,Float64})
    if (floor(n) - n) .!= 0.0
       @. exp(n*log(x))
    else
        n = convert(Int64,n)
        if n == 0
            return (ndims(x) == 0) ? 1.0 : ones(size(x))
        else
            xp = copy(x)
            for i in 1:(abs(n)-1)
                xp .*= x
            end
            return (n>0) ? xp : 1./xp
        end
    end
end

function test();@timeit for i in (1:size(a,3));pw2((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();
100 loops, best of 3: 1.32 ms per loop
100 loops, best of 3: 1.18 ms per loop

Although this is better, this is still 4.5 times slower than a simple numpy implementation. I could probably write functions like this for addition, subtraction, etc., but when I also want to make these methods general so that they can take n::Array{Float64} I’m suddenly faced with a whole lot of extra method definitions for what seem like basic mathematical operations. Do I really have to write some sort of basic library for these basic operations to be able to make use of Julia’s speed? Writing all my functions for my project using such basic loops is not an option as this would the code completely unreadable and complex. Writing them for the operators might be, but I find it strange that this wouldn’t have been done yet in Julia. So I’m probably missing something here. Can anyone point me in the right direction? Thanks

dpsanders · October 24, 2017, 4:34pm

Please read the performance tips in the Julia manual, in particular don’t use global variables, and wrap everything in a function.

ksmet1977 · October 24, 2017, 4:34pm

The difference between results for pw and pw2 is probaby due to the types being specified in the latter:

function pw3(x::Union{Array{Int64},Array{Float64}},n::Union{Int64,Float64});return @. exp(n*log(x));end
function test();@timeit for i in (1:size(a,3));pw3((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();
100 loops, best of 3: 1.31 ms per loop
100 loops, best of 3: 1.16 ms per loop

pw2 does give better performance (about 1.7x) than pw3:

function test();@timeit for i in (1:size(a,3));pw2(a[:,:,i],3);end;end;test();test();
10000 loops, best of 3: 26.78 µs per loop
10000 loops, best of 3: 25.64 µs per loop

function test();@timeit for i in (1:size(a,3));pw3(a[:,:,i],3);end;end;test();test();
10000 loops, best of 3: 45.41 µs per loop
10000 loops, best of 3: 43.58 µs per loop

But the gain is probably lost in the time it takes to do the other operations.

ssfrr · October 24, 2017, 4:34pm

The first thing is to read through the Performance Tips section of the manual.

The main general suggestions are

never to benchmark in global scope (because then all your variables are global variables and can change at any time, so the compiler can’t generate optimized code)
use @code_warntype to check for any places where Julia couldn’t infer your types statically (at compile-time), which is often the reason for slow performance.

Also, your code is much easier to read if you wrap it in back-ticks so it shows up highlighted and code-formatted, and when indented properly will look quite nice. The easier you make it on the reader the more likely you are to get more detailed feedback and help.

ExpandingMan · October 24, 2017, 4:38pm

It’s also not clear to me weather TimeIt.jl functions will count compile time. Using Benchmarks.jl is strongly recommended for this sort of thing.

ggggggggg · October 24, 2017, 4:40pm

As the others said, read the performance tip, don’t use global variables, use BenchmarkTools. Your first example becomes

using BenchmarkTools
a=rand(3,3,50);
function test(a)
    for i in (1:size(a,3))
        (5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0)
    end
end
@belapsed test($a) #118 us

and use triple back ticks to set off code section.

ksmet1977 · October 24, 2017, 4:40pm

Thanks for the tip. I had read the performance tips, which is why I wrote the @time … in a function, but I had completely forgotten to put the variables a and b also in there:

function test();a=rand(3,3,50);b=rand(3,3);@timeit for i in (1:size(a,3));pw3((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();
10000 loops, best of 3: 39.01 µs per loop
10000 loops, best of 3: 37.60 µs per loop

which is a 32x speed-up compared to when I tested the code with global a and b.

ksmet1977 · October 24, 2017, 4:45pm

I’m testing the performance of several functions one at the time with one taking the output of the other as input. Is there a way to test this in REPL? Or do I need to write a wrapper function around my entire code, so even around the functions generating the input for the function I actually want to test the performance of?
Thanks for the many a quick replies!

kristoffer.carlsson · October 24, 2017, 4:45pm

Julia is column major so prefer slicing on the first dimension.
Numpy slices gives you views, use @views in julia.
Dot the expression if you want it to be element by element, e.g. with @.
Don’t use global variables.

using TimeIt

function test(a, b)
    for i in 1:size(a,3)
        @views @. 5.0 * a[:,1,i] + a[:,1,i] / 2.0 + 4.0
    end
end

a = rand(3, 3, 50)
b = rand(3, 3)

julia> @timeit test(a, b)
100000 loops, best of 3: 3.36 µs per loop

Also, please code quote your code and format it so it is readable.

ggggggggg · October 24, 2017, 5:11pm

Like this?

geta(x) = x+4
getb(x) = x-4
useab(a,b) = a*b
x=2
@belapsed useab(geta($x), getb($x))

Topic		Replies	Views
Numpy 10x faster than Julia ?! What am I doing wrong ?! [solved - julia faster now] Performance question	37	10992	October 15, 2019
General questions from Python user Performance	59	4324	March 8, 2021
Slow matrix multiplication in Julia compared to Python numpy New to Julia question	17	5552	May 19, 2018
Optimized Python is as good as Julia Performance question	28	25356	June 2, 2025
Vector addition in Julia slower than numpy in Linux Performance	21	1297	May 1, 2020

Why is this code so slow in julia compared to a numpy implementation?

Related topics