Why is this code so slow in julia compared to a numpy implementation?

performance

#1

I have implemented some mathematical models in Python that require basic operation like scalar multiplication, summing, raising power, etc. of arrays (upto 3d) and scalars.

I find that my Julia code is substantially slower than my numpy code:
As an example:

Python(3.5)/Numpy:

import numpy as np
a=np.random.rand(3,3,50)
b=np.random.rand(3,3)
%timeit for i in range(a.shape[-1]):(5*a[0,:,i]+a[1,:,i]/2.0+4.0)

1000 loops, best of 3: 194 µs per loop

Julia:

using TimeIt
a=rand(3,3,50)
b=rand(3,3)
function test();@timeit for i in (1:size(a,3));(5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0);end;end;test();test();

100 loops, best of 3: 1.21 ms per loop
100 loops, best of 3: 1.12 ms per loop

So Julia is about 6 times slower.

I also often need to raise Float64 or Arrays{Float64} to a power:

Numpy:

%timeit for i in range(a.shape[-1]):(5*a[0,:,i]+a[1,:,i]/2.0+4.0)**3.3
1000 loops, best of 3: 263 µs per loop

Julia:

function test();@timeit for i in (1:size(a,3));(5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0).^3.3;end;end;test();test();
100 loops, best of 3: 2.07 ms per loop
100 loops, best of 3: 1.81 ms per loop

Now Julia is almost 7 times slower.

Even using custom power function as defined below doesn’t help:

pw(a,n) = @. exp(n*log(a))
function test();@timeit for i in (1:size(a,3));pw((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();

100 loops, best of 3: 1.95 ms per loop
100 loops, best of 3: 1.82 ms per loop

Defining a more complicated function like below does:

function pw2(x::Union{Array{Int64},Array{Float64}},n::Union{Int64,Float64})
    if (floor(n) - n) .!= 0.0
       @. exp(n*log(x))
    else
        n = convert(Int64,n)
        if n == 0
            return (ndims(x) == 0) ? 1.0 : ones(size(x))
        else
            xp = copy(x)
            for i in 1:(abs(n)-1)
                xp .*= x
            end
            return (n>0) ? xp : 1./xp
        end
    end
end

function test();@timeit for i in (1:size(a,3));pw2((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();
100 loops, best of 3: 1.32 ms per loop
100 loops, best of 3: 1.18 ms per loop

Although this is better, this is still 4.5 times slower than a simple numpy implementation. I could probably write functions like this for addition, subtraction, etc., but when I also want to make these methods general so that they can take n::Array{Float64} I’m suddenly faced with a whole lot of extra method definitions for what seem like basic mathematical operations. Do I really have to write some sort of basic library for these basic operations to be able to make use of Julia’s speed? Writing all my functions for my project using such basic loops is not an option as this would the code completely unreadable and complex. Writing them for the operators might be, but I find it strange that this wouldn’t have been done yet in Julia. So I’m probably missing something here. Can anyone point me in the right direction? Thanks


#2

Please read the performance tips in the Julia manual, in particular don’t use global variables, and wrap everything in a function.


#3

The difference between results for pw and pw2 is probaby due to the types being specified in the latter:

function pw3(x::Union{Array{Int64},Array{Float64}},n::Union{Int64,Float64});return @. exp(n*log(x));end
function test();@timeit for i in (1:size(a,3));pw3((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();
100 loops, best of 3: 1.31 ms per loop
100 loops, best of 3: 1.16 ms per loop

pw2 does give better performance (about 1.7x) than pw3:

function test();@timeit for i in (1:size(a,3));pw2(a[:,:,i],3);end;end;test();test();
10000 loops, best of 3: 26.78 µs per loop
10000 loops, best of 3: 25.64 µs per loop

function test();@timeit for i in (1:size(a,3));pw3(a[:,:,i],3);end;end;test();test();
10000 loops, best of 3: 45.41 µs per loop
10000 loops, best of 3: 43.58 µs per loop

But the gain is probably lost in the time it takes to do the other operations.


#4

The first thing is to read through the Performance Tips section of the manual.

The main general suggestions are

  1. never to benchmark in global scope (because then all your variables are global variables and can change at any time, so the compiler can’t generate optimized code)
  2. use @code_warntype to check for any places where Julia couldn’t infer your types statically (at compile-time), which is often the reason for slow performance.

Also, your code is much easier to read if you wrap it in back-ticks so it shows up highlighted and code-formatted, and when indented properly will look quite nice. The easier you make it on the reader the more likely you are to get more detailed feedback and help.


#5

It’s also not clear to me weather TimeIt.jl functions will count compile time. Using Benchmarks.jl is strongly recommended for this sort of thing.


#6

As the others said, read the performance tip, don’t use global variables, use BenchmarkTools. Your first example becomes

using BenchmarkTools
a=rand(3,3,50);
function test(a)
    for i in (1:size(a,3))
        (5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0)
    end
end
@belapsed test($a) #118 us

and use triple back ticks to set off code section.


#7

Thanks for the tip. I had read the performance tips, which is why I wrote the @time … in a function, but I had completely forgotten to put the variables a and b also in there:

function test();a=rand(3,3,50);b=rand(3,3);@timeit for i in (1:size(a,3));pw3((5.0.*a[1,:,i]+a[1,:,i]./2.0+4.0),3.3);end;end;test();test();
10000 loops, best of 3: 39.01 µs per loop
10000 loops, best of 3: 37.60 µs per loop

which is a 32x speed-up compared to when I tested the code with global a and b.


#8

I’m testing the performance of several functions one at the time with one taking the output of the other as input. Is there a way to test this in REPL? Or do I need to write a wrapper function around my entire code, so even around the functions generating the input for the function I actually want to test the performance of?
Thanks for the many a quick replies!


#9
  • Julia is column major so prefer slicing on the first dimension.
  • Numpy slices gives you views, use @views in julia.
  • Dot the expression if you want it to be element by element, e.g. with @.
  • Don’t use global variables.
using TimeIt

function test(a, b)
    for i in 1:size(a,3)
        @views @. 5.0 * a[:,1,i] + a[:,1,i] / 2.0 + 4.0
    end
end

a = rand(3, 3, 50)
b = rand(3, 3)

julia> @timeit test(a, b)
100000 loops, best of 3: 3.36 µs per loop

Also, please code quote your code and format it so it is readable.


#10

Like this?

geta(x) = x+4
getb(x) = x-4
useab(a,b) = a*b
x=2
@belapsed useab(geta($x), getb($x))