Question on simple performance comparison between Python and Julia

I ran the following for loops in Python and Julia from the terminal with the commands python test.py and julia test.jl, and Python (≈ 37 sec) took less time than Julia (≈ 52 sec). Why?

test.py:

for i in range(1,100001):
    print(i)

test.jl:

for i in 1:100000
    println(i)
end

Julia compiles a function to efficient native code the first time it is called. So the problem with the benchmark is that the loop needs to be wrapped in a function for that compilation to happen.

You should get much faster Julia code with:

function main(n)
    for i in 1:n
        println(i)
    end
end
main(10000)

To complement what’s been said:

  • you need to run the function once first so that subsequent calls are fast (the first call includes compilation)
  • calling julia test.jl includes the Julia startup time, which can be non-negligible depending on your setup
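
The second point applies to any whole-script timing, including python test.py. A rough way to see how large that fixed cost is, sketched here in Python as an analogy (the helper name interpreter_startup_seconds is made up for this example), is to time an empty program and keep the best of a few runs:

```python
import subprocess
import sys
import time


def interpreter_startup_seconds(runs=5):
    """Best wall-clock time to start the interpreter and run an empty program."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs a program that does nothing, so only startup is measured.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


print(f"startup overhead: {interpreter_startup_seconds():.3f} s")
```

Subtracting a number like this from a whole-script timing gives a better idea of what the script itself costs; the same idea should carry over to timing an empty Julia script.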

Here is my recommendation to properly benchmark any Julia code: use BenchmarkTools.jl and do something like this:

using BenchmarkTools

function main(n)
    # do stuff
end

n = 10000
b = @belapsed main($n)
println(b)

This specific example is a known issue: Println performance bug in interactive terminal (I.e. much slower than c, perl, ruby, python) · Issue #43176 · JuliaLang/julia · GitHub


And as @johnmyleswhite said, what you’re really benchmarking here is the printing performance and not the for loop, which is probably not what you actually want?
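
To see how much of such a timing is printing rather than looping, one can time the two separately. This is only a sketch in Python: the function names are made up, and the output goes to an in-memory buffer instead of a terminal, so the real terminal cost is likely higher than what it shows.

```python
import io
import time


def loop_only(n):
    """Loop that does a little arithmetic and no output."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def loop_with_print(n, stream):
    """Same loop, but printing every value to the given stream."""
    for i in range(1, n + 1):
        print(i, file=stream)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


n = 100_000
buffer = io.StringIO()  # stands in for the terminal
_, t_loop = timed(loop_only, n)
_, t_print = timed(loop_with_print, n, buffer)
print(f"loop only: {t_loop:.4f} s, loop + print to memory: {t_print:.4f} s")
```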


Oh… thank you and everyone above! I’ll take some time to digest this information. :handshake:


Sorry, I’m back again. There are still some questions that have been bothering me.

  1. Must I run Julia code in that special manner to achieve better performance? Can I call it “special”? I mean the workflow of wrapping the code in a function, pre-running it, and timing only the later call. What is the scientific basis for evaluating performance in this manner? Is this fair to Python? Should Python code also be evaluated in this manner?
  2. I did another comparison, running the following two files with the commands python test2.py and julia test2.jl on the terminal, and found that Python (10~11 sec) took longer than Julia (7~8 sec) this time. Can I conclude from this that Julia performs better than Python only when the code is very time-consuming?

test2.py:

total = 0
for i in range(1, 100000001):
    total += i
print(total)

test2.jl:

total = 0
for i in 1:100000000
    global total += i
end
println(total)

May I ask what the “setup” you mentioned specifically means?

Thank you in advance for reading and responding!

  1. As a PhD student, I have reported timings for Julia code in scientific papers using the method mentioned, that is: run the code first with some dummy input that triggers compilation, then run it with the real data and time that run. As a compiled language, Julia will never be entirely comparable to Python. How do you compare C and Python timings? Do you include the C compilation time or not? The difference between Julia and C is that in Julia it is hard to completely separate the compilation phase from the runtime phase: you need a dummy run that is guaranteed to call every function the real experiment would. It is up to the scientist to decide what a fair comparison method is for their experiment. One way to make the comparison “fair”, for example, is to pass --compile=no to Julia, so that it basically becomes an interpreted language. However, there is then little reason to use Julia, whose advantage is exactly that it compiles code and performs very well once compilation is done.
  2. Julia performs better than Python when the Julia code does not load many external libraries and instead calls the same functions many times, with similar argument types, and those functions do a lot of computational work. Why? Because this minimizes the time spent compiling relative to the time spent running the compiled, optimized code. The worst case for Julia is code that loads many complex libraries and calls many distinct functions, each doing little work, a single time each. Why? Because Julia then has to compile loads of library code just to run a lightweight computation once. For this kind of script it is better either to use a scripting/interpreted language or to pass --compile=no.
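
The "dummy run first, then time the real run" procedure from point 1 is language-independent and can be sketched as below. The helper time_after_warmup and the workload sum_to are hypothetical names for this illustration; in Julia the warm-up call is where compilation would happen, whereas in CPython it typically changes much less.

```python
import time


def time_after_warmup(func, warmup_args, real_args):
    """Call func once on a small input, then time a second call on the real input."""
    func(*warmup_args)  # warm-up call; its duration is deliberately discarded
    start = time.perf_counter()
    result = func(*real_args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sum_to(n):
    """Hypothetical workload: sum the integers 1..n with an explicit loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


result, seconds = time_after_warmup(sum_to, (10,), (1_000_000,))
print(result, f"{seconds:.4f} s")
```

The warm-up input only needs to exercise the same functions with the same argument types as the real run; it does not need to be large.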

From that specifically, no.

The way to write performant code in Julia is to write functions and follow some guidelines that ensure type stability, the most important being to avoid non-constant global variables.

But that is not the way to write performant Python code either. Pure Python cannot do much better than that loop, but if you had to write real Python code aiming for performance you would use a library such as numpy or numba, depending on the problem.
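
As a small stand-in for that point that needs no third-party package, the sketch below compares the explicit loop with the built-in sum over a range, which moves the loop out of Python-level bytecode. This is not the numpy or numba approach mentioned above, only an illustration of the same principle, and the function names are made up.

```python
import timeit


def explicit_loop(n):
    """Sum 1..n with a Python-level loop, as in test2.py."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the iteration happens inside the built-in sum."""
    return sum(range(1, n + 1))


n = 1_000_000
assert explicit_loop(n) == builtin_sum(n)
for func in (explicit_loop, builtin_sum):
    # Best of three single runs, to reduce noise from other processes.
    best = min(timeit.repeat(lambda: func(n), number=1, repeat=3))
    print(f"{func.__name__}: {best:.4f} s")
```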

These toy examples hardly tell you anything. For instance:

julia> function f()
           total = 0
           for i in 1:100000000
               total += i
           end
           return total
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime f()
  2.672 ns (0 allocations: 0 bytes)
5000000050000000

As you can see, your time estimates on the order of seconds don’t mean anything in terms of performant code. And, in this case, a good compiler simply returns the precomputed result:

julia> @code_llvm f()
;  @ REPL[1]:1 within `f`
define i64 @julia_f_111() #0 {
top:
;  @ REPL[1]:6 within `f`
  ret i64 5000000050000000
}
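
The constant in that output is what the closed form n(n+1)/2 gives for n = 100000000. The thread does not show how the compiler arrives at it, so the check below (with a made-up function name) only confirms the value:

```python
def gauss_sum(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2


n = 100_000_000
print(gauss_sum(n))  # 5000000050000000, the constant returned in the LLVM code

# Cross-check the formula against an explicit sum for a small n.
assert gauss_sum(1000) == sum(range(1, 1001))
```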

Thank you very much! Your response has made it more clear for me. :handshake:


Thank you very much! Your responses are always professional and helpful! :handshake:


Some toy examples may still give an idea of the performance difference. The density of the standard normal distribution is defined as

f(x) = \frac{1}{\sqrt{2 \pi}} \exp{(-0.5 x^2)}

for -\infty < x < \infty. Let’s implement a naive numerical integration (approximation) in Python like

import math 

def f(x):
    return (1/math.sqrt(2 * math.pi)) * math.exp(-0.5 * math.pow(x, 2))


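def integrate(func, start, stop, e = 0.0000001):
    # Left-rectangle rule with step e; `total` avoids shadowing the built-in sum,
    # and start/stop avoid shadowing the integrand f defined above.
    total = 0.0
    x = start
    while x <= stop:
        total += func(x) * e
        x += e
    return total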


print(integrate(f, -1, 1))

and here is a nearly identical Julia implementation:

function f(x)
    return (1/sqrt(2 * pi)) * exp(-0.5 * x^2) 
end 

function integrate(func, from, to; e = 0.0000001)
    sum = 0.0
    x = from
    while x <= to
        sum += func(x) * e
        x += e
    end 
    return sum
end 

print(integrate(f, -1, 1))

Even though compilation takes time in the Julia implementation, the results in the terminal look like this:

$ time julia 1.jl
0.6826895163607789
________________________________________________________
Executed in  808.25 millis    fish           external
   usr time  758.27 millis  283.00 micros  757.99 millis
   sys time  217.83 millis  745.00 micros  217.08 millis

$ time python3 1.py
0.6826895163607789

________________________________________________________
Executed in    9.02 secs    fish           external
   usr time    8.96 secs  280.00 micros    8.96 secs
   sys time    0.03 secs  748.00 micros    0.03 secs

Julia is generally faster when the implementation consists of long-running loops. Yes, this is a toy example, but there are no compiler tricks here, such as returning constants.
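
As a sanity check on the printed value: the integral of the standard normal density over [-1, 1] has the closed form erf(1/√2), which is available as math.erf in Python. The sketch below compares it with a much coarser midpoint rule; the midpoint rule is swapped in here for brevity and is not the rectangle rule used above, and the function names are made up.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def midpoint_integral(func, start, stop, steps):
    """Midpoint rule with a fixed number of equal-width steps."""
    width = (stop - start) / steps
    return sum(func(start + (k + 0.5) * width) for k in range(steps)) * width


exact = math.erf(1 / math.sqrt(2))  # closed form for P(-1 < X < 1)
approx = midpoint_integral(normal_pdf, -1.0, 1.0, 10_000)
print(exact, approx, abs(exact - approx))
```

Both programs in the thread print 0.6826895163607789, which agrees with the closed form to about seven decimal places, so the two implementations are at least computing the same thing correctly.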


Thanks! This is exactly what I am looking for!

Note that @jbytecode’s code does follow all the “performance tips” mentioned above. That doesn’t make the code complicated. But there are some rules there which guarantee that the code is performant from the point of view of a Julia implementation.

(and it can be prettier than Python’s if the first function is just written as

f(x) = (1/sqrt(2 * pi)) * exp(-0.5 * x^2)

or (that’s almost a joke, but it actually works):

f(x) = exp(-x^2/2) / √(2π)

:slight_smile: )


Thanks! I will take a careful look at that. :saluting_face:

Yes, at least if you decide that compilation time is not relevant for your use case. I’ll try to give another reason why that is fair.

When we care about performance, it is usually for complex functions which run for longer than, say, a minute. In that case, the compilation time is negligible anyway (there is some Julia code that takes longer than that to compile, but since 1.9 not that much).

However, these complex functions are made of several small components, which they call many times (for instance inside a loop). Of course we want to benchmark the components, to get the most out of each of them. But is it realistic to include compilation for each individual run of these small components? No, because they will be compiled once and run many times. In fact, they can even be compiled on a small input (component(n=2)) and run on a larger one (component(n=10000)).
And as a matter of fact, BenchmarkTools.jl does exactly what we want: it runs the function several times and returns the minimum duration.
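
On the earlier question of whether Python should be measured the same way: the standard library’s timeit module supports the same "run several times, keep the minimum" approach. A minimal sketch, with component as a hypothetical stand-in for one of the small pieces discussed above:

```python
import timeit


def component(n):
    """Hypothetical small component: sum of squares below n."""
    return sum(i * i for i in range(n))


CALLS_PER_RUN = 100
# Five independent runs of 100 calls each; the minimum is the least noisy estimate.
timings = timeit.repeat(lambda: component(10_000), number=CALLS_PER_RUN, repeat=5)
best_per_call = min(timings) / CALLS_PER_RUN
print(f"best: {best_per_call * 1e6:.1f} µs per call")
```

Taking the minimum rather than the mean filters out slowdowns caused by other processes, which is the same reasoning behind reporting the minimum in Julia.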

TLDR:

  • For complex functions that are called 1 time, compilation time is negligible compared to runtime
  • For small components that are called N times, compilation time is negligible compared to N x runtime
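
The two bullets can be put into numbers with a small calculation. All timings below are hypothetical and only illustrate the ratio; compile_share is a made-up helper.

```python
def compile_share(compile_seconds, run_seconds, calls):
    """Fraction of total time spent compiling when a function runs `calls` times."""
    total = compile_seconds + calls * run_seconds
    return compile_seconds / total


# Hypothetical: 0.5 s of compilation in each scenario.
print(f"long function, called once:    {compile_share(0.5, 120.0, 1):.2%}")
print(f"tiny function, called once:    {compile_share(0.5, 0.001, 1):.2%}")
print(f"tiny function, called 10^6 x:  {compile_share(0.5, 0.001, 1_000_000):.2%}")
```

Only the middle case, a cheap function called a single time, is dominated by compilation, which matches the worst case for Julia described earlier in the thread.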

For instance the time it takes to launch Julia itself, and the time you spend loading packages with using ... statements. That is why the typical Julia workflow is to open a REPL in the morning, load all the packages, and only kill it in the evening.
At least that used to be the case before version 1.9 of the language. Now startup is much much faster, so this is less of a problem.


I think either x*x or x**2 would be faster, and considered more idiomatic.


Sure, and it is analogous to Julia’s Base.:^(x, 2).

Funnily enough, this probably isn’t true. math.pow in Python will sometimes be faster than x*x, because multiplication has to go through a more complicated runtime dispatch: Python has a special double dispatch for addition and multiplication (__mul__ and __rmul__).
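
Which of the three spellings wins likely depends on the Python version and the machine, so it is easiest to measure. A minimal sketch with timeit (the helper best_time and the test value 1.2345 are arbitrary choices for this example):

```python
import timeit

SETUP = "import math; x = 1.2345"
STATEMENTS = ["x * x", "x ** 2", "math.pow(x, 2)"]


def best_time(statement, number=200_000, repeat=5):
    """Best total time over `repeat` runs of `number` evaluations each."""
    return min(timeit.repeat(statement, setup=SETUP, number=number, repeat=repeat))


for statement in STATEMENTS:
    print(f"{statement:>15}: {best_time(statement):.4f} s")
```

Passing the statements as strings keeps the overhead of a Python function call out of the measurement, which matters when the operation being timed is this cheap.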
