I ran the following for loops in Python and Julia respectively, launching them from the terminal with the commands python test.py and julia test.jl, but it turned out that Python (≈ 37 sec) took less time than Julia (≈ 52 sec). Why?
Julia compiles a function to efficient native code the first time it is called. So the problem with this benchmark is that the loop needs to be wrapped in a function for that compilation to happen.
You should get much faster Julia code with:
function main(n)
    for i in 1:n
        println(i)
    end
end

main(10000)
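If you want to see the compilation cost explicitly, timing the call twice makes it visible; this is just a sketch, and the actual printed times will vary:

@time main(10000)   # first call: includes compilation time
@time main(10000)   # second call: measures pure runtime, much smaller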
And as @johnmyleswhite said, what you’re really benchmarking here is printing performance, not the for loop, which is probably not what you actually want.
Sorry, I’m back again. There are still some questions that have been bothering me.
Must I run Julia code in that special manner to achieve better performance? (Can I call it “special”? I mean the workflow of wrapping the code in a function, pre-running it, and timing a later call.) What is the scientific basis for evaluating performance in this manner? Is it fair to Python? Should Python code also be evaluated this way?
I did another comparison, running the following two files from the terminal with python test2.py and julia test2.jl respectively, and found that Python (10–11 sec) took longer than Julia (7–8 sec) this time. Can I conclude from this that Julia performs better than Python only when the code is very time-consuming?
test2.py:
total = 0
for i in range(1, 100000001):
    total += i
print(total)
test2.jl:
total = 0
for i in 1:100000000
    global total += i
end
println(total)
May I ask what the “setup” you’ve mentioned specifically means?
As a PhD student, I have reported timings for Julia code in scientific papers using the method mentioned, that is: first run the code with some dummy input that triggers compilation, then run it with the real data and time that run.

As a compiled language, Julia will never be entirely comparable to Python. How do you compare C and Python timings? Do you include the C compilation time or not? The difference between Julia and C is that it is hard to completely separate the compilation phase from the runtime phase: you need a dummy run that is guaranteed to call every function the real experiment would. It is up to the scientist to decide what is a fair experiment/comparison method for their study. One way to make the comparison “fair”, for example, is to pass --compile=no to the Julia compiler/interpreter, so that it becomes essentially an interpreted language; however, then there is little reason to use Julia (whose advantage is precisely that it compiles, and therefore has great performance once compilation is done).
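As a concrete illustration of that workflow, here is a minimal sketch; work and the inputs are hypothetical stand-ins for the real experiment:

function work(xs)
    s = 0.0
    for x in xs
        s += sin(x)^2
    end
    return s
end

work(rand(10))                       # dummy run: triggers compilation of every method used
t = @elapsed work(rand(10_000_000))  # timed run on the real data: measures runtime only
println("runtime: $t seconds")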
Julia performs better than Python when the code does not load a lot of external libraries and instead calls the same functions many times, with arguments of similar types, and those functions do a lot of computational work. Why? Because this minimizes how much time is spent compiling code relative to running the compiled/optimized code. The worst case for Julia is code that loads a lot of complex libraries and then calls many distinct functions, each doing little work, a single time each. Why? Because then Julia has to compile loads of library code just to run a lightweight computation once. For that kind of script it is better either to use a scripting/interpreted language, or to pass --compile=no.
The way to write performant code in Julia is to write functions and follow a few guidelines that ensure type stability, the most important of which is avoiding non-constant global variables.
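To make that last guideline concrete, here is a minimal sketch (the names coeff, COEFF, and the two functions are made up for illustration):

coeff = 2.0                               # non-constant global: its type could change,
                                          # so the compiler cannot specialize on it
slow_scale(xs) = [coeff * x for x in xs]  # slow: every access to coeff is dynamically typed

const COEFF = 2.0                         # const global: type is fixed and known to the compiler
fast_scale(xs) = [COEFF * x for x in xs]  # fast: specialized code can be generated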
But that is not the way to write performant Python code either. Python alone cannot do better than that; if you had to write real Python code aiming for performance, you would reach for a library such as NumPy or Numba, depending on the problem.
These toy examples hardly tell you anything. For instance:
julia> function f()
           total = 0
           for i in 1:100000000
               total += i
           end
           return total
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime f()
  2.672 ns (0 allocations: 0 bytes)
5000000050000000
As you can see, your time estimates on the order of seconds don’t mean anything in terms of performant code. And in this case a good compiler simply returns the result:
julia> @code_llvm f()
;  @ REPL[1]:1 within `f`
define i64 @julia_f_111() #0 {
top:
;  @ REPL[1]:6 within `f`
  ret i64 5000000050000000
}
Some toy examples can give an idea of the performance difference. The density of the standard Normal distribution is

f(x) = \frac{1}{\sqrt{2\pi}} \exp(-0.5 x^2)

for -\infty < x < \infty. Let’s implement a naive numerical integration (a left Riemann-sum approximation) in Python:
import math

def f(x):
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * math.pow(x, 2))

def integrate(func, lo, hi, e=0.0000001):
    total = 0.0
    x = lo
    while x <= hi:
        total += func(x) * e
        x += e
    return total

print(integrate(f, -1, 1))
and here is a nearly identical Julia implementation:
function f(x)
    return (1 / sqrt(2 * pi)) * exp(-0.5 * x^2)
end

function integrate(func, from, to; e = 0.0000001)
    sum = 0.0
    x = from
    while x <= to
        sum += func(x) * e
        x += e
    end
    return sum
end

print(integrate(f, -1, 1))
Even though compilation takes time in the Julia implementation, the terminal results look like this:
$ time julia 1.jl
0.6826895163607789
________________________________________________________
Executed in  808.25 millis    fish           external
   usr time  758.27 millis  283.00 micros  757.99 millis
   sys time  217.83 millis  745.00 micros  217.08 millis

$ time python3 1.py
0.6826895163607789
________________________________________________________
Executed in    9.02 secs      fish           external
   usr time    8.96 secs    280.00 micros    8.96 secs
   sys time    0.03 secs    748.00 micros    0.03 secs
Julia is generally better when the implementation consists of computation-heavy loops. Yes, this is a toy example, but there are no compiler tricks here, no returning of constants, etc.
Note that @jbytecode’s code does follow all the “performance tips” mentioned above. That doesn’t make the code complicated, but there are some rules in it which guarantee that the code is performant from the point of view of a Julia implementation.
(and it can be prettier than Python’s if the first function is just written as f(x) = (1 / sqrt(2pi)) * exp(-0.5x^2))
Yes, at least if you decide that compilation time is not relevant for your use case. I’ll try to give another reason why that is fair.
When we care about performance, it is usually for complex functions which run for longer than, say, a minute. In that case, the compilation time is negligible anyway (there is some Julia code that takes longer than that to compile, but since 1.9 not that much).
However, these complex functions are made of several small components, which they call many times (for instance inside a loop). Of course we want to benchmark the components, to make the best of each of them. But is it realistic to include compilation in each individual run of these small components? No, because they will be compiled once and run many times. In fact, they can even be compiled on a small input (component(n=2)) and run on a larger one (component(n=10000)).
And as a matter of fact, BenchmarkTools.jl does exactly what we want: it runs the function several times and returns the minimum duration.
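A minimal usage sketch (sum and a random vector stand in for whatever component you are measuring; the $ interpolation keeps the input from being treated as a non-constant global):

using BenchmarkTools

xs = rand(1000)
@btime sum($xs)   # runs the call many times and reports the minimum duration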
TLDR:
For complex functions that are called once, compilation time is negligible compared to runtime.
For small components that are called N times, compilation time is negligible compared to N × runtime.
For instance the time it takes to launch Julia itself, and the time you spend loading packages with using ... statements. That is why the typical Julia workflow is to open a REPL in the morning, load all the packages, and only kill it in the evening.
At least that used to be the case before version 1.9 of the language. Now startup is much much faster, so this is less of a problem.
Funnily enough, this probably isn’t true. math.pow in Python will sometimes be faster than x*x, because multiplication has to go through more complicated runtime dispatch: Python has a special double-dispatch protocol for addition and multiplication (__mul__ and __rmul__).