Execution speed - can someone explain the difference?

Hello everyone,

first of all, I’m sorry if this is a stupid question, but I’m fairly new to Julia and also not a seasoned programmer or computer scientist at all. But I’m scratching my head a little here about something.

I’m not obsessed with the speed of programming languages at all, but I’d really like to understand this one.

So, after having watched some random Youtube video that compared the execution speed of a simple program counting from 0 to 1 billion in Python and C++, I got curious and just wanted to test how well Julia would perform in this regard, since it is somewhat hailed for its speed (compared to Python at least)

I also found this article where Logan Kilpatrick attempted the same thing:
https://juliazoid.com/no-julia-is-not-34-000-000-000-times-faster-than-python-f63e956313d7

I did not use any of Julia’s internal benchmarking tools/libraries but merely “time julia 1billion.jl” in the command line. Let’s disregard the fact that this is probably not a good way for comparing execution speed to other languages because of Julia’s rather long startup time.

However, the thing that stuck me and that I don’t quite understand is the following:

function count()
           n = 0
           while n < 1_000_000_000
               n +=1
           end
       end

count()

On my machine this program has a reasonable and even quite impressive execution time of about 150ms.

However, if I don’t wrap the counting loop in a separate function but run the program like this

n = 0
while n < 1_000_000_000
    global n +=1
end

It should basically be the same thing, but this program by contrast takes about 75 seconds to execute. Which is even slower than a Python script with the same code.

Is this really to be expected that this simple counter runs about 500 times slower if not wrapped in a function definition? What is the important thing to take away here when writing real world applications at some point?

Yes, it is expected. If you want performance, put your code into a function.

OK, to some degree it also helps to give the global variable a type, e.g.

n::Int64 = 0
while n < 1_000_000_000
    global n +=1
end

But better use functions…

I get:

julia> function count()
                  n = 0
                  while n < 1_000_000_000
                      n +=1
                  end
                  n
        end
count (generic function with 1 method)

julia> @time count()
  0.000002 seconds
1000000000
1 Like

It should be effectively instant, because the compiler deletes the loop.
Maybe you’re including compile (and Julia startup?) time in this?

Note, if you actually returned n afterwards, the compiler would still delete the loop:

julia> function count()
                  n = 0
                  while n < 1_000_000_000
                      n +=1
                  end; return n
              end
count (generic function with 1 method)

julia> @code_llvm count()
;  @ REPL[1]:1 within `count`
define i64 @julia_count_139() #0 {
top:
;  @ REPL[1]:5 within `count`
  ret i64 1000000000
}

It turns into just ret i64 1000000000 at the end.

7 Likes

So, this is all some compiler magic and optimization going on? Hypothetically, if I printed n on every iteration, I should probably expect both versions of the program to run equally slow?

1 Like

The function version would run in O(N) time instead of O(1), where N is the number of loop iterations.
It should still be faster, because the compiler will be able to (for example) get rid of dynamic dispatches.

Printing is very slow though, so you won’t see much difference.

That is, inside a function, the work done within the loop (e.g., calling println) will be faster, but the functions being called won’t be.
For functions that are cheap enough (like doing arithmetic operations), they can be inlined and further optimizations applied, but printing is expensive.

1 Like

Just in case the question was about a more basic thing than what @Elrod is explaining:

Is this really to be expected that this simple counter runs about 500 times slower if not wrapped in a function definition? What is the important thing to take away here when writing real world applications at some point?

Read this.

The idea is that if you work on the global scope, then Julia treats the global variables as having any type and hence it can’t specialize when it computes the operations.

Instead, if you wrap code into a function, Julia will first attempt to identify the type of any local variable and function argument. Then, it can specialize the way to compute each operation for the types identified (e.g., Float64, Int64, Bool).

This is why as @ufechner7 was suggesting an alternative to work with functions, which is to indicate the type of a global variable used in the operation. This means writing n::Int64 = 0.

3 Likes

That’s not really an alternative, in that functions are optimized.
Global scope is not.

So in that sense, it is essential for performance to do work within functions.
Calling functions that do work from global scope is fine.

Annotating globals does allow code using them within functions to be optimized.

Non-const, non-annotated globals cannot really be optimized within a function, because the compiler doesn’t know the type.
Type groundedness (internal operations being inferred) and type stability (return type being inferred, which prevents viral spread of non-inferred code) are both important for optimization.

1 Like

Thank you all for your explanations so far. I just tested the same examples but with a type specification for the global variable n. It does help a little bit and brings execution speed down from 75 seconds to about 47 seconds. But it’s still much slower than the version with the function.

Also thank you for the hint to the Performance Tips from the documentation. I’ll make sure to read that thoroughly.

1 Like

Yes, I was trying to keep it simple, in case OP is just starting with Julia and feels overwhelmed with so many concepts stated.

3 Likes

Another way to put it: First, lets remove the compiler magic, that makes your example too simple, such that the function does nothing, and you measured just the Julia startup time. The function below actually does something:

julia> function sum_stuff(n)
           for i in 1:1000
               n += mod(i, n)
           end
           return n
       end
sum_stuff (generic function with 1 method)

julia> using Chairmarks

julia> @b sum_stuff(15)
11.072 μs

The using Chairmarks and @b estimate the time of the execution (you can also use using BenchmarkTools; @btime sum_stuff(15)).

Now, with a global n (not that I still run the code inside a function):

julia> function sum_stuff()
           for i in 1:1000
               global n += mod(i, n)
           end
           return n
       end
sum_stuff (generic function with 2 methods)

julia> n = 15
15

julia> @b sum_stuff()
43.242 μs (1986 allocs: 31.094 KiB)

Slower, and allocates intermediates.

What happened: since n is a global variable, the compiler cannot know that it is an integer during the execution of the function. Thus, every time it does something with n it has to check if the operation is valid, and if the return type is an integer, or something else. Then it has to save space (allocate) space for possible variable types of output values of the operations with n.

Nothing of that is necessary if all happens inside the function, and if n is a input variable of that function. Then, the compiler knows that, during the function execution, n is always an integer (Int64 in this case) and can just perform the operations without further complications, and it can do optimizations in the sequence of operations (like the ones mentioned by Chris Elrod above.

Interpreted languages work more like the first case. Compiled language more like the second case. Julia can run as an interpreted language, if the code has variables for which the type cannot be known (and a particular case of this is when all the code is just in global scope). And it will run as a compiled language if all the variables have known types (specifically, if they can be inferred from the input variables of the function and from the operations the function performs with those variables).

5 Likes