Function not faster than global scope

I am following along in Adrian Salceanu’s book “Julia Programming Projects”. He is on Julia 1.0 and I am using 1.4. He says that running in the global scope is slower than wrapping in a function because the function is pre-compiled and cites the following toy example:

julia> @time [x for x in 1:1_000_000];
0.031727 seconds (55.85 k allocations: 10.387 MiB)

julia> function onetomil()
[x for x in 1:1_000_000]
end
onetomil (generic function with 1 method)
julia> @time onetomil();
0.027002 seconds (65.04 k allocations: 10.914 MiB)

When I run the above code in the REPL I get 0.0023 consistently for the first (global-scope) example and variable speeds of between 0.0023 and 0.005 for the function. Assuming that this must be because I have a better/more modern machine than the author’s I upped the run to 1:1_000_000_000. Now I get runs of 2.44 seconds for the global-scope test and 2.99 seconds for the function-wrapped example (I can’t go any higher than that as I get an out of memory error).

So I must conclude that either there’s been a massive performance boost in list comprehensions between v1.0 and v1.4, or 1.4 is generally so fast that there is no discernible difference. What I can’t conclude is that wrapping the code in a function is really faster than using the global scope. Can anybody help me understand these results?

EDIT:
I did some more tests. I ditched the list comprehension as being too efficient and opted (for the purposes of the test only) to use a for loop and push each value to an initially empty array. As expected the for loop was significantly slower. A loop of 10_000_000 took 1.07 seconds in global scope and only 0.47 seconds when wrapped in a function (code below).

x = []
@time for i in 1:10_000_000
    push!(x, i)
end

function loopy()
    x = []
    for i in 1:10_000_000
        push!(x, i)
    end
end
@time loopy()

So, I thought I’d proved that not using global scope is indeed quicker. BUT, I then upped the loop to 1_000_000_000 as before, expecting to see a bigger divergence. Global scope took 8.57 seconds, but the function gave very variable results again, between 9.16 and 12.15 seconds - again slower than global scope.

I am back to “the jury is out” on whether functions are really faster than global scope and speeds get flaky when in a function. What is happening here? Why can’t I see functions being faster than using the global scope?

EDIT #2
Tried the same with @btime and conclusively proved that running in a function is slightly slower than running in global scope! Using the for-loop with a range of 10_000_000, @btime records the global-scope loop at 878.8 ms (152.58 MiB) and the same loop wrapped in a function at 2.334 s (281.58 MiB).

Welcome!

Non-constant variables in global scope can change type at any time, so the compiler cannot make code that uses them type stable, and that code is therefore slow.
Your 1st example does not use any variable in global scope (a list comprehension is a local scope), therefore it does not matter if it is wrapped in a function or not.
For your 2nd example, this is different because x is defined in global scope in the non-function version.
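
A minimal sketch of that difference, using only Base Julia. The names `total`, `sum_global` and `sum_local` are made up for illustration; the first function updates a non-constant global, the second keeps everything local. The timings are printed rather than quoted, since they depend on the machine and Julia version:

```julia
# Hypothetical names, for illustration only.
total = 0                      # non-constant global: its type may change at any time

function sum_global(n)
    global total               # every access below goes through the untyped global
    total = 0
    for i in 1:n
        total += i
    end
    return total
end

function sum_local(n)
    s = 0                      # local variable: the compiler can infer that it is an Int
    for i in 1:n
        s += i
    end
    return s
end

# Warm up both functions so compilation is not included in the timings.
sum_global(10); sum_local(10)

t_global = @elapsed sum_global(1_000_000)
t_local  = @elapsed sum_local(1_000_000)
println("global: ", t_global, " s, local: ", t_local, " s")
```

Both functions return the same value; only the place where the accumulator lives differs.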

Another thing, please wrap your code examples in triple backticks ``` (or single for inline code). It makes it more readable, but, more importantly, every time you write @time (without backticks), you are pinging the Discourse user with the handle “at time”. So take especially good care to use backticks when including code with macros.

I am not familiar with this book so I am not sure if this is what it means to convey with this example, but the relevant advice is avoiding global variables.

The x in the comprehension [x for x in 1:1_000_000] is local to that comprehension, so this is not an issue. The loopy example runs into another issue: abstract type parameters. Making the container Int[] will give you a large speedup.
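
As a rough sketch of the container point (the helper names `loopy_any` and `loopy_int` are hypothetical, and the loop is shortened to 1_000_000 so it runs quickly), the only change is the literal used to create the array:

```julia
# Hypothetical helper names, for illustration only.
function loopy_any(n)
    x = []                 # same as Any[]: the element type is abstract
    for i in 1:n
        push!(x, i)
    end
    return x
end

function loopy_int(n)
    x = Int[]              # concrete element type
    for i in 1:n
        push!(x, i)
    end
    return x
end

loopy_any(10); loopy_int(10)          # warm up so compilation is not timed

println(eltype(loopy_any(3)))         # prints Any
t_any = @elapsed loopy_any(1_000_000)
t_int = @elapsed loopy_int(1_000_000)
println("Any[]: ", t_any, " s, Int[]: ", t_int, " s")
```

Both versions hold the same values; the difference is in how the array stores them, which is where the speedup would be expected to come from.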

I would recommend just reading the entire performance tips chapter.

When doing benchmarks like these, you should definitely use the BenchmarkTools package. It is much more appropriate for micro-benchmarks.

Thanks for all the useful replies. I had read that declaring types was faster, so I was going to test that too but got sidetracked. Declaring x as Int greatly reduces the time for the `loopy` function, so it’s nice to see it in action (99 ms vs nearly 2.3 s).

It’s interesting that, if a list comprehension is a local scope, the author of the book still got a speed improvement by wrapping the comprehension in a function. That’s a mystery, but it does explain why my function and the “naked” list comprehension showed similar times. It was my assumption that list comprehensions are optimised that made me try using a less optimal for loop.

Anyway - thanks for the input. This has been instructive!

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.0 (2019-11-26)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> @time [x for x in 1:1_000_000];
  0.038972 seconds (47.98 k allocations: 10.562 MiB)

julia> @time [x for x in 1:1_000_000];
  0.031977 seconds (47.94 k allocations: 10.016 MiB)

julia> @time [x for x in 1:1_000_000];
  0.035335 seconds (47.94 k allocations: 10.041 MiB)

julia> function onetomil()
       [x for x in 1:1_000_000]
       end
onetomil (generic function with 1 method)

julia> @time onetomil();
  0.176014 seconds (116.47 k allocations: 13.900 MiB, 54.32% gc time)

julia> @time onetomil();
  0.001361 seconds (6 allocations: 7.630 MiB)

julia> @time onetomil();
  0.001296 seconds (6 allocations: 7.630 MiB)

julia> 

My guess is that when you run it directly in global scope at the REPL, the command parsing etc. happens on every call, which is why I am getting 31-35 ms consistently.
The first call to the function is slow (176 ms) because it has to compile; thereafter it is fast (1.3 ms).

Thus I suspect what you’re seeing is not so much a global-scope effect as REPL parsing overhead.
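
A small sketch of the first-call effect, using only `@elapsed` from Base and the `onetomil` function from the question. The exact numbers will vary by machine and Julia version, so none are quoted here:

```julia
onetomil() = [x for x in 1:1_000_000]

first_call  = @elapsed onetomil()   # includes compiling onetomil
second_call = @elapsed onetomil()   # reuses the compiled code
println("first: ", first_call, " s, second: ", second_call, " s")
```

Timing a call only after one warm-up call avoids counting the compilation.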

julia> using BenchmarkTools

julia> @btime onetomil();
  1.244 ms (2 allocations: 7.63 MiB)

julia> @btime [x for x in 1:1_000_000];
  1.181 ms (2 allocations: 7.63 MiB)

julia> @btime [x for x in 1:1_000_000];
  1.274 ms (2 allocations: 7.63 MiB)

If we use `@btime`, the parsing is hidden and now they are equal.

Again, I don’t know this book, but generally when learning something complex I would try to pick a book that solves puzzles for me, instead of generating extra ones. So if you find the book puzzling, maybe just read the manual.

This is not true. Implicitly, with [] you are using Any[], so you are also declaring a type.

Concrete types are faster, and ideally you should follow a programming style in which they rarely need to be declared explicitly.
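
One possible illustration of that style, as a sketch only: the function name `collect_upto` is hypothetical, and it takes the element type from its argument instead of from an annotation.

```julia
# Hypothetical function, for illustration only.
function collect_upto(n)
    out = typeof(n)[]          # element type follows the argument, no annotation needed
    for i in one(n):n
        push!(out, i)
    end
    return out
end

a = collect_upto(5)            # element type is Int
b = collect_upto(Int8(5))      # element type is Int8

println(isconcretetype(eltype(a)))    # prints true
println(isconcretetype(eltype([])))   # prints false, because Any is abstract
```

The same function works for any integer type and still produces a concretely typed array each time.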