Inconsistent performance using for loops in REPL

I was using the Julia REPL to solve some Project Euler problems and came across this weird behavior. When I ran the simplest for loop I could think of, I got the worst performance I can recall having in Julia. However, when I put something meaningful, albeit rather trivial, into the loop, the performance was what I have come to expect from this favorite language of mine.

I tried wrapping both statements in functions, with significant speedup in both cases, but the numbers speak for themselves:

[screenshot: @time results for the two loops run at top level and wrapped in functions]

I’m unsure whether I should open an issue on GitHub about this, but 42 seconds can’t be acceptable. I did repeat all experiments with similar results; on a fresh run of the REPL it went down to 26 seconds, still with the 16 GiB of allocated memory.

I’m using Julia 0.6.0 on Windows.

Yes, this is just how Julia works. Julia’s “unit of compilation” is the function. Every function it hits in the global dynamic (REPL) scope will be compiled once and then run. This means that you want meaningful/long calculations to be inside functions, otherwise you won’t get the optimizations. This is why the first performance tip is to put things in a function.

https://docs.julialang.org/en/stable/manual/performance-tips.html#Avoid-global-variables-1
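For anyone reading along, here is a minimal sketch of the two situations (the loop bound is arbitrary; in a non-REPL script on Julia ≥ 1.0 the first loop would also need a global t annotation):

t = 0
@time for i in 1:10^7      # top-level loop: t is an untyped global, so every
    t += 1                 # += goes through dynamic dispatch and allocates
end

function count_up(n)       # same loop inside a function: the type of t is
    t = 0                  # inferred and the loop compiles to tight native code
    for i in 1:n
        t += 1
    end
    return t
end

@time count_up(10^7)       # first call includes compilation; later calls are fast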

Here’s more of an explanation:

I have the feeling it might be a good idea to extend those @time, @btime, @benchmark, etc. macros to print a warning if the code is not wrapped in a function. :wink:
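Purely as an illustration of the idea (this is not an existing feature, the macro name is made up, and the heuristic is deliberately crude), something like this could sit on top of @time on recent Julia versions:

macro warntime(ex)
    # crude heuristic: a bare loop or block handed to a timing macro at top
    # level usually means untyped globals are involved
    looks_suspicious = ex isa Expr && ex.head in (:for, :while, :block)
    quote
        $(looks_suspicious) &&
            @warn "Timing a bare loop/block; consider wrapping it in a function."
        @time $(esc(ex))
    end
end

With this, @warntime somefunc(x) would time silently, while @warntime for i in 1:10^6 ... end would print the warning first.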

I didn’t experiment, but I guess that the compiler is smart enough to reduce the summation in the T function to a closed form and just return the result, whereas it cannot derive such a formula for xor and has to do the actual computation at runtime, both in the REPL and in W.
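One way to check that guess, assuming loop bodies like the ones in the screenshot (the function names here are made up):

function sum_loop(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

function xor_loop(n)
    s = 0
    for i in 1:n
        s = xor(s, i)
    end
    return s
end

@code_llvm sum_loop(2^30)   # LLVM can often rewrite the additive loop into a closed form
@code_llvm xor_loop(2^30)   # the xor reduction is usually left as an actual loop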

But we often want to measure elapsed time at top level (e.g. measuring package load time).

No. Having to constantly wrap code snippets in functions would be a gigantic pain for no gain.

You don’t need to put code in a function to benchmark it, and I don’t understand why everyone keeps saying so.

Use BenchmarkTools!

using BenchmarkTools

function foo(t)
    for i in 1:2^30
        t += 1
    end
    return t
end

julia> @btime begin
    t = 1
    for i in 1:2^30
        t += 1
    end
end
  8.064 ns (0 allocations: 0 bytes)

julia> @btime foo(1)
  9.774 ns (0 allocations: 0 bytes)

Normally, you should interpolate the variables into the expression, like so: $t += 1, but in this particular case it’s not necessary.
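For the more common case where the benchmarked expression reads a non-constant global, interpolation does matter; a small example (the vector x is arbitrary):

using BenchmarkTools

x = rand(1000)

@btime sum($x)   # interpolated: benchmarks sum on a concrete Vector{Float64}
@btime sum(x)    # not interpolated: also measures the untyped global lookup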

I agree, but I still think that some kind of warning could be useful for the obvious misuse patterns.

Julia is very powerful, but at the same time it has a specific performance model that requires some adjustments from the user. Many of the performance gotchas I see on forums come from not making these adjustments, and from getting absolutely no warning about it. Too much verbosity is not good, yet perhaps warning the first time per session could be useful.

If we allow for the possibility that early adopters of Julia are more experienced and adaptable than the average programmer, it is inevitable that these questions will come up more and more frequently as the user base widens. One of the big challenges post-1.0 will be making the inspection and solution of performance problems more approachable to the general crowd.

Yes, if it is possible for @time to detect that the code is not wrapped in a function, that might be useful. But this is not useful for @btime and @benchmark.

I am just a bit confused as to why the standard (and mostly only) advice given in cases like this is to function-wrap the code. I always use BenchmarkTools for everything, and think this should be the default suggestion.

Yep, I agree that @btime and @benchmark are a different story, but for @time I still think a warning is better than constantly telling people that “they’re doing it wrong”.

I’d also recommend BenchmarkTools, but most newcomers to Julia will probably first play around in the REPL with the standard library (including @time) and sooner or later run into such problems. Again, we cannot teach everyone in advance. If intuition leads to such usage patterns, it’s usually better to build on that (by adding some warnings).

I don’t have a problem with those points. But I think that when people get to the point where they start posting questions to Discourse, they should be pointed towards BenchmarkTools, instead of only being told about the function “hack”, which is what happens 4 times out of 5 (WAG).

It is not a function “hack” though. Many optimizations are only active when expressions are put inside functions.

E.g. (on 0.7):

julia> struct Foo end

julia> Base.getproperty(::Foo, s::Symbol) = s == :v ? 1 : "2"

julia> f = Foo()
Foo()

julia> @code_warntype f.v
Variables:
  s::Symbol

Body:
  begin
      # meta: location operators.jl == 68
      SSAValue(1) = (s::Symbol === :v)::Bool
      # meta: pop location
      unless SSAValue(1) goto 6
      return 1
      6:
      return "2"
  end::Union{Int64, String}

julia> g(f::Foo) = f.v
g (generic function with 1 method)

julia> @code_warntype g(f)
Variables:
  f<optimized out>

Body:
  begin
      return 1
  end::Int64

Sure, BenchmarkTools might get around this by splicing the expression into a function, but learning about global variables and the performance of functions is quite important for people new to Julia, so I don’t think that emphasizing this is wrong.

Conceded, but that is also why I put it in quotes instead of straight-up calling it a hack. It feels like a hack, and it is a hassle, IMHO. After working for a while I start to run out of temporary names for the snippets I want to compare (foo, bar, etc.) and forget which is which…

From what I can see, the main tripping point for newcomers is not so much benchmarking but rather getting good performance for MATLAB/Python-style “scripts”, for which (at least currently) the only answer is “put that in a function”. The advice to use functions as much as possible is a sensible one for newcomers, even if it’s slightly less natural for the task of benchmarking small pieces of code. It also feels less “magical” than BenchmarkTools’ macros when you’re not used to metaprogramming.

I see; after a bit of experimenting, it seems that the bottleneck is the += call.

Here the function is called very nearly the same number of times on t as it is on w in the example in the OP, so it seems that code generation for the for loop itself is relatively fast. I assume that because the type of t can’t be inferred, each += goes through dynamic dispatch, and that causes the severe slowdown.

I don’t know how most people use Julia, but I really like the interactive experience of the REPL and quite commonly write short logic like this in global scope. I also tend to avoid writing functions, since writing them in the REPL is a bit cumbersome, although it is often necessary. Some time later I found that the real culprits are the very type-unsafe global variables I’m using; they are being discussed by some core contributors, although there doesn’t seem to be much consensus on which (if any) optimization-friendly limitations should be put on them.
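A sketch of the usual workaround in the meantime: make the global a const (or a const container such as a Ref if the value has to change), so its type is known even in a top-level loop; the iteration count is arbitrary:

const acc = Ref(0)       # const binding with a concrete type: acc[] += 1 can be
@time for i in 1:10^7    # inferred, so the top-level loop no longer boxes on
    acc[] += 1           # every iteration
end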

Well, there’s a usage consideration here. If most of your code’s time is spent in function calls, like calling functions from packages, then sure, do that in the REPL. But don’t write your core algorithms/loops in the REPL and expect them to be fast. Of course, prototype in the REPL, then throw a function around it and that’s your “package code” to call with different arguments. Basically, the switch to a “compiled language” happens when you put code in a function, so use that as a tool.

If I don’t have a good descriptive name (finbounds, fsimd, finboundsimd), I just start appending numbers at the end: f1, f2, f3. Then at least I know f1 was an early version, while f8 probably came later.