Possible Performance Regression for Loops on 1.9?

There is a function in Bogumił Kamiński’s book Julia for Data Analysis (page 7) whose purpose is to show off the optimizations done by the Julia compiler.

# Sums the integers 1 through n with a plain loop; the compiler can
# rewrite this to the closed form n*(n+1)/2.
function sum_n(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

In the book, the code shows superb runtimes, on the order of 0.000001 seconds, when run on Julia v1.7. However, when I tested the same function on v1.8.3, v1.8.5, and v1.9.0, that was not the case: it took about one second. So I am wondering what happened to loops from v1.7 to v1.8 and above?
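For reference, the measurement was presumably along these lines (the exact n is my guess; the book may use a different value):

@time sum_n(1_000_000_000)  # hypothetical reproduction of the book's timing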

2 Likes

Could you please re-run your function after using the package BenchmarkTools.jl? Like this:

using BenchmarkTools

@btime sum_n(1_000_000_000)

And post back the results?

P.S. Also, welcome to the Julia community! Thanks for bringing this up! :wave:

1 Like

This is very odd. The @code_native output shows that the loop can be folded, but when the function is run in the global scope, it actually executes the loop.
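For anyone who wants to reproduce that check, a minimal sketch (the exact output depends on your CPU and Julia version):

using InteractiveUtils  # provides @code_native outside the REPL

# If the loop is folded, the emitted assembly contains no loop back-branch,
# just a short closed-form computation equivalent to n*(n+1)/2.
@code_native sum_n(1_000_000)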

A small remark. First of all, as a rule of thumb, you should benchmark with @time on the second call of the function. That is because the first time you call a function, Julia also compiles it, so @time includes compilation time, which is typically not something you are interested in.
Also, using BenchmarkTools.jl as mentioned above will help a great deal.
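A minimal sketch of that rule of thumb:

@time sum_n(1_000_000_000)  # first call: includes JIT compilation time
@time sum_n(1_000_000_000)  # second call: compilation overhead is gone

using BenchmarkTools
@btime sum_n(1_000_000_000)  # runs many samples and reports the minimum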

This isn’t compilation time. Running @time repeatedly shows it taking 1.7 seconds, but running @btime on it takes nanoseconds (and this is easy to see since @btime runs faster than @time which shouldn’t be possible).

1 Like

Ok, this appears to be something weird @time is doing. Just running sum_n is fast.
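One rough way to confirm that, bypassing @time entirely and measuring wall-clock time around the bare call (just a sketch; at these speeds the measurement is mostly overhead):

t0 = time_ns()
s = sum_n(1_000_000_000)
# if sum_n itself is fast, this prints a tiny elapsed time even for huge n
println("result = ", s, ", elapsed = ", (time_ns() - t0) / 1e9, " s")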

3 Likes

Thanks for the info. Here are the results.

While the performance was almost equivalent when running @btime, there was still a difference of about a nanosecond. In the book, Bogumił explained that the consecutive sum would be optimized to n(n+1)/2. I am not sure whether that is still the case in v1.8.0 and above. How can I check it, given that I don’t know assembly language?

I checked with a much larger number, and the result implies that the n(n+1)/2 optimization is still in place?
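Presumably a scaling check along these lines (my reconstruction, not necessarily the exact numbers used): if the loop is folded into the closed form, the runtime should not grow with n:

using BenchmarkTools

# A real loop would scale linearly with n; roughly constant nanosecond
# timings across these calls point to the closed-form rewrite.
@btime sum_n(1_000)
@btime sum_n(1_000_000)
@btime sum_n(1_000_000_000)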

Yes, if I run @time sum_n(1_000_000_000_000), the CPU load is heavy and no result appears even after a long time (which would fit the loop actually executing: 10^12 iterations at even one per nanosecond take on the order of 1000 seconds).

Hm. Thanks for posting this back! This is helpful.

I cannot explain why it is apparently slower by a single nanosecond (perhaps a regression somewhere on 1.9 that hasn’t been caught yet?), but I’d say the speeds are still comparable, as you mentioned. I am sure a speed hacker like @Oscar_Smith or @Elrod could comment more. Just looking at this from my perspective, though, it does seem to be a minor regression on 1.9.

P.S. I updated the title of your post to make it a bit more precise for other Julians to see/understand.

Thanks and please go ahead.

1 Like

IIRC, there was a recent change to @time to avoid constant folding and similar instances of the compiler defeating the benchmark.

The issue is Spurious performance regression in Julia 1.8 vs 1.7 for `@time` in top-level-scope · Issue #47561 · JuliaLang/julia · GitHub

There is a proposed fix, but it is not yet ready for prime time.

6 Likes