The big difference in speed of remainder calculations

likein12 · April 11, 2020, 4:23pm

I found a curious behavior in Julia. When I ran the codes below, big difference in speed was observed.

Code 1

function main()
    a = 0
    p = 2^5
    N = 100000000
    for i in 1:N
        a %= p
    end
    println(a)
end

@time main()

Result 1

0
  0.839353 seconds (15.62 k allocations: 751.056 KiB)

Code 2

function main()
    a = 0
    p = 2^4
    N = 100000000
    for i in 1:N
        a %= p
    end
    println(a)
end

@time main()

Result 2

0
  0.020099 seconds (15.62 k allocations: 751.056 KiB)

At first I supposed that it was because 2^5 is bigger than 2^4, but with several trials, I realized that the magnitude of numbers had nothing to do with it and the power index was important. I summarized the result of my trials in the figures below. Specifically, if the power index is five or six, the calculation takes much time whatever the base number is.

Then, why does this phenomenon occur? I suppose that it is important whether ‘a’ is interpreted as a constant number or not because remainder calculations are performed faster when the divisor is constant, but I have no idea except for it. My Julia version is 1.4.0.

Syx_Pek · April 11, 2020, 4:38pm

I suspect this is just measuring the time of pre-compilation. One should use GitHub - JuliaCI/BenchmarkTools.jl: A benchmarking framework for the Julia language to benchmark their programs.

Upon doing so, I found the difference vanish. Indeed, the loop is not being ran in either case when inspecting with @code_native. It is true that the first version generates more code as 2^5 doesn’t become a constant of 32 but 2^4 does get inferred to be 16, but since either way the loop isn’t being ran, they both take nanoseconds.

likein12 · April 11, 2020, 5:18pm

Thank you for your fast reply! I thought that your hypothesis that it was only the difference of the time of pre-compilation looked so likely, but when I measured the time again with @benchmark of BenchmarkTools.jl, the difference did not disappear. My usage of BenchmarkTools may be wrong?

2^5

BenchmarkTools.Trial:
  memory estimate:  176 bytes
  allocs estimate:  6
  --------------
  minimum time:     628.993 ms (0.00% GC)
  median time:      657.863 ms (0.00% GC)
  mean time:        654.278 ms (0.00% GC)
  maximum time:     688.478 ms (0.00% GC)
  --------------
  samples:          8
  evals/sample:     1

2^4

BenchmarkTools.Trial:
  memory estimate:  176 bytes
  allocs estimate:  6
  --------------
  minimum time:     83.401 μs (0.00% GC)
  median time:      91.750 μs (0.00% GC)
  mean time:        195.334 μs (0.00% GC)
  maximum time:     21.021 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Also, when I changed N, there seemed to be a linear relationship between N and the time.

In any case, I will check it with @code_native next.

Syx_Pek · April 11, 2020, 5:21pm

Oh I seem to have made a mistake. It seems that I was testing without the println so it led it to optimize away the loop. If keeping the println it seems like it is failing to realise that p is a constant and allows it to optimize away the loop. I believe changing p = 32 will show that it is back to a small amount of time.

tbeason · April 11, 2020, 5:45pm

What if you return a rather than print it?

DNF · April 11, 2020, 5:49pm

If you want to benchmark remainder calculations you should do just that, and not mix in printing, which will tend to dominate the runtime.

Here’s an example comparing rem with 2^5 and 2^4:

julia> @benchmark a % p setup=(a=rand(1:100_000_000); p=2^5)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.695 ns (0.00% GC)
  median time:      4.761 ns (0.00% GC)
  mean time:        4.959 ns (0.00% GC)
  maximum time:     24.656 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark a % p setup=(a=rand(1:100_000_000); p=2^4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.612 ns (0.00% GC)
  median time:      4.758 ns (0.00% GC)
  mean time:        5.009 ns (0.00% GC)
  maximum time:     46.481 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

likein12 · April 11, 2020, 6:04pm

Oh, without println(a), the loop seems not to be ran. I tried it and got the same result too. a is not recognized as a constant number when the power index is 5 or 6… It is interesting.

likein12 · April 11, 2020, 6:05pm

I tried it. Almost the same result as that of println(a) was observed. With the return code, the loop seems to be ran.

likein12 · April 11, 2020, 6:10pm

Thank you! I think your code is better than mine to evaluate the time of remainder calculations genuinely.
But also I’m interested in why Julia fails to interpret a as a constant number or does something special when the power index is 5 or 6 now.

DNF · April 11, 2020, 6:21pm

Yes, I’m not sure why this happens, but the compiler somehow is not able to run the exponentiation that far for constant propagation. Take a look at the llvm code:

julia> foo() = 2^4
foo (generic function with 3 methods)

julia> @code_llvm foo()

;  @ REPL[169]:1 within `foo'
define i64 @julia_foo_18883() {
top:
  ret i64 16
}

julia> foo() = 2^5
foo (generic function with 3 methods)

julia> @code_llvm foo()

;  @ REPL[171]:1 within `foo'
define i64 @julia_foo_18884() {
L21.lr.ph:
; ┌ @ none within `literal_pow'
; │┌ @ none within `macro expansion'
; ││┌ @ intfuncs.jl:238 within `^'
; │││┌ @ intfuncs.jl:225 within `power_by_squaring'
      br label %L21

L21:                                              ; preds = %L21.lr.ph, %L21
      %0 = phi i64 [ 1, %L21.lr.ph ], [ %2, %L21 ]
      %value_phi35 = phi i64 [ 2, %L21.lr.ph ], [ %1, %L21 ]
; ││││ @ intfuncs.jl:226 within `power_by_squaring'
; ││││┌ @ int.jl:54 within `*'
       %1 = mul i64 %value_phi35, %value_phi35
; ││││└
; ││││ @ intfuncs.jl:225 within `power_by_squaring'
; ││││┌ @ int.jl:52 within `-'
       %2 = add nsw i64 %0, -1
; ││││└
; ││││┌ @ operators.jl:341 within `>='
; │││││┌ @ int.jl:410 within `<='
        %3 = icmp slt i64 %0, 1
; ││││└└
      br i1 %3, label %L23, label %L21

L23:                                              ; preds = %L21
      %phitmp = shl i64 %1, 1
; └└└└
  ret i64 %phitmp
}

likein12 · April 12, 2020, 2:32pm

I inspected the compiled code with @code_llvm and @code_native, and found that the interpretation of a was more complicated if the power index was 5 or 6. It is still a riddle why Julia compiler does so, but this is instructive information.

DNF · April 12, 2020, 2:50pm

Without knowing, I would guess that calculating higher powers is more work, and the compiler will only do so much before giving up.

dpsanders · April 12, 2020, 2:55pm

I think that this may have to do with literal powers being specialised on using the literal_pow function?

https://github.com/JuliaLang/julia/blob/b773bebcdb1eccaf3efee0bfe564ad552c0bcea7/base/intfuncs.jl#L241

And the fact that the code generated by Base.power_by_squaring is quite complicated for higher powers, as you can see in your last code segment, so presumably LLVM is not able to simplify it.

likein12 · April 13, 2020, 8:00am

Thank you! In Julia, power calculations seem to be implemented in power_by_squaring funcion. Now I think this is the reason. With the power index such as 1 = 0b1, 2 = 0b10, 3 = 0b11, 4 = 0b100, 7 = 0b111 and 8 = 0b1000, this algorithm is enough simple for the precompiler to transform p into a constant number, but with 5 = 0b101 and 6 = 0b110 it’s not, because it requires a rather complicated process. Moreover, I tried with the power index 14 = 0b1110, 15 = 0b1111, 16 = 0b10000, 30 = 0b11110, 31 = 0b11111 and 32 = 0b100000, then p was interpretted as a constant number if the power index was 15, 16, 31 or 32. It reinforced the hypothesis.

Topic		Replies	Views
Strange performance of a loop Performance	37	3177	July 21, 2018
What makes Julia loops so fast? General Usage question	8	1029	December 20, 2021
Seemingly large permformance regression on "^" power operator in >=1.6 General Usage bug , performance	2	405	April 28, 2021
Speed of power "^" Performance	16	921	May 6, 2022
Execution speed - can someone explain the difference? New to Julia	9	417	April 8, 2024

The big difference in speed of remainder calculations

Code 1

Result 1

Code 2

Result 2

Related topics