I found a curious behavior in Julia. When I ran the codes below, big difference in speed was observed.
Code 1
function main()
a = 0
p = 2^5
N = 100000000
for i in 1:N
a %= p
end
println(a)
end
@time main()
Result 1
0
0.839353 seconds (15.62 k allocations: 751.056 KiB)
Code 2
function main()
a = 0
p = 2^4
N = 100000000
for i in 1:N
a %= p
end
println(a)
end
@time main()
Result 2
0
0.020099 seconds (15.62 k allocations: 751.056 KiB)
At first I supposed that it was because 2^5 is bigger than 2^4, but with several trials, I realized that the magnitude of numbers had nothing to do with it and the power index was important. I summarized the result of my trials in the figures below. Specifically, if the power index is five or six, the calculation takes much time whatever the base number is.
Then, why does this phenomenon occur? I suppose that it is important whether βaβ is interpreted as a constant number or not because remainder calculations are performed faster when the divisor is constant, but I have no idea except for it. My Julia version is 1.4.0.
Upon doing so, I found the difference vanish. Indeed, the loop is not being ran in either case when inspecting with @code_native. It is true that the first version generates more code as 2^5 doesnβt become a constant of 32 but 2^4 does get inferred to be 16, but since either way the loop isnβt being ran, they both take nanoseconds.
Thank you for your fast reply! I thought that your hypothesis that it was only the difference of the time of pre-compilation looked so likely, but when I measured the time again with @benchmark of BenchmarkTools.jl, the difference did not disappear. My usage of BenchmarkTools may be wrong?
2^5
BenchmarkTools.Trial:
memory estimate: 176 bytes
allocs estimate: 6
--------------
minimum time: 628.993 ms (0.00% GC)
median time: 657.863 ms (0.00% GC)
mean time: 654.278 ms (0.00% GC)
maximum time: 688.478 ms (0.00% GC)
--------------
samples: 8
evals/sample: 1
2^4
BenchmarkTools.Trial:
memory estimate: 176 bytes
allocs estimate: 6
--------------
minimum time: 83.401 ΞΌs (0.00% GC)
median time: 91.750 ΞΌs (0.00% GC)
mean time: 195.334 ΞΌs (0.00% GC)
maximum time: 21.021 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Also, when I changed N, there seemed to be a linear relationship between N and the time.
Oh I seem to have made a mistake. It seems that I was testing without the println so it led it to optimize away the loop. If keeping the println it seems like it is failing to realise that p is a constant and allows it to optimize away the loop. I believe changing p = 32 will show that it is back to a small amount of time.
Oh, without println(a), the loop seems not to be ran. I tried it and got the same result too. a is not recognized as a constant number when the power index is 5 or 6β¦ It is interesting.
Thank you! I think your code is better than mine to evaluate the time of remainder calculations genuinely.
But also Iβm interested in why Julia fails to interpret a as a constant number or does something special when the power index is 5 or 6 now.
Yes, Iβm not sure why this happens, but the compiler somehow is not able to run the exponentiation that far for constant propagation. Take a look at the llvm code:
julia> foo() = 2^4
foo (generic function with 3 methods)
julia> @code_llvm foo()
; @ REPL[169]:1 within `foo'
define i64 @julia_foo_18883() {
top:
ret i64 16
}
I inspected the compiled code with @code_llvm and @code_native, and found that the interpretation of a was more complicated if the power index was 5 or 6. It is still a riddle why Julia compiler does so, but this is instructive information.
And the fact that the code generated by Base.power_by_squaring is quite complicated for higher powers, as you can see in your last code segment, so presumably LLVM is not able to simplify it.
Thank you! In Julia, power calculations seem to be implemented in power_by_squaring funcion. Now I think this is the reason. With the power index such as 1 = 0b1, 2 = 0b10, 3 = 0b11, 4 = 0b100, 7 = 0b111 and 8 = 0b1000, this algorithm is enough simple for the precompiler to transform p into a constant number, but with 5 = 0b101 and 6 = 0b110 itβs not, because it requires a rather complicated process. Moreover, I tried with the power index 14 = 0b1110, 15 = 0b1111, 16 = 0b10000, 30 = 0b11110, 31 = 0b11111 and 32 = 0b100000, then p was interpretted as a constant number if the power index was 15, 16, 31 or 32. It reinforced the hypothesis.