rryi
November 11, 2019, 6:16pm
1
I found differences in code generation for simple constant expressions: 1<<24-1 generates a constant, 2^24-1 generates a quite expensive C function call. Code example below.
I did expect Julia to replace both expressions by the constant 0xFFFFFF during optimization/code generation. Do we have an old issue wrongly marked as closed?
The same question was put 2015:
Is there documentation for the rules for constant elimination? Or is there a well-documented/readable source file I could read?
I recently ran into a rather severe performance gotcha in julia 0.6.0 (which is almost worth a bug report, either to speed up such cases or to document that such code does not work properly in julia):
function ft1(n)
sum = 0
for i in 1:n
sum += (2^6)
end
return sum
end
function ft2(n)
sum = 0
for i in 1:n
sum += 64
end
…
Its answer links to constant-propagation for @pure functions · Issue #14324 · JuliaLang/julia · GitHub which was closed in 2017.
My example (Julia 1.0.5, Windows 10, 64bit):
julia> function uintmulshift()
v :: UInt64 = (1<<24)-1
end
uintmulshift (generic function with 1 method)
julia> @code_native uintmulshift()
.text
; Function uintmulshift {
; Location: none:3
pushq %rbp
movq %rsp, %rbp
movl $16777215, %eax # imm = 0xFFFFFF
popq %rbp
retq
nopl (%rax,%rax)
;}
julia> function uintmulexp()
v :: UInt64 = 2^24-1
end
uintmulexp (generic function with 1 method)
julia> @code_native uintmulexp()
.text
; Function uintmulexp {
; Location: none:3
pushq %rbp
movq %rsp, %rbp
; Function literal_pow; {
; Location: none
; Function macro expansion; {
; Location: none
; Function ^; {
; Location: intfuncs.jl:220
subq $32, %rsp
movabsq $power_by_squaring, %rax
movl $2, %ecx
movl $24, %edx
callq *%rax
;}}}
; Function convert; {
; Location: number.jl:7
; Function Type; {
; Location: boot.jl:722
; Function toUInt64; {
; Location: boot.jl:692
; Function check_top_bit; {
; Location: boot.jl:581
; Function is_top_bit_set; {
; Location: boot.jl:571
addq $-1, %rax
;}
js L42
;}}}}
addq $32, %rsp
popq %rbp
retq
; Function convert; {
; Location: number.jl:7
; Function Type; {
; Location: boot.jl:722
; Function toUInt64; {
; Location: boot.jl:692
; Function check_top_bit; {
; Location: boot.jl:581
L42:
movabsq $throw_inexacterror, %r9
movl $227233368, %ecx # imm = 0xD8B4E58
movl $234489840, %edx # imm = 0xDFA07F0
movq %rax, %r8
callq *%r9
ud2
ud2
nopl (%rax,%rax)
;}}}}}
julia>
klaff
November 11, 2019, 7:04pm
2
Interesting. I get same under 1.3.0-rc4.1 under Windows 10:
julia> @code_native (() -> 2^24-1)()
.text
; ┌ @ none:1 within `#19'
pushq %rbp
movq %rsp, %rbp
; │┌ @ none within `literal_pow'
; ││┌ @ none within `macro expansion'
; │││┌ @ intfuncs.jl:221 within `^'
subq $32, %rsp
movabsq $power_by_squaring, %rax
movl $2, %ecx
movl $24, %edx
callq *%rax
; │└└└
; │┌ @ int.jl:52 within `-'
addq $-1, %rax
; │└
addq $32, %rsp
popq %rbp
retq
nopl (%rax,%rax)
; └
julia> @code_native (() -> 1<<24-1)()
.text
; ┌ @ none:1 within `#21'
pushq %rbp
movq %rsp, %rbp
movl $16777215, %eax # imm = 0xFFFFFF
popq %rbp
retq
nopl (%rax,%rax)
; └
julia>
klaff
November 11, 2019, 7:10pm
3
Simpler example. 2^4
is okay. 2^5
is right out.:
julia> @code_native (() -> 2^4)()
.text
; ┌ @ none:1 within `#37'
pushq %rbp
movq %rsp, %rbp
movl $16, %eax
popq %rbp
retq
nopl (%rax,%rax)
; └
julia> @code_native (() -> 2^5)()
.text
; ┌ @ none:1 within `#39'
pushq %rbp
movq %rsp, %rbp
movl $2, %ecx
movl $2, %eax
nop
; │┌ @ none within `literal_pow'
; ││┌ @ none within `macro expansion'
; │││┌ @ intfuncs.jl:221 within `^'
; ││││┌ @ intfuncs.jl:209 within `power_by_squaring'
; │││││┌ @ int.jl:54 within `*'
L16:
imulq %rax, %rax
; ││││└└
; ││││┌ @ int.jl:424 within `power_by_squaring'
addq $-1, %rcx
; └└└└└
; ┌ @ intfuncs.jl:208 within `#39'
jg L16
addq %rax, %rax
; └
; ┌ @ none:1 within `#39'
popq %rbp
retq
nop
; └
julia>
klaff
November 11, 2019, 7:51pm
4
Maybe the complexity of power_by_squaring somehow makes it hard for LLVM to do the elimination?
Although it manages to for 2 raised to the powers 4, 7,8, 15,16, 31,32, 63,64.
Maybe add this to the Performance tips doc?
mbauman
November 11, 2019, 7:55pm
5
It would probably make sense for ^
to detect common bases (like 2
, 2.0
, ℯ
, and 10.0
) and jump out to specialized routines in those cases.
In general, integer exponentiation is just doing a “smart” number of squarings, so that’s why small/easy exponents are more likely to constant fold than bigger ones. As far as I’m aware there are no guarantees on how constant propagation works — it’s “just” an optimization.
3 Likes
tkf
November 11, 2019, 11:28pm
6
A simple-minded approach to compute 2^24-1
at compile time is to use Base.literal_pow
:
julia> @inline Base.literal_pow(::typeof(^), x::Base.HWNumber, ::Val{p}) where p =
if p < 0
Base.literal_pow(^, inv(x), Val(p))
else
Base.literal_pow(^, x, Val(p-1)) * x
end;
julia> f() = 2^24 - 1;
julia> @code_llvm f()
; @ REPL[2]:1 within `f'
define i64 @julia_f_17031() {
top:
ret i64 16777215
}
ATM, literal_pow
is only defined up to 3:
https://github.com/JuliaLang/julia/blob/6eebbbe2d205f8116330a77ca5e15f4a356232db/base/intfuncs.jl#L242-L246
I wonder if it makes sense to increase the limit.
Keno
November 12, 2019, 12:09am
7
The compiler should also definitely be able to constant fold 2^24 no matter how it is defined. The fact that it doesn’t is a bit of a compiler bug, though of course it may not be entirely trivial to fix.
7 Likes
Sorry, I did the bad thing and knew about a bug,
but never openned an issue for it
https://github.com/JuliaLang/julia/issues/33822
2 Likes