I was optimizing some code in the hot path and was surprised to see that constants written as 10^6 were being evaluated at run-time. Isn’t evaluating constants like this one of the simplest optimizations? Why wouldn’t Julia simplify it?
julia> function f(x)
           x*10^6
       end
f (generic function with 1 method)
julia> @code_native f(1)
.text
; ┌ @ REPL[27]:1 within `f'
pushq %rbx
movq %rdi, %rbx
; │ @ REPL[27]:2 within `f'
; │┌ @ none within `literal_pow'
; ││┌ @ none within `macro expansion'
; │││┌ @ intfuncs.jl:273 within `^'
movabsq $power_by_squaring, %rax
movl $10, %edi
movl $6, %esi
callq *%rax
; │└└└
; │┌ @ int.jl:87 within `*'
imulq %rbx, %rax
; │└
popq %rbx
retq
; └
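For anyone who wants to see what the assembly is pointing at: the `literal_pow` frames mean that `x^6` written with a literal exponent is lowered to `Base.literal_pow(^, x, Val(6))`, and for integer exponents beyond the small hand-inlined cases this falls back to `Base.power_by_squaring` at run time. A minimal sketch checking all three spellings agree (internals, so subject to change across Julia versions):

```julia
a = 10^6                              # written with a literal exponent
b = Base.literal_pow(^, 10, Val(6))   # what the lowering produces
c = Base.power_by_squaring(10, 6)     # the run-time fallback seen in the assembly
println((a, b, c))                    # all three are 1000000
```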
Sorry, I’m not familiar with this. If one uses 1e6 instead of 10^6, is that number then precomputed at compile time, and if so, does that speed up the code?
I think the assembler code is evidence enough, but here you go:
julia> using BenchmarkTools
julia> function f1(x)
           x*10^6
       end
f1 (generic function with 1 method)
julia> function f2(x)
           x*1e6
       end
f2 (generic function with 1 method)
julia> @btime f1(x) setup=(x=rand()) evals=1;
32.000 ns (0 allocations: 0 bytes)
julia> @btime f2(x) setup=(x=rand()) evals=1;
29.000 ns (0 allocations: 0 bytes)
Hmmm, you’re calling it with the output of rand(), which is a Float64, so everything is going to be converted to floating point anyway. If you’re working with floats, 1e6 is your best bet.
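To make the point concrete, here is a minimal sketch (function names are mine): with a Float64 input, 1e6 is already a floating-point literal, whereas 10^6 still computes an Int at run time before being promoted. The results are numerically identical either way:

```julia
f1(x) = x * 10^6   # Int constant computed at run time, then promoted to Float64
f2(x) = x * 1e6    # Float64 literal, no run-time exponentiation
x = 0.5
println(f1(x) == f2(x))   # true, both give 500000.0
```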
So using a fixed constant basically saves you ~10% of the function evaluation time when the function is a trivial multiply. But of course most people write functions that do much more than multiply by a constant. In a function with, say, eleven steps, computing a constant at the start might account for 1% of the time instead of 10%.
Okay, so what is the upside of losing that 1%? I mean, what does the compiler gain by not folding it into a constant? Granted, that assumes this was done for a reason. Maybe the compiler developers feel that we should multiply our own *** ***** constants?
I don’t have an explanation. I just wanted to point out that reading the assembly can be a bit confusing sometimes, so benchmarking is a good idea, no matter how it looks.
It’s a good question why it isn’t statically precalculated. Most likely this is simply a kind of optimization that doesn’t yet exist, or the compiler didn’t catch it in this case.
Also, I’m not sure about the benchmarking here. I tried the Ref trick instead of evals=1, since a single evaluation will probably be limited by some minimum timing quantization.
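The Ref trick mentioned above can be sketched like this (variable names are mine): interpolating a Ref into the benchmarked expression and dereferencing it there hides the value from the compiler, so the multiply isn't constant-folded away, while BenchmarkTools is still free to run many evaluations per sample:

```julia
using BenchmarkTools

x = Ref(0.5)         # wrap the input so the compiler can't see the value
@btime $x[] * 10^6   # dereference inside the timed expression
@btime $x[] * 1e6
```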
This seems more reasonable for a simple multiplication.
So, the answer, I think, is that, yes, there is a difference in performance, and it’s due to a limitation in how far constant propagation goes for certain operations like power_by_squaring.
And you can solve it by using a different literal:
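For example (a minimal sketch; function names are mine), writing the constant out directly avoids the run-time power_by_squaring call entirely:

```julia
f_pow(x) = x * 10^6        # exponentiation happens at run time
f_lit(x) = x * 1_000_000   # plain integer literal, a compile-time constant
println(f_pow(3) == f_lit(3))   # true, same result either way
```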
Yeah, this is sometimes a little annoying. One option you always have is to write a macro that simply evals an expression inside the macro body, so the result is spliced in as a literal at parse time; just never use it on expressions involving local variables or impure operations:
macro eval_at_parse_time(ex)
    eval(ex)
end

let x = Ref(4)
    f1(x) = x * 10^6
    f2(x) = x * 1_000_000
    f3(x) = x * @eval_at_parse_time 10^6
    @btime f1($x[])
    @btime f2($x[])
    @btime f3($x[])
end
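Another common workaround, for what it's worth, is a const global: the name MILLION below is my own, but because the binding is declared const, the compiler can treat its value as a compile-time constant and inline it into any function that uses it:

```julia
const MILLION = 10^6   # computed once, at top level

f4(x) = x * MILLION    # MILLION is inlined as a constant here

println(f4(7))   # 7000000
```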