I was optimizing some code in the hot path and was surprised to see that constants written as 10^6 were being evaluated at run-time. Isn’t evaluating constants like this one of the simplest optimizations? Why wouldn’t Julia simplify it?
julia> function f(x)
           x*10^6
       end
f (generic function with 1 method)
julia> @code_native f(1)
.text
; ┌ @ REPL[27]:1 within `f'
pushq %rbx
movq %rdi, %rbx
; │ @ REPL[27]:2 within `f'
; │┌ @ none within `literal_pow'
; ││┌ @ none within `macro expansion'
; │││┌ @ intfuncs.jl:273 within `^'
movabsq $power_by_squaring, %rax
movl $10, %edi
movl $6, %esi
callq *%rax
; │└└└
; │┌ @ int.jl:87 within `*'
imulq %rbx, %rax
; │└
popq %rbx
retq
; └
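For anyone who wants to see what the assembly is pointing at: the `literal_pow` frames mean that `x^6` written with a literal exponent is lowered to `Base.literal_pow(^, x, Val(6))`, and for integer exponents beyond the small hand-inlined cases this falls back to `Base.power_by_squaring` at run time. A minimal sketch checking all three spellings agree (internals, so subject to change across Julia versions):

```julia
a = 10^6                              # written with a literal exponent
b = Base.literal_pow(^, 10, Val(6))   # what the lowering produces
c = Base.power_by_squaring(10, 6)     # the run-time fallback seen in the assembly
println((a, b, c))                    # all three are 1000000
```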
Sorry, I’m not familiar with this. If one uses 1e6 instead of 10^6, is that number then precomputed at compile time, and if so, does that speed up the code?
I think the assembler code is evidence enough, but here you go:
julia> using BenchmarkTools
julia> function f1(x)
           x*10^6
       end
f1 (generic function with 1 method)
julia> function f2(x)
           x*1e6
       end
f2 (generic function with 1 method)
julia> @btime f1(x) setup=(x=rand()) evals=1;
32.000 ns (0 allocations: 0 bytes)
julia> @btime f2(x) setup=(x=rand()) evals=1;
29.000 ns (0 allocations: 0 bytes)
Hmmm, you’re calling it with the output of rand(), which is a Float64, so everything is going to be converted to floating point anyway. If you’re working with floats, 1e6 is your best bet.
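To make the point concrete, here is a minimal sketch (function names are mine): with a Float64 input, 1e6 is already a floating-point literal, whereas 10^6 still computes an Int at run time before being promoted. The results are numerically identical either way:

```julia
f1(x) = x * 10^6   # Int constant computed at run time, then promoted to Float64
f2(x) = x * 1e6    # Float64 literal, no run-time exponentiation
x = 0.5
println(f1(x) == f2(x))   # true, both give 500000.0
```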
So using a fixed constant basically saves you ~10% of the function evaluation time when the function is a trivial multiply. But of course most people write functions that do much more than multiply by a constant. In a function with, say, eleven steps, computing a constant at the start might account for 1% of the time instead of 10%.
Okay, so what is the upside of losing that 1%? I mean, what does the compiler gain by not folding it into a constant? Granted, that assumes this was done for a reason. Maybe the compiler developers feel that we should multiply our own *** ***** constants?
I don’t have an explanation. I just wanted to point out that reading the assembly can be a bit confusing sometimes, so benchmarking is a good idea, no matter how it looks.
It’s a good question why it isn’t statically precalculated. Most likely this is simply a kind of optimization that doesn’t yet exist, or the compiler didn’t catch it in this case.
Also, I’m not sure about the benchmarking here. I tried the Ref trick instead of evals=1, since a single evaluation will probably be limited by some minimum timing quantization.
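The Ref trick mentioned above can be sketched like this (variable names are mine): interpolating a Ref into the benchmarked expression and dereferencing it there hides the value from the compiler, so the multiply isn't constant-folded away, while BenchmarkTools is still free to run many evaluations per sample:

```julia
using BenchmarkTools

x = Ref(0.5)         # wrap the input so the compiler can't see the value
@btime $x[] * 10^6   # dereference inside the timed expression
@btime $x[] * 1e6
```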
This seems more reasonable for a simple multiplication.
So, the answer, I think, is that, yes, there is a difference in performance, and it’s due to a limitation in how far constant propagation goes for certain operations like power_by_squaring.
And you can solve it by using a different literal:
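For example (a minimal sketch; function names are mine), writing the constant out directly avoids the run-time power_by_squaring call entirely:

```julia
f_pow(x) = x * 10^6        # exponentiation happens at run time
f_lit(x) = x * 1_000_000   # plain integer literal, a compile-time constant
println(f_pow(3) == f_lit(3))   # true, same result either way
```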
Yeah, this is sometimes a little annoying. One option you always have is to write a macro that simply evals an expression inside the macro body, so the result is spliced in as a literal at parse time; just never use it on expressions involving local variables or impure operations:
macro eval_at_parse_time(ex)
    eval(ex)
end

let x = Ref(4)
    f1(x) = x * 10^6
    f2(x) = x * 1_000_000
    f3(x) = x * @eval_at_parse_time 10^6
    @btime f1($x[])
    @btime f2($x[])
    @btime f3($x[])
end
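Another common workaround, for what it's worth, is a const global: the name MILLION below is my own, but because the binding is declared const, the compiler can treat its value as a compile-time constant and inline it into any function that uses it:

```julia
const MILLION = 10^6   # computed once, at top level

f4(x) = x * MILLION    # MILLION is inlined as a constant here

println(f4(7))   # 7000000
```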