Why does exporting a single Float64 from a function take as long as 2,000,000 addition steps

Elrod · May 9, 2019, 12:58am

The joke was that if it weren’t for the compiler optimizing away the loop,

    for i in 1:1_000_000_000_000_000_000_000_000_000_000
        s += 1
    end

doing a thousand billion billion billion iterations would take a long time. It might not crash the computer, but it would keep a core busy!
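A minimal sketch of the effect the joke relies on, assuming a recent Julia with default optimizations. LLVM's loop analysis usually recognizes that a loop doing nothing but `s += 1` has a closed-form result and replaces the whole loop with that value. The name `count_up` is hypothetical, and the exact IR printed varies with the Julia/LLVM version.

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

# Hypothetical helper: counts loop iterations one at a time.
# Its result is max(n, 0), and with optimizations on LLVM typically
# computes that directly instead of running the loop.
function count_up(n::Int)
    s = 0
    for i in 1:n
        s += 1
    end
    return s
end

count_up(10)   # returns 10

# Inspect the optimized IR: typically no loop remains, only a
# comparison/select on n.
@code_llvm debuginfo=:none count_up(10)
```

Calling `count_up(1_000_000_000)` should then return about as fast as `count_up(10)`, which is the "compiler optimizing away the loop" the joke refers to.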

foobar_lv2 · May 9, 2019, 9:04am

I think the confusing thing is that the variant without easily observable effects still has quadratic runtime. I’d call that a missed optimization opportunity:

julia> @code_native sum_arg(data);
	.text
; ┌ @ REPL[12]:2 within `sum_arg'
	movq	%rsi, -8(%rsp)
	movq	(%rsi), %rax
; │ @ REPL[12]:3 within `sum_arg'
; │┌ @ sysimg.jl:18 within `getproperty'
	movq	(%rax), %rax
; │└
; │┌ @ array.jl:705 within `iterate' @ array.jl:705
; ││┌ @ array.jl:199 within `length'
	movq	8(%rax), %rax
; │└└
	cmpq	$2, %rax
	jl	L70
	movl	$2, %ecx
	nopw	(%rax,%rax)
L32:
	movl	$1, %edx
	nopw	%cs:(%rax,%rax)
; │ @ REPL[12]:5 within `sum_arg'
; │┌ @ array.jl:705 within `iterate'
; ││┌ @ int.jl:434 within `<' @ int.jl:427
L48:
	addq	$1, %rdx
	cmpq	%rax, %rdx
; ││└
	jb	L48
; ││┌ @ int.jl:800 within `-' @ int.jl:52
	leaq	-1(%rcx), %rdx
; │└└
; │┌ @ int.jl:53 within `iterate'
	addq	$1, %rcx
; │└
; │┌ @ array.jl:705 within `iterate'
; ││┌ @ int.jl:434 within `<' @ int.jl:427
	cmpq	%rax, %rdx
; │└└
	jb	L32
L70:
	movabsq	$140360095817736, %rax  # imm = 0x7FA821A6E008
	retq
; └

Look at the silly inner loop:

L48:
	addq	$1, %rdx
	cmpq	%rax, %rdx
	jb	L48

LLVM should be ashamed of emitting that!
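Transcribing that inner loop back into Julia makes the complaint concrete. This is a hypothetical, hand-written sketch of what the machine code does, not the original `sum_arg` (whose source isn't shown here); the names `inner_loop` and `closed_form` are made up. `%rdx` is the counter and `%rax` holds the array length `n`; the loop is a do-while that increments first and then tests `rdx < n` (unsigned, hence `UInt`).

```julia
# Hypothetical transcription of the inner loop labelled `L48:`.
function inner_loop(n::UInt)
    rdx = UInt(1)          # movl $1, %edx
    while true
        rdx += 1           # addq $1, %rdx
        rdx < n || break   # cmpq %rax, %rdx ; jb L48
    end
    return rdx
end

# The loop's only effect is its final counter value, which has a closed form.
closed_form(n::UInt) = max(UInt(2), n)

inner_loop(UInt(1000)) == closed_form(UInt(1000))   # true
```

In the listing even that final value is dead: `%rdx` is overwritten by `leaq -1(%rcx), %rdx` right after the loop exits, so the loop burns O(n) cycles per outer iteration while computing nothing that is used, which is where the quadratic runtime comes from.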

Something noteworthy about “observability”:

julia> p=pointer_from_objref(data.x);
julia> GC.enable(false);
julia> unsafe_store!(convert(Ptr{Int}, p), 0);
julia> sum_arg(data);
julia> data.x
1000-element Array{Float64,1}:

signal (11): Segmentation fault

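This demonstrates that the compiler removed all loads of the array's elements, because the result is not needed: with the array's data pointer overwritten with 0, `sum_arg(data)` still returns normally (it only reads the array's length), while merely printing `data.x` dereferences the null pointer and segfaults.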
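A safer way to see the same effect, without corrupting memory, is to compare two versions of a summation loop: one returns the sum, the other discards it. This is a minimal sketch; the function names are hypothetical, and the exact IR depends on the Julia/LLVM version, so the expectation about `load double` below is what one would typically see rather than a guarantee.

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

# Hypothetical pair of functions: identical loops, different use of the result.
function sum_used(x::Vector{Float64})
    s = 0.0
    for v in x
        s += v          # result is returned, so every element must be loaded
    end
    return s
end

function sum_discarded(x::Vector{Float64})
    s = 0.0
    for v in x
        s += v          # result is never used, so these loads are dead code
    end
    return nothing
end

x = [1.0, 2.0, 3.0]
sum_used(x)        # 6.0
sum_discarded(x)   # nothing

# The first should contain `load double` instructions in its loop;
# in the second they are typically gone.
@code_llvm debuginfo=:none sum_used(x)
@code_llvm debuginfo=:none sum_discarded(x)
```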
