Why does returning a single Float64 from a function take as long as 2,000,000 addition steps?

The joke was that if it weren’t for the compiler optimizing away the loop,

    for i in 1:1_000_000_000_000_000_000_000_000_000_000
        s += 1
    end

doing a thousand billion billion billion iterations would take a long time. It might not crash the computer, but it would keep a core busy!

I think the confusing thing is that the variant without easily observable effects still has quadratic runtime. I’d call that a missed optimization opportunity:
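The thread doesn’t reproduce `sum_arg` itself. A sketch consistent with the assembly — `getproperty` on a field `x`, array `iterate` calls, and a result that is never used — might look like the following; the struct `Wrap` and the exact loop body are my guesses, not the original code:

```julia
# Hypothetical reconstruction -- the real definitions are not in this post.
struct Wrap
    x::Vector{Float64}
end

function sum_arg(data)
    s = 0.0
    for a in data.x   # getproperty + array iterate, as in the assembly
        s += a
    end
    return nothing    # the sum is discarded, so the loads are dead code
end
```

With the result unused, the compiler is free to delete the loads entirely, which is what the segfault experiment further down confirms; only the (quadratic) counting skeleton of the loop survives.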

julia> @code_native sum_arg(data);
	.text
; ┌ @ REPL[12]:2 within `sum_arg'
	movq	%rsi, -8(%rsp)
	movq	(%rsi), %rax
; │ @ REPL[12]:3 within `sum_arg'
; │┌ @ sysimg.jl:18 within `getproperty'
	movq	(%rax), %rax
; │└
; │┌ @ array.jl:705 within `iterate' @ array.jl:705
; ││┌ @ array.jl:199 within `length'
	movq	8(%rax), %rax
; │└└
	cmpq	$2, %rax
	jl	L70
	movl	$2, %ecx
	nopw	(%rax,%rax)
L32:
	movl	$1, %edx
	nopw	%cs:(%rax,%rax)
; │ @ REPL[12]:5 within `sum_arg'
; │┌ @ array.jl:705 within `iterate'
; ││┌ @ int.jl:434 within `<' @ int.jl:427
L48:
	addq	$1, %rdx
	cmpq	%rax, %rdx
; ││└
	jb	L48
; ││┌ @ int.jl:800 within `-' @ int.jl:52
	leaq	-1(%rcx), %rdx
; │└└
; │┌ @ int.jl:53 within `iterate'
	addq	$1, %rcx
; │└
; │┌ @ array.jl:705 within `iterate'
; ││┌ @ int.jl:434 within `<' @ int.jl:427
	cmpq	%rax, %rdx
; │└└
	jb	L32
L70:
	movabsq	$140360095817736, %rax  # imm = 0x7FA821A6E008
	retq
; └

Look at the silly inner loop:

L48:
	addq	$1, %rdx
	cmpq	%rax, %rdx
	jb	L48

LLVM should be ashamed of emitting that!
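For contrast, LLVM routinely folds a bare counting loop into its closed form; it’s the `iterate`-based structure above that it fails to collapse. A quick check — my own snippet, not from the thread:

```julia
# A plain counting loop; LLVM's scalar-evolution pass normally
# replaces this with the closed-form result, so no loop is emitted.
function count_up(n::Int)
    c = 0
    for i in 1:n
        c += 1
    end
    return c
end

# `@code_native count_up(10)` should show straight-line code.
```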

Something noteworthy about “observability”:

julia> p=pointer_from_objref(data.x);
julia> GC.enable(false);
julia> unsafe_store!(convert(Ptr{Int}, p), 0);
julia> sum_arg(data);
julia> data.x
1000-element Array{Float64,1}:

signal (11): Segmentation fault

This demonstrates that the compiler removed every memory access to the array from `sum_arg`: the result is unused, so the loads are dead code, and the call completes without complaint even though the array’s data pointer has been zeroed out. Only the subsequent display of `data.x` touches the now-invalid memory and segfaults.
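Conversely, if the result is live, the loads can’t be eliminated. A sketch of the contrast (my own code; a NamedTuple stands in for the original `data` object):

```julia
# When the sum is actually returned, the compiler must keep the
# array loads -- the zero-the-pointer trick above would then likely
# crash inside sum_arg_used itself, not only when displaying data.x.
function sum_arg_used(data)
    s = 0.0
    for a in data.x
        s += a
    end
    return s   # live result: the memory accesses are no longer dead
end

data = (x = ones(1000),)   # stand-in for the original data
```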
