Why can't iteration figure out that all elements are identical in this case, and short-circuit the evaluation?

jishnub · January 3, 2023, 1:03pm

julia> struct ConstVector <: AbstractVector{Int}
           x :: Int
           len :: Int
       end

julia> Base.size(c::ConstVector) = (c.len,)

julia> @inline function Base.getindex(c::ConstVector, i::Int)
           @boundscheck checkbounds(c, i)
           c.x
       end

julia> @inline function Base.iterate(c::ConstVector, i=1)
           i > length(c) && return nothing
           return c.x, i+1
       end

julia> c = ConstVector(42, 100000000);

julia> using BenchmarkTools

julia> @btime any(iszero, $c);
  41.775 ms (0 allocations: 0 bytes)

Is there a way (short of specializing any and other reduction functions) to indicate that all elements are identical, and that the loop may be avoided in this case? Shouldn’t the specialized iterate be enough to declare this?

Sukera · January 3, 2023, 3:57pm

Be careful what you want to benchmark:

julia> f(x) = begin
           c = ConstVector(x, 1000000)
           any(iszero, c)
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark f(i) setup=(i=rand(1:100))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.770 ns … 3.650 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.820 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.817 ns ± 0.064 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                █         ▁  
  ▄▁▁▁▁▁▁▁▁▃▂▁▁▁▁▁▁▁▄▂▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁█▂▁▁▁▁▁▁▁█▃▁▁▁▁▁▁▁▁█ ▂
  1.77 ns        Histogram: frequency by time       1.83 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

My guess is that it can’t constant fold c (or its length) due to being a global variable.

julia> @code_native f(42)
	.text
	.file	"f"
	.globl	julia_f_570                     # -- Begin function julia_f_570
	.p2align	4, 0x90
	.type	julia_f_570,@function
julia_f_570:                            # @julia_f_570
; ┌ @ REPL[5]:1 within `f`
	.cfi_startproc
# %bb.0:                                # %top
	#APP
	movq	%fs:0, %rax
	#NO_APP
	movq	-8(%rax), %rax
	testq	%rdi, %rdi
	movq	16(%rax), %rax
	#MEMBARRIER
	movq	16(%rax), %rax
	movq	(%rax), %rax
	sete	%al
	#MEMBARRIER
; │ @ REPL[5]:3 within `f`
	retq
.Lfunc_end0:
	.size	julia_f_570, .Lfunc_end0-julia_f_570
	.cfi_endproc
; └
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

julia> f(42)
false

julia> f(0)
true

nsajko · January 3, 2023, 4:32pm

Yeah, as long as the length can be constant-folded, the computation is optimized away.

Still, it would be nice if the optimization would work independently of the value of len.

I wonder if the compiler could do better here if the Julia 1.0 iteration protocol were better designed, without the unnecessary type instability.

Sukera · January 3, 2023, 5:29pm

This is completely unrelated to type instability, which doesn’t hurt here either way due to the small union optimization (TL;DR of which is dispatch on small unions is inlined). If that were the problem, pretty much no vectorization of any loop would be possible, at all.

jishnub · January 3, 2023, 6:00pm

This is interesting, and it does seem to work within a function even if the length isn’t a constant:

julia> f(x, n) = begin
                  c = ConstVector(x, n)
                  any(iszero, c)
              end
f (generic function with 1 method)

julia> @btime f(42, 10000000);
  2.971 ns (0 allocations: 0 bytes)

mbauman · January 3, 2023, 6:05pm

That’s not testing what you think it’s testing. This is passing the values to f as constants in the benchmarking function. Instead:

julia> @btime f(42, 10000000);
  1.458 ns (0 allocations: 0 bytes)

julia> @btime f($42, $10000000);
  3.098 ms (0 allocations: 0 bytes)

nsajko · January 3, 2023, 6:06pm

Note that in your example the length is a constant. Use setup with rand to prevent constant-folding.

Topic		Replies	Views
Detect if constant General Usage constant-propagation	49	1719	June 16, 2023
Efficient iteration with fixed size array types without `@generated` functions Performance	9	201	September 26, 2024
Fastest way to check if condition is true General Usage	29	4037	December 12, 2019
Optimising function with const instead of struct? Performance	9	747	July 7, 2018
Compiler optimizations with broadcast or map Performance compilation	5	2493	June 27, 2020

Why can't iteration figure out that all elements are identical in this case, and short-circuit the evaluation?

Related topics