Why can't iteration figure out that all elements are identical in this case, and short-circuit the evaluation?

julia> struct ConstVector <: AbstractVector{Int}
           x :: Int
           len :: Int
       end

julia> Base.size(c::ConstVector) = (c.len,)

julia> @inline function Base.getindex(c::ConstVector, i::Int)
           @boundscheck checkbounds(c, i)
           c.x
       end

julia> @inline function Base.iterate(c::ConstVector, i=1)
           i > length(c) && return nothing
           return c.x, i+1
       end

julia> c = ConstVector(42, 100000000);

julia> using BenchmarkTools

julia> @btime any(iszero, $c);
  41.775 ms (0 allocations: 0 bytes)

Is there a way (short of specializing any and other reduction functions) to indicate that all elements are identical, and that the loop may be avoided in this case? Shouldn’t the specialized iterate be enough to declare this?

Be careful what you want to benchmark:

julia> f(x) = begin
           c = ConstVector(x, 1000000)
           any(iszero, c)
       end
f (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark f(i) setup=(i=rand(1:100))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.770 ns … 3.650 ns  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     1.820 ns             β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   1.817 ns Β± 0.064 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

                                                β–ˆ         ▁  
  β–„β–β–β–β–β–β–β–β–β–ƒβ–‚β–β–β–β–β–β–β–β–„β–‚β–β–β–β–β–β–β–β–β–„β–β–β–β–β–β–β–β–β–ˆβ–‚β–β–β–β–β–β–β–β–ˆβ–ƒβ–β–β–β–β–β–β–β–β–ˆ β–‚
  1.77 ns        Histogram: frequency by time       1.83 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

My guess is that it can’t constant fold c (or its length) due to being a global variable.

julia> @code_native f(42)
	.text
	.file	"f"
	.globl	julia_f_570                     # -- Begin function julia_f_570
	.p2align	4, 0x90
	.type	julia_f_570,@function
julia_f_570:                            # @julia_f_570
; β”Œ @ REPL[5]:1 within `f`
	.cfi_startproc
# %bb.0:                                # %top
	#APP
	movq	%fs:0, %rax
	#NO_APP
	movq	-8(%rax), %rax
	testq	%rdi, %rdi
	movq	16(%rax), %rax
	#MEMBARRIER
	movq	16(%rax), %rax
	movq	(%rax), %rax
	sete	%al
	#MEMBARRIER
; β”‚ @ REPL[5]:3 within `f`
	retq
.Lfunc_end0:
	.size	julia_f_570, .Lfunc_end0-julia_f_570
	.cfi_endproc
; β””
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

julia> f(42)
false

julia> f(0)
true
2 Likes

Yeah, as long as the length can be constant-folded, the computation is optimized away.

Still, it would be nice if the optimization would work independently of the value of len.

I wonder if the compiler could do better here if the Julia 1.0 iteration protocol were better designed, without the unnecessary type instability.

This is completely unrelated to type instability, which doesn’t hurt here either way due to the small union optimization (TL;DR of which is dispatch on small unions is inlined). If that were the problem, pretty much no vectorization of any loop would be possible, at all.

1 Like

This is interesting, and it does seem to work within a function even if the length isn’t a constant:

julia> f(x, n) = begin
                  c = ConstVector(x, n)
                  any(iszero, c)
              end
f (generic function with 1 method)

julia> @btime f(42, 10000000);
  2.971 ns (0 allocations: 0 bytes)

That’s not testing what you think it’s testing. This is passing the values to f as constants in the benchmarking function. Instead:

julia> @btime f(42, 10000000);
  1.458 ns (0 allocations: 0 bytes)

julia> @btime f($42, $10000000);
  3.098 ms (0 allocations: 0 bytes)
6 Likes

Note that in your example the length is a constant. Use setup with rand to prevent constant-folding.

1 Like