Fastest way to check for Inf or NaN in an array?

This has been haunting me, so I finally installed LoopVectorization.jl. It’s a clear winner over my hand-written allfinite above:

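using LoopVectorization
function allfinite_turbo(x)
	z = zero(eltype(x))
	s = z
	# z * x[i] is zero for finite x[i] but NaN for ±Inf or NaN, and NaN
	# propagates through the sum, so s stays zero only if every element is finite
	@turbo for i in eachindex(x)
		s = muladd(z, x[i], s)
	end
	return s == z
end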
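To convince yourself that the zero-multiply trick itself is correct, independent of @turbo, you can run the same reduction as an ordinary loop. This is a minimal sketch assuming only Base Julia; `allfinite_plain` is a hypothetical name used for illustration, not a function from this thread.

```julia
# Same zero-multiply reduction as allfinite_turbo, written as a plain Base loop
# (no LoopVectorization) so the trick can be checked on its own.
# `allfinite_plain` is a hypothetical name for this sketch.
function allfinite_plain(x)
    z = zero(eltype(x))
    s = z
    for i in eachindex(x)
        # z * x[i] is (±)0 for finite x[i], NaN for ±Inf or NaN;
        # once s becomes NaN, every later muladd keeps it NaN
        s = muladd(z, x[i], s)
    end
    return s == z  # NaN == 0 is false
end

println(0.0 * Inf)                         # NaN
println(0.0 * NaN)                         # NaN
println(allfinite_plain([1.0, 2.0, 3.0]))  # true
println(allfinite_plain([1.0, Inf, 3.0]))  # false
```

Because it relies only on IEEE arithmetic, the same check also works for integer arrays, where the product is always zero and the function returns true.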

using BenchmarkTools
x = randn(1000);
@btime allfinite($x) # 85ns
@btime allfinite_turbo($x) # 41ns

I strongly advocate for the LoopVectorization.@turbo version. It’s much easier to read as source code, and it achieved a further 2x speedup over my initial, manually unrolled allfinite version.

I looked at the @code_native output to see what it did differently. It opted for 32x unrolling (8x unrolled instructions × 4-wide SIMD) rather than my 16x, but otherwise the primary loop is identical. It does, however, appear to do a better job of combining the accumulators afterwards. It then handles the tail, first with 8x unrolling, and finally (as far as I can tell) resolves the tail’s tail (length in 0:7) with branching rather than a loop, taking about log2(8) = 3 branches instead of iterating up to 7 times.
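The overall loop shape described above (several independent accumulators in the main loop, a combine step, then a remainder) can be sketched by hand. This is a hypothetical, scalar-only illustration in plain Julia, not the code @turbo actually emits: `allfinite_unrolled` is an invented name, and 4 accumulators stand in for the 32x unrolling seen in the assembly.

```julia
# Hypothetical sketch of a multi-accumulator reduction; NOT LoopVectorization's
# generated code. 4 accumulators stand in for its 32x (8 x 4-lane) unrolling.
function allfinite_unrolled(x::AbstractVector{T}) where {T}
    z = zero(T)
    s1 = s2 = s3 = s4 = z
    i = firstindex(x)
    n = lastindex(x)
    # main loop: 4 independent dependency chains the CPU can overlap
    @inbounds while i + 3 <= n
        s1 = muladd(z, x[i],     s1)
        s2 = muladd(z, x[i + 1], s2)
        s3 = muladd(z, x[i + 2], s3)
        s4 = muladd(z, x[i + 3], s4)
        i += 4
    end
    # combine the accumulators pairwise (a tree of depth 2)
    s = (s1 + s2) + (s3 + s4)
    # remainder of 0 to 3 elements, handled one at a time
    @inbounds while i <= n
        s = muladd(z, x[i], s)
        i += 1
    end
    return s == z
end
```

Any NaN in one accumulator survives the combine step, since NaN plus anything is NaN, so splitting the sum across accumulators does not change the answer.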

Simply changing my original version to 32x unrolling did not significantly improve its runtime at this input length. Although both versions use the same core loop, mine lost a lot of time reducing the accumulators and in its lazy one-at-a-time tail handling. I could rewrite my source to emulate the LoopVectorization version, but I wouldn’t do any better, and @turbo did it all without my involvement and without turning the source code into a total mess.
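For comparison with one-at-a-time tail handling, the branch-based approach can be sketched by hand: test the bits of the remainder (4, 2, 1) instead of looping over the last 0 to 7 elements, so at most three branches are taken. This is an assumption-laden illustration in plain Julia, not LoopVectorization’s code; `tail_muladd` and `allfinite_tailbranch` are hypothetical names, and the main loop here is a simple 8-element stride with a single accumulator.

```julia
# Hypothetical sketch: handle a remainder r in 0:7 by testing its bits
# (4, 2, 1) rather than looping r times. Not LoopVectorization's code.
function tail_muladd(x::AbstractVector{T}, i::Int, s::T) where {T}
    z = zero(T)
    r = lastindex(x) - i + 1  # number of remaining elements
    @assert 0 <= r <= 7
    @inbounds begin
        if (r & 4) != 0       # 4 more elements
            s = muladd(z, x[i], s);     s = muladd(z, x[i + 1], s)
            s = muladd(z, x[i + 2], s); s = muladd(z, x[i + 3], s)
            i += 4
        end
        if (r & 2) != 0       # 2 more elements
            s = muladd(z, x[i], s); s = muladd(z, x[i + 1], s)
            i += 2
        end
        if (r & 1) != 0       # 1 last element
            s = muladd(z, x[i], s)
        end
    end
    return s
end

function allfinite_tailbranch(x::AbstractVector{T}) where {T}
    z = zero(T)
    s = z
    i = firstindex(x)
    n = lastindex(x)
    # main loop in strides of 8; leaves between 0 and 7 elements
    @inbounds while i + 7 <= n
        for k in 0:7
            s = muladd(z, x[i + k], s)
        end
        i += 8
    end
    s = tail_muladd(x, i, s)
    return s == z
end
```

The bit tests replace a data-dependent loop with a fixed, short sequence of branches, which is the trade-off described in the @code_native observations.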
