Suggestion for more general/performant sentinel for find* functions

I’m concerned about the loss of type-stability in the find* etc. functions, changing everything to return nothing for the not found case.
While that is important for abstract arrays and associative structures, for strings, where there is a contract that AbstractStrings are always 1-based indexed, returning nothing instead of 0 just makes the code more difficult.
In my quick testing with some short strings and findfirst, the version using nothing is 3.3x slower.

Instead, why not add a function that returns the sentinel value needed for that particular type?

notfound(::Type{<:Any}) = nothing
notfound(::Type{<:AbstractString}) = 0
notfound(val::T) where {T} = notfound(T)

For arrays also, those could still define a sentinel value of zero, if they are not OffsetArrays, to get back the performance lost by the find* changes. Also, even for OffsetArrays, if typemin(Int) is not allowed as the first index, then that could be used as the sentinel, keeping the return values type-stable.

CC: @jeff.bezanson, @nalimilan

CC: @mbauman, @tim.holy

Note: I also think that seeing in code:

     ret = findfirst(a, b)
     ret == notfound(b) && ...

would be more readable than either the old way with 0, or the new way with nothing

2 Likes

Please do not ping multiple core contributors because you have language or API design opinions. Most of us are already active here and will read it in our own time.

I’m not terribly surprised there are I wouldn’t be terribly surprised if there were performance regressions here, but I expect this to improve. The compiler team is fully supportive of performant small unions, and I don’t see any reason why Julia couldn’t figure out concrete types with the idiom you proposed ret == nothing && …. I have a hunch that we’ll slowly start calling inferred small unions “type stable.”

Most helpful here would be the actual code that you benchmarked as being 3.3x slower.

3 Likes

I haven’t posted about it yet but having a fast find amd searchsortedfirst for strings is key to unlocking fast InternedString sorting

I pinged because this seemed pretty critical to me, and was surprised that a change with such large performance repercussions was merged without much more discussion, and because from other posts from Stefan understood that things might be really frozen even this week.

I understand perfectly why 0 is not a good sentinel for all cases, but that’s not a good reason to make a change that hurts performance in such a way, when other alternatives (such as my suggestion) are possible.

On v0.6.2:

julia> ff(z) = findfirst(z, "foobar") == 0
ff (generic function with 1 method)

julia> @btime ff('z')
  6.034 ns (0 allocations: 0 bytes)
false

julia> @code_native ff('z')
	.section	__TEXT,__text,regular,pure_instructions
Filename: REPL[19]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	movabsq	$findnext, %rax
	movabsq	$4950739376, %rsi       ## imm = 0x1271649B0
	movl	$1, %edx
	callq	*%rax
	testq	%rax, %rax
	sete	%al
	popq	%rbp
	retq
	nopw	(%rax,%rax)

On master: (Version 0.7.0-DEV.3618 (2018-01-28 17:48 UTC), Commit 743d487 (0 days old master))

julia> ff(z) = findfirst(equalto(z), "foobar") == nothing
ff (generic function with 1 method)

julia> @btime ff('z')
  20.847 ns (0 allocations: 0 bytes)
true

julia> @code_native(ff('z'))
	.section	__TEXT,__text,regular,pure_instructions
; Function ff {
; Location: REPL[21]:1
; Function Type; {
; Location: REPL[21]:1
	pushq	%rbx
	subq	$32, %rsp
	movl	%edi, (%rsp)
;}
; Function findfirst; {
; Location: array.jl:1702
	movabsq	$findnext, %rax
	leaq	24(%rsp), %rbx
	leaq	(%rsp), %rsi
	movabsq	$4677124368, %rdx       ## imm = 0x116C74110
	movl	$1, %ecx
	movq	%rbx, %rdi
	callq	*%rax
;}
	movl	%edx, %ecx
	andb	$127, %cl
	cmpb	$2, %cl
	je	L76
	cmpb	$1, %cl
	jne	L105
	movabsq	$"==", %rax
	callq	*%rax
	jmp	L97
; Function findfirst; {
; Location: array.jl:1702
L76:
	testb	%dl, %dl
	cmovnsq	%rbx, %rax
;}
	movq	(%rax), %rdi
	movabsq	$"==", %rax
	callq	*%rax
L97:
	andb	$1, %al
	addq	$32, %rsp
	popq	%rbx
	retq
L105:
	movabsq	$jl_system_image_data, %rax
	movq	%rax, 8(%rsp)
	movabsq	$jl_system_image_data, %rax
	movq	%rax, 16(%rsp)
	movabsq	$jl_apply_generic, %rax
	leaq	8(%rsp), %rdi
	movl	$2, %esi
	callq	*%rax
	ud2
	nop
;}

That’s also a pretty large increase in code size.

You’re testing something nonsensical. The real tests are:

0.6:

julia> ff(z) = findfirst(x->x==z, "foobar") == 0
ff (generic function with 1 method)

julia> @btime ff($('z'))
  80.433 ns (0 allocations: 0 bytes)
true

0.7:

julia> ff(z) = findfirst(equalto(z), "foobar") == nothing
ff (generic function with 1 method)

julia> @btime ff($('z'))
  39.565 ns (0 allocations: 0 bytes)
true

You saw the message before I fixed an incorrect copy/paste.

It’s still an incorrect usage of the API. You want findfirst(x->x==z, "foobar") or findfirst("foobar", z) on 0.6.

Ah, forgot about the other breaking change to the order of arguments, I’ll retest when I get home.

Yesterday while working on migrating a package to 0.7 I also had bad surprises with this

julia> typeof(findfirst(equalto('a'), "abc"))
Int64

julia> typeof(findfirst(equalto('d'), "abc"))
Nothing

julia> first(findfirst(equalto('d'), "abc"))
ERROR: MethodError: no method matching start(::Nothing)
Closest candidates are:
  start(::SimpleVector) at essentials.jl:550
  start(::Base.MethodList) at reflection.jl:659
  start(::ExponentialBackOff) at error.jl:171
  ...
Stacktrace:
 [1] first(::Nothing) at .\abstractarray.jl:198
 [2] top-level scope

OK, a simpler example, on v0.6:

julia> @btime in($('z'), $("foobar"))
  12.568 ns (0 allocations: 0 bytes)
false
julia> @btime in($('f'), $("foobar"))
  12.206 ns (0 allocations: 0 bytes)
true

On v0.7:

julia> @btime in($('z'), $("foobar"))
  18.075 ns (0 allocations: 0 bytes)
false
julia> @btime in($('f'), $("foobar"))
  17.575 ns (0 allocations: 0 bytes)
true

So, not as bad as I’d originally thought, but still quite bad, around 44% slower.

I also find it interesting that the string code ends up wrapping things like this:

nothing_sentinel(i) = i == 0 ? nothing : i

The definition of in is actually very similar to the function I was benchmarking.

in(c::Char, s::AbstractString) = (findfirst(equalto(c),s)!==nothing)

A 44% percent penalty sounds perfectly reasonable as a temporary situation until the compiler improves, given that the API is simpler and more consistent by using nothing everywhere to represent the absence of a match.

The nothing_sentinel trick is only used 7 times, all in the same file. It’s also not strictly needed and is mostly due to the fact that the old code was designed around the assumption that 0 indicated no match. Since we’re short on time we kept it that way in order to get it merged quickly (see discussion on the PR.

1 Like

The problem is that it will not be possible to remove the penalty completely, and it may be worse that than, in cases where the compiler will be unable to remove it.
Think of the case of creating a vector of the offsets of the first (or last, or whatever) occurrence of a substring in a set of strings, the compiler will never be able to remove the penalty of that case.

Just as you need to use zero(T) in places to make things type-stable, with my suggestion you just need to use notfound(T). What happens if nothing itself can be a member of the collection? It seems like the method of having a fixed return value of nothing doesn’t work in that case, but notfound(T) would work just fine.

And what does notfound(Any) return?

You could always add other arguments to notfound, to specialize even further, for those sorts of edge cases,
For example, Dict{Any, Any} might return a type that is specifically not allowed in it, a NotFound() type.

Another possible approach would be to have a pair of functions, that operate on the return value,
found(T, ret) could return a boolean if it were found or not, and getloc(T, ret) would return the value.

For the vast majority of use cases (strings and 1-based arrays), the functions would be simply as follows:
found(::AbstractString, ret) = ret != 0, and getloc(::AbstractString, ret) = ret
and there would be zero performance penalty.

For cases where T is some collection of Any, the return value could be a tuple (foundflag, val), so that (false, nothing) would indicate that the value was not found, and (true, nothing), would indicate that the value was found, and was nothing.

That approach would always work, whether or not an in-band sentinel value was possible for the type T.

Again, I ask for benchmarks here. I’m not totally convinced that the performance regression in in is entirely related to Union splitting, particularly given that other code shows even larger speedups (which are also likely unrelated).

julia> using BenchmarkTools

julia> @btime in($('z'), $("foobar"))
  37.067 ns (0 allocations: 0 bytes)

julia> @eval Base begin
       # A poor man's "revert" — it leaves Base horribly broken, but it's a small test
       function findnext(pred::EqualTo{Char}, s::String, i::Integer)
           if i < 1 || i > sizeof(s)
               i == sizeof(s) + 1 && return 0
               throw(BoundsError(s, i))
           end
           @inbounds isvalid(s, i) || string_index_err(s, i)
           c = pred.x
           c ≤ '\x7f' && return _search(s, c % UInt8, i)
           while true
               i = _search(s, first_utf8_byte(c), i)
               i == 0 && return 0
               s[i] == c && return i
               i = next(s, i)[2]
           end
       end
       in(c::Char, s::AbstractString) = (findfirst(equalto(c),s)!==0)
       end

julia> @btime in($('z'), $("foobar"))
  35.578 ns (0 allocations: 0 bytes)
false

You could almost imagine foundflag being a switch that says whether val is an Int or Void. You could even call it a “type tag”. What’s this starting to sound like?

Yes, storing a Vector{Union{Int,Void}} is going to take up more space than just a Vector{Int}, but it’d be comparable to storing a Vector{Tuple{Bool,Int}}.

1 Like

You seemed to have missed that I said that would only be for collections where nothing was a member of T for the collection of T.
In the vast majority of cases, you’d just be storing things of type T (strings, 1-based arrays, etc.), and that will take up substantially less space.
(as well as make doing things like counting the # of sentinel values in the result vector much faster, something that could be SIMD optimized easily).

So your proposal here would be to sometimes return an in-band element of the type of the indices, and sometimes to return a Tuple{Bool, X}? If I wanted to write generic code, I’d have to check to see if I got back a tuple? What about Dict((false, 1) => 1, 1=>2)? Yes, the status quo means you cannot find Dict(nothing=>3), but there is a virtue to simplicity.

Please read what I proposed above.
I proposed to have two functions, with methods based on the type of the collections.
You would never explicitly checking for a tuple, and things would be type stable.

For example:

    found(a, ret = findfirst(needle, a)) || return do_something_if_not_found()
    val = getval(ret)

For strings and 1-based arrays, ret would be simply an Int, found would check for != 0, and getval would be a no-op.

Very simple. Presumably the type of this Dict is of type Any?
If it finds the first index, it would return (true, (false, 1)), the second, it would return (true, 1), and if not found, it would return (false, nothing).

I wouldn’t consider that a virtue at all, that would break our code, where we were using nothing to indicate a JSON null, and at times used a Dict to count unique values in the JSON structure.
With my approach, a key of nothing would simply be returned as (true, nothing).