Shouldn't `findfirst` propagate `missing` instead of erroring?

The whole idea of missing vs. nothing is that missing should propagate rather than raise an error, but then:

a = findfirst(x -> x == 1, [1,0,missing, 1]) # out: 1, ok
b = findfirst(x -> x == 1, [0,0,missing, 1]) # out: error, should be `missing`
c = findfirst(x -> x == 1, [0,0,missing, 0]) # out: error, should be `missing`
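
For context, here is a minimal sketch of where the error comes from. It assumes only Base Julia: `==` follows three-valued logic, so the predicate returns `missing` for the third element, and `findfirst` then uses that value where a `Bool` is required.

```julia
# Comparing with `missing` yields `missing`, not a Bool.
r = (missing == 1)
println(ismissing(r))

# `findfirst` uses the predicate result in a boolean context; that is what throws.
err = try
    findfirst(x -> x == 1, [0, 0, missing, 1])
catch e
    e
end
println(err isa TypeError)
```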

You need to use skipmissing to get that behavior.

julia> findfirst(x -> x == 1, skipmissing([0,0,missing, 0]))

julia> findfirst(x -> x == 1, skipmissing([0,0,missing, 1]))
4
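
As a small illustration of why the second call returns 4 rather than 3: `skipmissing` drops the missing values but keeps the indices of the parent array. The variable names `v` and `sm` below are only for this sketch.

```julia
v = [0, 0, missing, 1]
sm = skipmissing(v)

vals = collect(sm)        # the non-missing values
idxs = collect(keys(sm))  # their indices in the original vector `v`

# The index returned refers to `v`, not to the compacted `vals`.
hit = findfirst(x -> x == 1, sm)
println((vals, idxs, hit))
```

Note that this treats the missing entry as if it were absent, which is not the same as propagating it.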

Yes, sometimes it is confusing.

== does not work nicely with missing (it returns missing instead of a Bool), but you can use isequal, which works as you would expect.

You could do

a = findfirst(x->isequal(x, 1), [1, 0, missing, 1]) # out: 1, ok
b = findfirst(x->isequal(x, 1), [0, 0, missing, 1]) # out: 4, ok
c = findfirst(x->isequal(x, 1), [0, 0, missing, 0]) # out: nothing

or even:

a = findfirst(isequal(1), [1, 0, missing, 1])
...
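
A short sketch of the difference between the two comparisons, and of the one-argument form `isequal(1)`, which returns a function equivalent to `x -> isequal(x, 1)`. The name `pred` is only for illustration.

```julia
pred = isequal(1)   # curried form, behaves like x -> isequal(x, 1)

a = pred(1)         # true
b = pred(missing)   # false: isequal always returns a Bool
c = isequal(missing, missing)       # true
d = ismissing(missing == missing)   # true: `==` propagates missing instead

println((a, b, c, d))
```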

Thank you. However, both solutions return nothing with [0,0,missing,0], while I think missing would be more appropriate, as we don’t know the 3rd value.

This “manual version” of findfirst works but it is ugly and possibly very slow:

function findfirst_custom(f,x)
    for i in 1:length(x)
        if ismissing(x[i])
            return missing
        elseif f(x[i]) == true
            return i
        end
    end
    return nothing
end

EDIT: not much slower, at least for these simple cases:

small = [0,0,0,1]
long  = append!(fill(0,100000),[1,0])
@btime findfirst(x -> x == 1, $small)         # 4.864 ns (0 allocations: 0 bytes)
@btime findfirst_custom(x -> x == 1, $small)  # 4.873 ns (0 allocations: 0 bytes)
@btime findfirst(x -> x == 1, $long)          # 77.870 μs (0 allocations: 0 bytes)
@btime findfirst_custom(x -> x == 1, $long)   # 89.216 μs (0 allocations: 0 bytes)

Why do you think it would be slow? It looks close to optimal to me, except you should of course not use 1:length(x) but pairs.

(Also, it is redundant to test for == true).

Following the suggestions from @DNF,

function findfirst_custom(f, x)
    for (i, val) in pairs(x)
        if ismissing(val)
            return missing
        elseif f(val)
            return i
        end
    end
    return nothing
end
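
To make the behavior concrete, here is the same function again (repeated so the snippet runs on its own) applied to the three vectors from the original question, plus one vector without any missing value.

```julia
function findfirst_custom(f, x)
    for (i, val) in pairs(x)
        if ismissing(val)
            return missing   # an unknown element is reached before any match
        elseif f(val)
            return i
        end
    end
    return nothing           # no match and no missing element
end

a = findfirst_custom(x -> x == 1, [1, 0, missing, 1])  # match found before the missing
b = findfirst_custom(x -> x == 1, [0, 0, missing, 1])  # missing comes first
c = findfirst_custom(x -> x == 1, [0, 0, missing, 0])  # missing, no match at all
d = findfirst_custom(x -> x == 1, [0, 0, 0])           # plain "not found"
println((a, b, c, d))
```

One consequence of testing the element rather than the predicate result: the function returns missing even for predicates that could give a definite answer on a missing element, such as ismissing.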

On my computer it is no slower than findfirst; test it yourself.

Yes. Anyhow, it would be enough to add a check such as ismissing(A[i]) && return missing to a few functions (findnext…) in [⋅]/base/array.jl

Would it make sense to you? (moving to Internal & Design)

As I understand, missing should propagate to the same type/data (conceptually). findfirst isn’t returning an element of the input array, but the index of an element in the input array.

b = findfirst(x -> x == 1, [0,0,missing, 1]) producing an error has the fewest assumptions about what the correct behavior is (and allows the user to specify their desired behavior/solution, e.g. skipmissing or an isequal predicate). The documented behavior of findfirst is to return a valid index/key or nothing if no element matches the predicate. A result of missing wouldn’t make sense to me, as an index exists or does not exist; there is no such thing as a missing index, IMO.
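
Besides skipmissing and isequal, another way for the caller to state the intended behavior explicitly is to wrap the comparison in `coalesce`, which replaces a missing result with a chosen default. This is only a sketch of that option; the variable names are illustrative.

```julia
v = [0, 0, missing, 0]

# Treat "unknown" as "no match".
as_false = findfirst(x -> coalesce(x == 1, false), v)

# Treat "unknown" as "match"; the missing element itself is then reported.
as_true = findfirst(x -> coalesce(x == 1, true), v)

println((as_false, as_true))
```

Either way findfirst still returns an index or nothing, in line with its documented behavior.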

I disagree with your last sentence :slight_smile: : an index exists, it doesn’t, or I don’t know whether it exists (or which one it is, when a missing comes before a value for which the inner function returns true).
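
A rough sketch of that three-outcome reading, checking the result of the inner function rather than the element itself. The name `findfirst_threevalued` is hypothetical and not part of Base.

```julia
# Hypothetical helper: index, `nothing`, or `missing` when the answer is unknown.
function findfirst_threevalued(f, x)
    for (i, val) in pairs(x)
        r = f(val)
        if ismissing(r)
            return missing   # predicate is undecided before any definite match
        elseif r
            return i
        end
    end
    return nothing
end

p = findfirst_threevalued(x -> x == 1, [1, 0, missing, 1])  # definite match first
q = findfirst_threevalued(x -> x == 1, [0, 0, missing, 1])  # unknown comes first
s = findfirst_threevalued(ismissing, [0, missing])          # predicate is decidable here
println((p, q, s))
```

With this variant a predicate that returns a Bool for missing elements, such as ismissing or isequal(1), still yields a regular index.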
