# Findfirst for Dicts with `nothing` keys

Is there plan to fix this function?

``````julia> d=Dict(nothing=>1)

julia> z = findfirst(x->x==1, d)  # z === nothing !

julia> d[z]  # but nothing is correct key
1

julia> z = findfirst(x->x==0, Dict(nothing=>1))  # or could be incorrect key!

``````

One possible solution could be return vector like `findall`… (maybe I am wrong but think that compiler could make it without performance penalty)

(BTW it could also make from `isempty` universal test for success)

1 Like

I’m gonna channel Steve Jobs here: Don’t use `nothing` to represent something?

We’ve decided to go with a sentinel implementation here. You’ll never be able to find that sentinel, no matter what it is. Yes, it’s a tiny limitation in its generality, but I don’t find it horrible.

1 Like

It is horrible. It is not about limitation it is about consistency.

If you insist on special status of nothing then proper solution would be make something like `Dict(nothing=>1)` strictly forbidden. (means throwing exception). It would be consistent.

Current situation is bad. Suppose to have R big amount of results from measurement. You would like to collect counts for result values. Dict is natural representation of this collection. And than you could ask if in R is any value more than x times:

``````julia> R = [5,5,5,nothing,nothing,nothing];

julia> d = Dict([[value, count(i->i==value, R)] for value in R])
Dict{Union{Nothing, Int64},Int64} with 2 entries:
nothing => 3
5       => 3

julia> findfirst(i->i>2, d)  # return nothing
``````

You could say that this is coders mistake and they have to know and remember that nothing is special…

I don’t buy it! It would be (if we don’t fix this) language property to support these type of bugs.

For some inspiration what we could cause we could look at List of software bugs - Wikipedia

By returning Vector we could avoid this problems.

``````[]  # we could return nothing
[nothing]  # we could return `nothing` too
[[]]  # we have no problem with empty vector
``````

How is `nothing` the result of a measurement? In that particular example I would use `missing`.

1 Like

The current behaviors of `nothing` and `missing` are the result of extensive (by which I mean, truly ad nauseum) discussion and development experience both within the Julia data science community and core Julia. As far as I know none of the widely used packages including DataFrames, CategoricalArrays, IndexedTables, all of the Anthoff-verse, nor any of the other JuliaData, JuliaStats or machine learning stuff would create an array like the `R` in your snippet above, so I think it’s safe to say that this is a standard that has been gelling for some time now. The fact that `nothing` and `missing` have “special” behavior is extremely deliberate. The current behavior of `findfirst` allows you to dispatch on code which returns `Nothing`, and I expect this sort of behavior to standardize as well, if it hasn’t to some extent already.

Just as you woulndn’t consider it reasonable to use `NULL` as a stand-in for just-any-old-thing in C or C++, you should not use `nothing` this way either.

1 Like

We actually did use `nothing` as a dictionary key, when processing JSON files (i.e. for JSON `null` values).

Another way of handling this, that would allow using anything (including `nothing`) as a key, would be to have different types of collection use different values for different types of collections.

For collections that can use an “in-band” sentinel (such as 0 for `AbstractString` and `Vector` [not necessarily `AbstractVector` though, since it could be an `OffsetVector` or a 0-based vector]), use that, otherwise, return a simple tuple, `(true, value)` or `(false, emptyvalue)` (where emptyvalue just needs to be something that is one of the types that the elements of the collection could be, it’s never looked at). That could be hidden by two functions, `found(collectionortype, result)` and `find_result(collectionortype, result)`.

I believe the code generated for is as fast or faster on current master than using `nothing` as a sentinel, and is more general.

Weeell! It help me to internally accept that nothing as key will throw error.

I think I understand why ad nauseum! This really complicate language (at least from user perspective) . (I say nothing about necessity or usefulness now! I am open to accept this complication but need more time to thinking)

You probably don’t know what NULL in C++ represents? 0 is probably most used key in C++!
Look for example at: Using NULL in C++? - Stack Overflow

You mean during processing? Because names in object are limited to strings ( https://json.org/ )

And BTW wouldn’t be better to use `missing` here (if `nothing` has special language meta meaning) ?

``````julia> JSON.parse("""[null]""")
1-element Array{Any,1}:
nothing
``````

In this case, we were keeping track of occurrences of values, `null`, `false`, `true`, etc., so the value itself was used as the key in a `Dict`. (although we ran into a different problem there, because `Dict` considers the keys `false` and `0` to be the same (and `true` and `1`), something we hadn’t at all expected, because `if 1` gives an error.

This was something from almost 2.5 years ago (before v0.4 was even released), `missing` is very recent.
Also, `missing` has a lot of special meaning, with 3VL. (`missing | true` => `true`, but `missing & true` => `missing`).

Since any value could be a dictionary key (including `missing` or an array), there is nothing (no pun intended) that `findfirst` could return on failure that couldn’t collide with a key. You’d have to throw an exception instead, which is conceptually fine but would be much slower.

In the rare (but not impossible) case of dictionaries that may have `nothing` as a key and you need linear search, the best alternative is probably just to write your own loop, which is not any slower than `findfirst` and takes 4 lines instead of 1.

``````for v in values(d) # or (k,v) in pairs(d)
v == 1 && return something
end
return somethingelse
``````
2 Likes

In the spirit of #23642, IMO the cleanest approach for a custom function would be to do it without sentinel values at all, ie returning a `Union{Some{T}, Nothing}`.

I’m not so sure. Returning non-bitstype tuples is currently pretty hit-and-miss with respect to avoiding the allocation for the tuple.

Once this https://discourse.julialang.org/t/immutables-with-reference-fields-why-boxed/7706 is fixed (if ever) I totally agree that sentinel values are stupid-- as far as I understand, we should have several free (already invalidated) registers where we can place our bool, and the actual return value can go via sret / other register / whatever the current ABI wants.

Maybe an easier clean fix would be to special case that `Tuple{atype, btype}` with inferred `bitstype` never allocates. Of course, unless this is packed into a tuple / array / etc, or this rule would need to be applied recursively. In other words: I am asking for a special ABI for returning pairs. I would even be happy if I was forced to annotate that I want `@pair_return_ABI` or there was a magic built-in `__unpacked_pair` that lazily materializes the tuple.

Really unbelievable! I think it is bug.

Or maybe we could at least change `findfirst` to something like `findkey(value, dict, sentinel=nothing)`? (to be universal and to help avoid bug with collision `nothing`)

Sorry, I should have said `null_ptr` or whatever is the modern, type-correct version of `NULL` (it’s been a while for me). Even so, if you are reading C++ code you would not expect to see `NULL` as a key, you would expect to see `0`.

I still don’t see how `Nothing` having special behavior complicates the language. Different types behave differently, it’s a core concept of programming. Should you expect `sin(nothing)` to attempt to return something? If you think you need another type that behaves like `NaN`, we have `missing`.

I think the problem is that it has been overloaded with a number of different meanings.

1. for tryparse, it means that it was unable to parse the string (but unfortunately, it can’t also return any useful information about why it was unable to parse the string, for that, you have to raise an error)
2. to indicate that nothing was returned from a method, which is used to indicate that there is no output (and people have rightly complained that when `nothing` is being used to indicate something meaningful, it is a pain to have it not printed out)
3. to indicate that something was not found in some search functions. Note: it’s still very inconsistent, because if you are searching for a regex or string, you get a range, which you need to test to see if it’s empty, but if it is a predicate, (`equalto` or `occursin`) it is either an index or `nothing`. The `searchsorted*` functions generally return a range, even if not found, because you want to get back the insertion location, if not found.

I think that cases 1 and 3 should use different types, for `tryparse`, a type that can give information about why the parse failed (maybe simple return the `Exception` type, instead of `throw`ing it), and for `find*` that don’t return a range, a `notfound` instance of a `NotFound` type (if a singleton must be used).

The hard part about this is designing the API. There is no reason why it can’t / shouldn’t be done in a library (using a different function, of course).

It’s intentional behavior, not a bug. `true == 1` and `isequal(true, 1)` and `Bool <: Integer` are all true in Julia.

There have been lots of discussions about whether it is desirable to treat `Bool` values as integers. See e.g. Should true == 1? · Issue #15983 · JuliaLang/julia · GitHub, make Bool <: Unsigned (i.e. UInt1) · Issue #19168 · JuliaLang/julia · GitHub, Introduce `UInt1` type to replace misuse of `Bool` · Issue #18367 · JuliaLang/julia · GitHub

I have no problem with special behavior. I am just saying that if it has to have special behavior it need to be consistently applied.

So in case of `findfirst` I expect that we:
a) fix function and return vector like findall do
or
b) forbid `nothing` as Dict key
or
c) throw exception in case that `nothing` key is proper return value
or
something other which I don’t think is useful to search without consensus that current status is wrong.

I’m certainly on-board with making whatever (consistent) changes are needed to make this transparent including forbidding setting `nothing` as a `Dict` key. I think this should be as simple as

``````setindex!(a::AbstractDict, ::Nothing, idx) = throw(ArgumentError("dude, don't do that"))
``````

which should have no performance consequences.

I’m not quite sure what you mean by “throw exception in case that `nothing` key is proper return value”. I don’t see why `findfirst` should return an `AbstractVector`, that seems totally unnecessary.