Why is an `isfoo(x) = x === foo` method defined for some singleton types?

Base and some libraries introduce a function to test for singleton types (ismissing, isnothing) that is basically an application of ===.

I am curious about the rationale for this API choice. (Not questioning how it is in Base, just trying to decide whether to follow it in my own code).

It’s just a consistent name with unambiguous intent. You could use === or x -> x === foo for the same effect, as long as you didn’t do something crazy like reassign nothing/missing in a module. The problem was that people weren’t using ===; they might reach for == instead, which fails in the case of missing because x == missing evaluates to missing rather than a Bool. x == nothing was also possibly worse for performance, though that section of the Performance Tips is no longer there as of an improvement to propagation in 1.7. x === nothing is still better for type inference than the generic function isnothing(x) if x is uninferred (this holds up in 1.12, though it’s worth noting that the compiler can leverage an inferred Union).
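
To make the == pitfall concrete, here is a minimal sketch using only Base behavior (the variable names are just for illustration):

```julia
x = missing

# == follows three-valued logic, so the comparison itself is `missing`, not a Bool
r = (x == missing)

# === compares identity and always returns a Bool
s = (x === missing)

# ismissing is the named form of the === check
t = ismissing(x)

# Note: `if x == missing ... end` would throw a TypeError,
# because the condition is not a Bool.
```

Here r is missing, while s and t are both true.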

Doing that, in my opinion, is about as likely as reassigning (shadowing) isnothing etc., so it is not something I am considering in API design.

Indeed. Just assume that we are talking about a plain vanilla singleton type that does not redefine ==.

Thanks for your thoughts, it reinforces my belief that adding an isfoo method to my package API is superfluous at this stage of Julia if foo is exported.

(I understand that the methods in Base are here to stay, and the question is not about that, I was just wondering.)

Sample size of 1, but I generally wouldn’t mind if a package doesn’t provide an isfoo function to match a const foo. Nobody ever asked for a consistent ispi from Base, though it wouldn’t be hard to make a package for it. nothing and missing are a bit special because they’re in such a widespread dependency (Base), and checking and filtering would often be done with higher order functions. Routinely writing x -> x===nothing and a bunch of packages implementing their own isnothing functions are horribly redundant, and a package would probably have done isnothing if Base didn’t.

I used Common Lisp before Julia, and for me, writing x -> x ≡ foo is natural. I consider introducing a function for this gratuitous namespace pollution.

There is also the single-argument isequal.

Maybe it has something to do with the fact that === cannot be redefined, thus such a definition cannot be invalidated? Just a guess

If you write an isolated x -> x===nothing, you’d be fine, but having x -> x===nothing be a different anonymous function in each place needlessly complicates type inference and compilation. Sometimes it’s worth removing that redundancy at the cost of an extra const name. Those names can be optional; a lot still lives in Missings.jl instead of Base.

Usable, but not the same as the builtin === or propagating it: [] === [] gives a different result from isequal([])([]).
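
A small sketch of that difference, using two empty arrays (names a and b are arbitrary):

```julia
a = []
b = []

# === asks whether both sides are the very same object;
# two separately constructed mutable arrays are not
identical = (a === b)

# single-argument isequal returns a callable that compares by value
curried = isequal(a)(b)

# which is the same as the two-argument form
direct = isequal(a, b)
```

identical is false, while curried and direct are both true.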

More specifically it’s not a generic function with multiple changeable methods. Compilers can leverage that limitation, and it’s the reason for the better type inference in badly inferred code I mentioned earlier.

I am not quite sure how. Eg if I filter(x -> x === nothing, ...), my understanding is that Julia avoids specialization. At the relevant place it is probably inlined. Is this actually a practical concern?

Yes, I am aware that isequal is not equivalent to == (or === for that matter). But, again, the context is singleton types, so it just falls back to ==, which then falls back to ===.
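
For a plain singleton, the fallback chain can be checked directly. This is a sketch with a hypothetical type Foo, instance foo, and helper isfoo (none of them from a real package):

```julia
struct Foo end          # hypothetical singleton type; no custom == or isequal
const foo = Foo()

isfoo(x) = x === foo    # the one-liner under discussion

results = (
    foo == Foo(),          # generic == falls back to ===
    isequal(foo, Foo()),   # generic isequal falls back to ==
    foo === Foo(),         # all instances of a fieldless immutable struct are identical
    isfoo(Foo()),
    isfoo(nothing),
)
```

The first four entries are true and the last is false, so for this kind of type all of the spellings agree.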

The user should not worry about adding a tiny bit of extra work for the compiler if it leads to preferable code. That’s why we have a compiler.

Propagated inlining can be just as performant as method signatures that force specialization in the few exceptional cases. That’s not what I’m talking about though.

For one example, we can’t avoid redundant compilation for semantically distinct anonymous functions, no matter how smart the compiler gets. The following example compiles in the blink of an eye, but it’s not hard to imagine much longer compilation times that are not worth repeating.

julia> x = [1, nothing, 3];

julia> @time filter(!isnothing, x)
  0.103600 seconds (96.48 k allocations: 4.764 MiB, 99.96% compilation time)
2-element Vector{Union{Nothing, Int64}}:
 1
 3

julia> @time filter(!isnothing, x);
  0.000014 seconds (3 allocations: 144 bytes)

julia> @time filter(y->y!==nothing, x);
  0.018471 seconds (10.36 k allocations: 522.013 KiB, 99.58% compilation time)

julia> @time filter(y->y!==nothing, x);
  0.034947 seconds (10.36 k allocations: 521.841 KiB, 99.71% compilation time)

isequal does better because it doesn’t make a new function:

julia> @time filter(!isequal(nothing), x);
  0.031872 seconds (16.94 k allocations: 848.841 KiB, 99.83% compilation time)

julia> @time filter(!isequal(nothing), x);
  0.000010 seconds (3 allocations: 144 bytes)

Maybe the most generic way to handle an arbitrary 2-argument comparison is Base.Fix{2}(!==, nothing)?
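
A sketch of that idea using Base.Fix2, the older two-argument spelling. I am assuming it behaves the same as Base.Fix{2} for this purpose; Base.Fix itself is only available on recent Julia versions.

```julia
x = [1, nothing, 3]

# Base.Fix2(f, b) is a callable such that Base.Fix2(f, b)(a) == f(a, b),
# so this tests `a !== nothing`
notnothing = Base.Fix2(!==, nothing)

kept = filter(notnothing, x)

# Constructing the same Fix2 again yields a value of the same type,
# so a filter specialization compiled for one can be reused for the other
sametype = typeof(Base.Fix2(!==, nothing)) === typeof(notnothing)
```

kept contains 1 and 3, and sametype is true, which is the property that lets the second call skip compilation.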

Sure, but there is no reason to recompile anonymous functions that were created using the same syntax. It’s just currently not implemented, but cf the discussion in #21113. Also, for the same callsite, compilation just happens once. Hardly a large cost IMO.

I recognize the current issues and limitations, but I still think there is some superstition about avoiding anonymous functions that is no longer justified.

Anonymous function syntax creates a new generic function, and we can add different methods to different generic functions as long as we have references, which doesn’t have to be a const name. That’s a very good reason to not automatically treat them as one function. Anonymous function syntax must undergo breaking changes to do so, or we must syntactically specify we are instead accessing a global cache of anonymous functions given a method body, like the suggested @cached macro in the discussion. RuntimeGeneratedFunctions.jl implements that with the caveat that closures are not allowed by the underlying @generated function.
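
A quick way to see this is to compare types. The sketch below only uses Base; f and g are arbitrary names:

```julia
# Two textually identical anonymous functions
f = y -> y !== nothing
g = y -> y !== nothing

# Each occurrence of the syntax defines its own function type
distinct = typeof(f) !== typeof(g)

# A named function (or something built from one) has a single type
# no matter how many times it is written out
shared = typeof(!isnothing) === typeof(!isnothing)
```

Both distinct and shared are true. Since higher-order functions like filter specialize on the type of the function argument, f and g each trigger their own compilation.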

Certainly, but in most applications in this context (especially inside functions) no such reference escapes. Julia has sophisticated escape analysis which should be able to figure this out. In which case even the trivial cost of compiling can be spared.

This is now only tangentially related to the original question, but again, I think there is a pervasive mythology surrounding closures in Julia. While in some cases they can be problematic, occasionally on this forum discussants recommend avoiding simple, functional style closures that should be innocuous.

One reason might be that functions can be overloaded for custom types, and this is sometimes done for ismissing: JuliaHub (not too often though).
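
As a rough sketch of what such an overload could look like (NotAvailable is a hypothetical type invented for this example, not taken from any package):

```julia
struct NotAvailable end   # hypothetical sentinel type owned by "our" package

# Extend the Base predicate for our own type
Base.ismissing(::NotAvailable) = true

data = Any[1, missing, NotAvailable(), 4]

# The generic predicate now counts both kinds of "missing"
n_missing = count(ismissing, data)

# A hard-coded === check only sees Base's missing
n_identical = count(x -> x === missing, data)
```

Here n_missing is 2 and n_identical is 1, which is the kind of extensibility a bare === comparison cannot offer.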

I personally like having small functions like isnothing. Having worked on a number of codebases that don’t have any of these and instead use field access, direct computation etc. everywhere, it’s honestly harder to read, test & maintain.

Having these functions to me at least gives me vocabulary to talk about the code with colleagues, which is much more important than having “superfluous” oneliner definitions that are widely used in a codebase.

I don’t disagree, but let’s focus on singleton types, which is what this topic is about. You can compare them fully with ===. No field access or direct computation is needed.

Base has a lot of singleton types which have no is... accessor, eg the array iteration traits. You dispatch or compare them with === etc (equivalent when the computation happens in type space).
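
Taking IndexStyle as one such trait, a minimal sketch of both approaches (indexing is a hypothetical helper name):

```julia
v = [1, 2, 3]
s = view(zeros(3, 3), 1:2, 1:2)

# Comparing trait singletons with ===
linear_v = IndexStyle(v) === IndexLinear()
cartesian_s = IndexStyle(s) === IndexCartesian()

# Dispatching on the trait instead of comparing
indexing(A::AbstractArray) = indexing(IndexStyle(A), A)
indexing(::IndexLinear, A) = :linear
indexing(::IndexCartesian, A) = :cartesian
```

With these definitions, indexing(v) gives :linear and indexing(s) gives :cartesian, and there is no islinear-style predicate involved.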

That is a valid use case, but then it is about a more generic API, not just testing if something is a singleton type.

In any case, I think isnothing is a vestigial feature (that’s OK, now it is here to stay, but newer APIs do not need something like this — which answers my question), ismissing can be made a use case for when one generalizes the concept of “missingness”.

For ismissing, there was a PR to use ismissing everywhere in anticipation that there might, one day, be multiple missing types.

I suspect part of the reason for this is isnan. For that operation, none of == or isequal or === are what you want… because floating point. Now, yes, I know, that’s not your question, but many folks moved from using NaN sentinels to missings, be it Julia’s ismissing or another language/framework’s isna or is.na. And they had it (rightly!) ingrained to never check == NaN. So checking === NA probably felt similarly wrong. It’s all about the feels! Once you have a function named isnan and get used to using filter(!isnan, _), it’s only natural to think that there should be an ismissing or isnothing, whether it’s actually necessary or not.
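
For anyone who hasn’t been bitten by this, a short sketch of the NaN situation using only Base:

```julia
a = NaN
b = -NaN    # still a NaN, but with the sign bit set

# IEEE comparison: NaN is never == to anything, including itself
eq_self = (a == a)

# === compares bit patterns, so differently encoded NaNs are not identical
identical = (a === b)

# isnan recognizes both
both_nan = isnan(a) && isnan(b)
```

eq_self and identical are both false, while both_nan is true.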

For Julia itself, there’s a question of whether you should use == or ===, and there have historically been some surprising performance effects to getting that choice wrong. And you can’t write ===(nothing).

Using a base function means that someone (probably multiple someones, actually) has thought about what the best implementation should be — and I don’t need to do that thinking!

Sure, but why should singleton types be more special here? Just for consistency’s sake, I’d still introduce that one-liner. It’s not like we’re only looking at isnothing in isolation, otherwise I’d completely agree with you.

Not to mention, currently things like x -> x === Foo always introduce a new anonymous function, and always having to visually parse that this particular anonymous function is functionally equivalent to isfoo, for me at least, adds some overhead. Add up enough of that kind of small overhead and it’s harder than necessary to follow the intent of the code :person_shrugging:

I think that isnothing and ismissing are cases of Don’t Repeat Yourself, even though the repeated code is quite short. I also think they’re more readable.