Operations on missing values

In Julia 0.7, what is the consensus on best practice for handling missing arguments passed to functions that, by design, do not accept missing values?

What I mean is that currently we have for example:

julia> x = ["A", "b", "AB", missing]
4-element Array{Union{Missing, String},1}:
 "A"
 "b"
 "AB"
 missing

julia> lowercase.(x)
ERROR: MethodError: no method matching lowercase(::Missing)

so what you have to do is:

broadcast(v-> ismissing(v) ? missing : lowercase(v), x)

If I remember correctly, there were earlier discussions about this, but I could not find the conclusion. Essentially, what I am looking for is a function similar to skipmissing, but one that passes missing values through instead of skipping them (a shorter name would probably be nice, as this function would be heavily used):

passmissing(f, x) = ismissing(x) ? missing : f(x) # this special case may be unnecessary if the varargs method below is fast enough for a single argument
passmissing(f, x...) = any(ismissing, x) ? missing : f(x...)

and now you can write passmissing.(lowercase, x) to get the desired result.
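To make the intended behaviour concrete, here is a minimal sketch using only the varargs definition proposed above. Note that passmissing here is the helper from this post, not a function in Base, and the second vector y is made up for illustration.

```julia
# Sketch of the helper proposed above (not an existing Base function).
passmissing(f, x...) = any(ismissing, x) ? missing : f(x...)

x = ["A", "b", "AB", missing]
y = ["1", "2", missing, "4"]   # hypothetical second input

# One argument: strings are lowercased, missing stays missing.
lowered = passmissing.(lowercase, x)

# Two arguments: the result is missing if either argument is missing.
joined = passmissing.(*, x, y)
```

The same definition covers both the single-argument and the multi-argument case; whether the dedicated single-argument method is worth keeping would have to be decided by benchmarking.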

With Missing defined in Base, there really should be a generic fallback, I guess.

Maybe Missings could provide some lift(f) that does:

lift(f)(::Missing) = missing
lift(f)(x) = f(x)

and maybe some @lift macro that would lift all functions in a code block, though the details on the macro side may be a bit tricky to figure out exactly.
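The two-line lift(f) definition above reads as pseudocode and may not be accepted as a method definition as written. One way to get the same dispatch-based behaviour is a small callable wrapper type. This is only a sketch: the Lifted type and this lift constructor are illustrative names, not part of Missings or Base.

```julia
# Hypothetical wrapper type; `Lifted` and `lift` are illustrative names only.
struct Lifted{F}
    f::F
end

# Dispatch does the work: a Missing argument short-circuits to missing...
(l::Lifted)(::Missing) = missing
# ...and any other argument is forwarded to the wrapped function.
(l::Lifted)(x) = l.f(x)

lift(f) = Lifted(f)

x = ["A", "b", "AB", missing]
result = lift(lowercase).(x)
```

This version only handles a single argument; the multi-argument case needs a check over all arguments.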

Yeah, that’s another possibility, so something like:

lift(f) = (x...) -> any(ismissing, x) ? missing : f(x...)

Though lift(f)(x) doesn’t look great compared with lift(f, x). Same for lift(f).(X) vs. lift.(f, X).

A @lift macro would be nice, at least it would allow keeping normal syntax after it. It shouldn’t be too hard to support as you just need to find call nodes and adapt them.
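A rough sketch of what "find call nodes and adapt them" could look like is given below. It assumes the two-argument form lift(f, x...) and a hypothetical helper liftcalls; it deliberately ignores keyword arguments, dot-broadcast calls, and functions that should keep their own handling of missing, so it is an illustration rather than a usable implementation.

```julia
# Function-style lift assumed by the macro (illustrative definition).
lift(f, x...) = any(ismissing, x) ? missing : f(x...)

# Hypothetical helper: recursively rewrite every call `f(a, b)` into `lift(f, a, b)`.
liftcalls(ex) = ex                      # symbols, literals, etc. are left untouched
function liftcalls(ex::Expr)
    args = map(liftcalls, ex.args)      # rewrite nested expressions first
    if ex.head === :call
        return Expr(:call, :lift, args...)
    end
    return Expr(ex.head, args...)
end

# Limitations of this sketch: keyword arguments and `f.(x)` calls are not handled.
macro lift(ex)
    esc(liftcalls(ex))
end

a = @lift uppercase(lowercase("Ab"))
b = @lift length(lowercase(missing))
```

Because the inner call already returns missing, the outer lifted call short-circuits as well, so normal nested syntax keeps working inside the macro.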

This may be a very technical point, but I was thinking that one could use a generated function (rather than an anonymous function) so that the check for missing arguments happens at compile time. I'm not sure how much it matters; maybe it's good in some cases (say, few arguments) and bad in others.
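For illustration, a generated-function variant might look roughly like this. The name lift_gen is made up, and whether this is actually faster than the simple closure is exactly the open question, so treat it as a sketch to benchmark rather than a recommendation.

```julia
# Hypothetical generated-function variant; `lift_gen` is an illustrative name.
# Inside the generator, `args` is a tuple of the argument *types*, not values,
# so the missing check is resolved once per combination of argument types.
@generated function lift_gen(f, args...)
    if Missing in args
        return :(missing)        # at least one argument has type Missing
    end
    return :(f(args...))         # no Missing among the types: plain call
end

x = ["A", "b", "AB", missing]
lowered = lift_gen.(lowercase, x)
```

The generator body avoids closures, which are not allowed in generated functions, hence the plain `in` check.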

Yes, we would have to be very careful about performance when writing the actual implementation. Thanks to the recursive definition of any for small tuples, the simple definition could be quite fast, but maybe splatting is too costly and we'd need either a generated function or special cases for small numbers of arguments.

  1. My thinking was that specialized cases for small tuples might be needed but this requires benchmarking (a PR for a similar case is open here https://github.com/JuliaLang/julia/pull/16604 with different options compared).
  2. I was also considering a macro; then we have to take into account that some functions already handle missing arguments themselves, and this should not be overridden.
  3. I prefer the lift(f, x) syntax as it is shorter and allows do syntax. What advantages do you see in lift(f)(x) over lift(f, x)?

And I guess everyone accepts lift as a name, which is great: it is short, which I think is important in this case.
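The do-syntax argument for the lift(f, x) form can be illustrated with a small sketch. The definition of lift is the simple varargs one discussed earlier in the thread, and shout is a made-up example function.

```julia
# Illustrative definition of the function-style lift discussed in this thread.
lift(f, x...) = any(ismissing, x) ? missing : f(x...)

# With the function as first argument, a multi-line body fits in a do block.
# `shout` is a hypothetical example function.
shout(v) = lift(v) do s
    uppercase(s) * "!"
end

x = ["A", "b", "AB", missing]
shouted = shout.(x)
```

With the curried lift(f)(x) form, the body would have to be written as an explicit anonymous function instead.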

I thought the whole point of missing was that it was supposed to propagate automatically? Shouldn't we just get lowercase(::Missing) = missing? Lift might be good for lowercase(::Nothing).

At least for now, we would have to define this behavior for every function that should support it (and the consensus is that, for functions operating on strings, a PR adding such support should be made).

However, as I noted above, in general not every function should be lifted automatically, as some functions might want to accept missing as a valid argument. Consider, for example, vcat(missing): it returns a 1-element array containing missing (as expected), not missing. The same might hold for user-defined functions.
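A short sketch of why blanket lifting would change behaviour, again assuming the simple varargs lift from earlier in the thread (an illustrative definition, not an existing API):

```julia
# Illustrative definition of lift, as discussed in this thread.
lift(f, x...) = any(ismissing, x) ? missing : f(x...)

# Functions that treat missing as a valid input give a meaningful answer...
plain_vcat = vcat(missing)            # 1-element array containing missing
plain_test = ismissing(missing)       # true

# ...which lifting would silently replace with missing.
lifted_vcat = lift(vcat, missing)
lifted_test = lift(ismissing, missing)
```

So any automatic mechanism (a macro or a generic fallback) would need a way to leave such functions alone.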

But I agree that unfortunately this is a bit awkward.