Missing of a certain data type

Hello,

I have just read the documentation on missing values, and while I can’t comment on performance issues, I have found it very intuitive. The documentation on missing starts as:

Julia provides support for representing missing values in the statistical sense, that is for situations where no value is available for a variable in an observation, but a valid value theoretically exists.

Propagation through mathematical operations and the behavior of equality, comparison, and logical operators are natural extensions of this definition, and they make intuitive sense. My question is this: the result of the operation 0 * missing is missing. This also makes sense if the missing object could have a theoretical value of any data type. But what if we have a missing value and we know that, theoretically, it should be a real or integer number? Then the result of the above operation should be 0. Is there any way to impose a data type such as Float64 on the missing object? If not, is this a valid and intuitive request?

For example, a missing(Float64) object should behave as follows in these two operations: missing(Float64) * 0 should yield 0, yet missing(Float64) * 1 should yield missing.

Thanks.

Among other things, this would violate type stability. But that’s no longer that big of a deal, and there is no single best concept for missingness; the one used in Base is just simple and consistent. So you could define your own type for this, e.g.

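# Singleton type for a missing value that is known to be a number
struct MissingNumber end
const missing_number = MissingNumber()
# Multiplying by zero yields that zero; anything else stays missing
Base.:*(::MissingNumber, x::Number) = iszero(x) ? x : missing_number
# Same rule with the arguments swapped, so 0 * missing_number also works
Base.:*(x::Number, m::MissingNumber) = m * x
Base.show(io::IO, ::MissingNumber) = print(io, "missing_number")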

julia> missing_number * 1
missing_number

julia> missing_number * 0
0
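
For the missing(Float64) idea from the original question, the same approach could carry a type parameter. This is only a minimal sketch: the name `MissingOf` is hypothetical, and returning a zero of the promoted type is just one possible choice, not anything provided by Base.

```julia
# Hypothetical typed missing value: "a number of type T that we do not know".
struct MissingOf{T<:Number} end

# Multiplying by zero gives a zero of the promoted type; otherwise stay missing.
# Note that the return type depends on the *value* of x (type-unstable).
Base.:*(m::MissingOf{T}, x::Number) where {T} =
    iszero(x) ? zero(promote_type(T, typeof(x))) : m
Base.:*(x::Number, m::MissingOf) = m * x

Base.show(io::IO, ::MissingOf{T}) where {T} = print(io, "missing(", T, ")")

m = MissingOf{Float64}()
m * 0   # a Float64 zero
m * 1   # still the typed missing value
```

Here `m * 0` and `m * 1` return values of different types, which is the type-stability concern mentioned above.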

Note that 0 * x isn’t guaranteed to return zero when x is a Float64. For example, 0 * Inf gives NaN.
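
A quick check of this at the REPL, as a small illustration using only Base arithmetic. The last two lines show that `false`, unlike `0`, acts as a "strong zero" in multiplication:

```julia
# An integer zero is promoted to 0.0, and IEEE arithmetic gives NaN here
a = 0 * Inf
b = 0 * NaN

# `false` is treated as a strong zero and wins even against Inf and NaN
c = false * Inf
d = false * NaN

@show isnan(a) isnan(b) iszero(c) iszero(d)
```

So a rule like "zero times a missing float is zero" would silently assume that the unknown value is finite.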

It’s a bit off-topic, but was it ever considered to have Missing{T}? Then one could give more satisfying answers to:

julia> occursin(missing, "a")
ERROR: MethodError: no method matching occursin(::Missing, ::String)
Stacktrace:
 [1] top-level scope at none:0

julia> zero(missing)
missing

julia> length(missing)
ERROR: MethodError: no method matching length(::Missing)
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:582
  length(::Base.MethodList) at reflection.jl:732
  length(::Core.MethodTable) at reflection.jl:806
  ...
Stacktrace:
 [1] top-level scope at none:0

It’s occasionally a pain that f.(v) calls f(::Missing), which loses the type information in v. I get that it might not be worth the extra complexity though…

Yes, it’s been discussed a lot (like probably all issues regarding missing values). The issue it raises has even been dubbed the “counterfactual return type problem” by John Myles White. Basically, in many situations, a function which wants to return a Missing{T} value is not able to find out what T is when the value is missing without relying on type inference. But it’s generally not considered good practice to rely on inference for user-visible behavior, since it can sometimes fail or bail out and return a broad type like Any. So you wouldn’t be able to rely on T being concrete, which makes it mostly useless.
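
To make the problem concrete, here is a rough sketch. `TypedMissing` and `lift` are hypothetical names invented for illustration; `Base.promote_op` is an existing but unexported helper that asks inference for a return type, and its own docstring warns that it is fragile.

```julia
# Hypothetical typed missing value; nothing like this exists in Base.
struct TypedMissing{T} end

# To lift f over a typed missing value we need f's return type for input T,
# but there is no value to call f on, so the only source is inference.
function lift(f, ::TypedMissing{T}) where {T}
    R = Base.promote_op(f, T)   # inference-dependent; may be a Union or Any
    return TypedMissing{R}()
end

# For a simple function, inference will usually find a concrete type
simple = lift(length, TypedMissing{String}())

# For a type-unstable function, the parameter is no longer concrete
unstable(x) = rand(Bool) ? 1 : "one"
vague = lift(unstable, TypedMissing{Int}())
```

The type parameter of the result is whatever inference happens to report, so it can change between Julia versions or when `f` is modified, which is why relying on it for user-visible behavior is discouraged.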

Oh, right. It would have all the same problems Nullable had. Thank you for the answer.