I have just read the documentation on missing values, and while I can’t comment on performance issues, I have found it very intuitive. The documentation on missing values starts with:
Julia provides support for representing missing values in the statistical sense, that is for situations where no value is available for a variable in an observation, but a valid value theoretically exists.
Propagation through mathematical operations and the behavior of equality, comparison and logical operators are a natural extension of this definition, and they make intuitive sense. My question is this: the result of 0 * missing is missing. That makes sense if the missing object could theoretically hold a value of any data type. But what if we have a missing value that we know should theoretically be a real or integer number? Then the result of the above operation should be 0. Is there any way to impose a data type such as Float64 on the missing object? If not, is this a valid and intuitive request?
For example, a missing(Float64) object should behave as follows in these two operations: missing(Float64) * 0 should yield 0, yet missing(Float64) * 1 should yield missing.
Among other things, this would violate type stability, since the return type would depend on the value of the other operand. But that’s no longer that big of a deal,
and there is no single best concept for missingness; the one used in Base is just simple and consistent. So you could define your own type for this.
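A minimal sketch of what such a user-defined type might look like. The name TypedMissing and its constructor are hypothetical and not part of Base; the sketch only covers multiplication, and it assumes that treating "unknown number times zero" as zero is acceptable (for floats an unknown Inf or NaN would not actually give 0).

```julia
import Base: *

# Hypothetical type (not in Base): a missing value that remembers the
# type T its theoretical value would have had.
struct TypedMissing{T} end

# Convenience constructor so that TypedMissing(Float64) works.
TypedMissing(::Type{T}) where {T} = TypedMissing{T}()

# Multiplying by an exact zero yields a zero of the promoted type;
# any other factor leaves the value missing.
# Note the return type depends on the *value* of x, which is the
# type instability mentioned above.
function *(m::TypedMissing{T}, x::Number) where {T<:Number}
    return iszero(x) ? zero(promote_type(T, typeof(x))) : m
end

# Same rule with the operands swapped.
*(x::Number, m::TypedMissing{T}) where {T<:Number} = m * x

a = TypedMissing(Float64) * 0   # a zero of type Float64
b = TypedMissing(Float64) * 1   # still a TypedMissing{Float64}
```

Every other operation (addition, comparison, and so on) would need its own method, which is part of why a single simple Missing in Base is easier to keep consistent.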
It’s a bit off-topic, but was it ever considered to have Missing{T}? Then one could give more satisfying answers to:
julia> occursin(missing, "a")
ERROR: MethodError: no method matching occursin(::Missing, ::String)
Stacktrace:
[1] top-level scope at none:0
julia> zero(missing)
missing
julia> length(missing)
ERROR: MethodError: no method matching length(::Missing)
Closest candidates are:
length(::Core.SimpleVector) at essentials.jl:582
length(::Base.MethodList) at reflection.jl:732
length(::Core.MethodTable) at reflection.jl:806
...
Stacktrace:
[1] top-level scope at none:0
It’s occasionally a pain that f.(v) calls f(::Missing), which loses the type information in v. I get that it might not be worth the extra complexity though…
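To illustrate the point about f.(v), here is a small sketch. The helper zero_or is hypothetical, and nonmissingtype is assumed to be available (it is exported from Base in Julia 1.3 and later). The idea is to recover the type from the container and pass it along explicitly, since the missing element itself carries no type.

```julia
v = Union{Float64, Missing}[1.5, missing]

# Broadcasting calls zero(::Float64) and zero(::Missing) element by element;
# the second call cannot know that v was meant to hold Float64 values.
z = zero.(v)

# Hypothetical helper: take the intended type as an explicit argument.
zero_or(::Type{T}, x) where {T} = ismissing(x) ? zero(T) : zero(x)

# Recover the non-missing element type from the container itself.
T = nonmissingtype(eltype(v))
filled = map(x -> zero_or(T, x), v)
```

Here z still contains a missing entry, while filled holds a Float64 zero in every position.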
Yes, it’s been discussed a lot (like probably all issues regarding missing values). The issue it raises has even been dubbed the “counterfactual return type problem” by John Myles White. Basically, in many situations, a function that wants to return a Missing{T} value cannot find out what T is when the input is missing, because it never gets to compute an actual result; the only way to obtain T is to rely on type inference. But relying on inference for user-visible behavior is generally not considered good practice, since inference can sometimes fail or bail out and return a broad type like Any. So you wouldn’t be able to rely on T being concrete, which makes it mostly useless.
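A rough sketch of the problem, using a hypothetical TypedMissing{T} as a stand-in for the proposed Missing{T} and a hypothetical lift function. Base.return_types is a reflection utility that asks inference what a call would return; it is used here purely for illustration, and what it reports may differ between Julia versions.

```julia
# Hypothetical stand-in for the proposed Missing{T}; not part of Base.
struct TypedMissing{T} end

# To apply f to a typed missing we need the type f *would* have returned
# for a real value of type T. There is no value to call f on, so the only
# generic source for that type is inference.
function lift(f, ::TypedMissing{T}) where {T}
    R = first(Base.return_types(f, (T,)))
    return TypedMissing{R}()
end

stable(x) = x + 1                       # inference can usually pin this down
unstable(x) = x > 0 ? x : "negative"    # result type depends on the value

r1 = lift(stable, TypedMissing{Int}())
r2 = lift(unstable, TypedMissing{Int}())
```

For a function like unstable, the type parameter of r2 cannot be a single concrete type, so code downstream of r2 cannot rely on it, which is the situation described above.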