Multiple dispatch with and without missing to same function?

I have a function which takes data loaded from a file. Normally it’s all Matrix{Float64}, but sometimes there are missing values and I get a Matrix{Union{Missing, Float64}}. How would I write a function that takes a Number matrix with or without missing values?

Here is my failing attempt.

function missingOrNot(data::Array{T}) where T <: Union{Missing, Number}
	if isa(data,Array{Union{Missing, Number}})
		println("Numeric Matrix with missing values")
	elseif isa(data,Array{Number})
		println("Numeric matrix")
	end
end
julia> y = [NaN 2 3 4;5 6 NaN 8;9 10 11 12]
3×4 Matrix{Float64}:
 NaN     2.0    3.0   4.0
   5.0   6.0  NaN     8.0
   9.0  10.0   11.0  12.0

julia> z = [NaN 2.0 3.0 4.0;5.0 6.0 missing 8.0;9.0 10.0 11.0 12.0]
3×4 Matrix{Union{Missing, Float64}}:
 NaN     2.0   3.0        4.0
   5.0   6.0    missing   8.0
   9.0  10.0  11.0       12.0

julia> missingOrNot(y)

julia> missingOrNot(z)

I think:

function missingOrNot(data::Union{Matrix{Float64}, Matrix{Union{Float64, Missing}}})

or

function missingOrNot(data::Matrix{T}) where T <: Union{Float64, Union{Float64, Missing}}

It’s probably worth taking a read through this section of the manual
https://docs.julialang.org/en/v1/manual/types/#man-parametric-composite-types

The important warning there is that Array{Float64} is not a subtype of Array{Number}. But, it is a subtype of Array{T} where {T <: Number}
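
To make the invariance point concrete, here is a small sketch of the subtype checks involved. The variable names are only for illustration; the types are the ones discussed in this thread.

```julia
# Type parameters are invariant: Float64 <: Number does NOT imply
# Array{Float64} <: Array{Number}.
invariant_check = Array{Float64} <: Array{Number}

# A UnionAll type covers every Array whose element type is a subtype of Number.
unionall_check = Array{Float64} <: (Array{T} where {T<:Number})

# Shorthand for the same UnionAll type.
shorthand_check = Array{Float64} <: Array{<:Number}

# An element type that allows missing is not a subtype of Number...
missing_vs_number = Matrix{Union{Missing, Float64}} <: Array{<:Number}

# ...but it is a subtype of Union{Missing, Number}.
missing_vs_union = Matrix{Union{Missing, Float64}} <: Array{<:Union{Missing, Number}}

@show invariant_check unionall_check shorthand_check missing_vs_number missing_vs_union
```

The same right-hand sides can be used with isa on a value instead of <: on a type.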

The docs were my first destination, and that page specifically, but I must have missed this.

The important warning there is that Array{Float64} is not a subtype of Array{Number}. But, it is a subtype of Array{T} where {T <: Number}

Just out of curiosity, is there a way to do that within the function? How would I fix the isa functions in my example?
if isa(data,Array{Union{Missing, Number}})

@pixel27 This is how I ended up changing the function definition in my example:
function missingOrNot(data::Matrix{T}) where T <: Union{S, Union{S, Missing}} where S <: Number

You can use where clauses outside of function definitions, so

isa(data, Array{T} where {T <: Union{Missing, Number}})
# or more concisely
isa(data, Array{<:Union{Missing, Number}})

Are you sure that the main function signature is the place to do this? Couldn’t your function just accept AbstractArray?

I tried both styles, but y and z both pass the check. isa(y,Array{<: Number}) is more selective. It’s still workable, but confusing.

julia> isa(y,Array{<: Union{Missing, Number}})
true

julia> isa(z,Array{<: Union{Missing, Number}})
true

julia> isa(z,Array{<: Number})
false

julia> isa(y,Array{<: Number})
true

This version of my example works.


function missingOrNot(data::Matrix{T}) where T <: Union{S, Union{S, Missing}} where S <: Number
	if isa(data,Array{<:Number})
		println("Numeric matrix")
	elseif isa(data,Array{T} where {T <: Union{Missing, Number}})
		println("Numeric Matrix with missing values")
	end
end

Edit: Forgot to show some work

Probably, but I think it’s broader than I want. My goal was to keep it just as specific as necessary without being overly specific. I use filter and !isnan to strip missing and NaN from my data sets.

This is also a great opportunity to refine my understanding of the topic.

Maybe I misunderstood the question. Is this useful:

function missingOrNot(data::AbstractArray{T}) where T <: Union{Missing, Number}
    if Missing <: T
        if any(ismissing, data)
            println("Numeric Matrix with missing values")
        else
            println("Numeric Matrix that could hold missing values, but doesn't.")
        end
    else
        println("Numeric matrix")
	end
end

You can use skipmissing to remove missings.

That is a more nicely written alternative to an idea I had. I used multiple dispatch because I figured there would be less overhead: any presumably has to check each value, and I am working with a week of data at 10 Hz. In my case, if memory serves, I used CSV.File to load the data. If there are no missing values it returns a Matrix{Float64}, otherwise it returns a Matrix{Union{Missing,Float64}}, so it’s relatively simple to sort that way if I can get the dispatch correct.

I really appreciate the tips everyone is posting here though.

You only need the any clause if there could be arrays whose element type allows Missing but that contain no actual missing values (that’s possible in principle).

if Missing <: T will be as efficient as using dispatch, because the compiler removes the unused branch.
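
For comparison, here is a minimal sketch of the dispatch-only route, with one method per case instead of a branch. The function name describe is hypothetical, and it returns a label rather than printing so the result is easy to check. It assumes the element type is either a plain Number subtype or exactly Union{Missing, T} for some numeric T.

```julia
# Hypothetical helper, not from the thread.
# Method 1: the element type is a subtype of Number, so it cannot hold missing.
describe(data::AbstractMatrix{<:Number}) = "Numeric matrix"

# Method 2: the element type is exactly Union{Missing, T} for some numeric T.
describe(data::AbstractMatrix{Union{Missing, T}}) where {T<:Number} =
    "Numeric matrix that may hold missing values"

# Same test data as earlier in the thread.
y = [NaN 2 3 4; 5 6 NaN 8; 9 10 11 12]
z = [NaN 2.0 3.0 4.0; 5.0 6.0 missing 8.0; 9.0 10.0 11.0 12.0]

label_y = describe(y)   # dispatches on Matrix{Float64}
label_z = describe(z)   # dispatches on Matrix{Union{Missing, Float64}}

println(label_y)
println(label_z)
```

Like the Missing <: T check, this only looks at the element type, so it does not tell you whether any missing values are actually present.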


But you might as well not check for this at all and just use skipmissing(data). That will work for both your scenarios.

Now that is something I hadn’t realized: skipmissing doesn’t return an array without missing values, but an iterator. It’s already in use in my code, but together with filter. That’s very handy.
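
A small sketch of how skipmissing and filter can fit together, assuming a toy matrix (the variable names are made up for illustration):

```julia
# Toy data with one missing and one NaN.
data = [1.0 missing; NaN 4.0]

# skipmissing wraps the array lazily; nothing is copied at this point.
it = skipmissing(data)
is_array = it isa AbstractArray

# collect materializes the non-missing values (column-major order) as a Vector.
no_missing = collect(it)

# NaN is an ordinary Float64, so it has to be removed separately.
clean = filter(!isnan, no_missing)

println(clean)
```

Reductions such as sum or maximum accept the iterator directly, so the collect step is only needed when an actual array is required.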

missing values just got a lot less annoying.