Adding two Array{Union{Missing,Float64}} returns Array{Float64}. Is this by design?

y = Array{Union{Missing,Float64}}(undef,100);
y[:] = rand(100)
ϵ = Array{Union{Missing,Float64}}(undef,100);
ϵ[:] = rand(100)

julia> eltype(y.+ϵ)
Float64

I am not sure, but I imagine the idea behind this is to return the simplest element type that fits the result, since that will perform best in later operations.

Whatever the reason, I wonder what the best way is to keep the type Union{Missing,T} when operating on Arrays, regardless of whether the Array contains missing values or not.

Isn’t that what happens if the assignments are made using dots?

y .= rand(100)

Oh! that’s nice!

The types do not seem to be preserved after the sum (Julia 1.6), though.

Yeah, that’s a bit of a problem, I was wondering what was the Julia way to preserve the type with missing values.

Not sure if this is recommended:
typeof(y)(y+ϵ)
or:
Union{typeof(y),typeof(ϵ)}(y+ϵ)

In the package Missings.jl they use convert, but I was wondering if there is a better way.

It’s because your arrays don’t actually contain any missing values. If you do y[end] = missing, the return type is Vector{Union{Missing, Float64}}:

julia> y[end] = missing
missing

julia> typeof(y .+ ϵ)
Vector{Union{Missing, Float64}}

I haven’t checked what happens in a function, but I suspect this is a global-scope only thing.

allowmissing in Missings.jl would be the best way.

Seems to exist in functions too.
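A minimal sketch of that check, assuming a hypothetical helper named add_vectors (not from any package). It suggests the result's element type follows the values actually present, inside a function just as at global scope:

```julia
# Hypothetical helper, used only to move the broadcast inside a function.
add_vectors(a, b) = a .+ b

y = Union{Missing,Float64}[1.0, 2.0, 3.0]
ϵ = Union{Missing,Float64}[0.1, 0.2, 0.3]

# No missing values present: the result eltype is narrowed.
z1 = add_vectors(y, ϵ)
eltype(z1)   # Float64

# One missing value present: the result eltype keeps the Union.
y[end] = missing
z2 = add_vectors(y, ϵ)
eltype(z2)   # Union{Missing, Float64}
```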

I think this makes sense from the design of broadcasting. But it is annoying.

Yeah, I know, but the problem is that if I do z = y .+ ϵ, then z[end] = missing throws an error, which is what I want to avoid.

This is what allowmissing does in Missings.jl:

allowmissing(x::AbstractArray{T}) where {T} = convert(AbstractArray{Union{T, Missing}}, x)

I could convert directly instead of loading a package, but I was wondering if this is supposed to be the default behavior. I am asking because for certain algorithms I will have to convert after every single operation I do with Arrays capable of containing missing values.
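For reference, a sketch of the direct conversion without loading a package. The name widen_to_missing is hypothetical; its body mirrors the one-line definition quoted above:

```julia
# Hypothetical local stand-in for allowmissing, copied from the definition above.
widen_to_missing(x::AbstractArray{T}) where {T} =
    convert(AbstractArray{Union{T, Missing}}, x)

y = Union{Missing,Float64}[1.0, 2.0, 3.0]
ϵ = Union{Missing,Float64}[0.1, 0.2, 0.3]

# y .+ ϵ alone has eltype Float64 here; widen it back after the operation.
z = widen_to_missing(y .+ ϵ)
eltype(z)          # Union{Missing, Float64}

# Assigning missing no longer throws.
z[end] = missing
```

The cost is one extra copy per operation, which is the overhead the question is trying to avoid.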

The right-hand side allocates a new vector, so its element type may need shrinking or widening. In this case, the algorithm decided shrinking is useful (which makes sense, because you didn’t have any missing values to begin with).

I would say that in analytics that’s debatable. The whole point of defining Array{Union{Missing,T}} is that I am planning to insert missing values, but I do not want a type conversion every time my Arrays happen not to contain any during an operation.

The workaround I am considering is using NaN instead of missing, since NaN isa Number is true and I would not run into this kind of problem.
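A rough sketch of how that workaround would behave, assuming the data are floating-point (NaN has no integer or string equivalent):

```julia
y = [1.0, 2.0, NaN]      # NaN stands in for a missing observation
ϵ = [0.1, 0.2, 0.3]

z = y .+ ϵ
eltype(z)                # Float64, with no Union involved

# NaN propagates through arithmetic, much like missing does.
n_before = count(isnan, z)

# Marking another entry needs no conversion, since NaN is a Float64.
z[1] = NaN
n_after = count(isnan, z)

# Skipping the markers has to be done by hand, e.g. with filter.
valid = filter(!isnan, z)
```

One caveat with this approach is that NaN == NaN is false, so the markers have to be detected with isnan rather than by comparison.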

Then you should be doing

y .+= ϵ

instead? And keep using y, since that’s your “pre-allocated vector”.

That’s nice, as long as I don’t need y for anything else…

z = copy(y)
z .= y + ϵ

Well, you can also do z = (copy(y) .+= ϵ)

This one could perhaps be faster, since the fused broadcast avoids the temporary array that y + ϵ allocates:

z = similar(y)
z .= y .+ ϵ
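Put together as a self-contained sketch with small example vectors (the values are made up for illustration). The destination keeps its declared element type, so a later missing assignment works:

```julia
y = Union{Missing,Float64}[1.0, 2.0, 3.0]
ϵ = Union{Missing,Float64}[0.1, 0.2, 0.3]

# similar(y) allocates an uninitialized array with the same eltype and size as y.
z = similar(y)

# In-place broadcast: writes into z, so z's eltype is not recomputed.
z .= y .+ ϵ
eltype(z)          # Union{Missing, Float64}

# y is left untouched and z still accepts missing.
z[end] = missing
```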

Missing values are so important that I wonder if we could ask for them to be a subtype of Number.

definitely not

Very useful explanation