In some cases, `CSV.read` returns a `DataFrame` with vectors that are of type `Vector{Union{Missing, String}}`.

Is there an easy way to ‘remove’ the missing, i.e. convert to a ‘pure’ datatype?

Also, how can I check how many ‘elements’ the union has?

The code below seems to work. But I was wondering if there is an easier way.

```
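x = [2, 3]
y = convert(Vector{Union{Missing,Int64}}, x)
typeof(y)

"""
    tryToRemoveMissingFromType(x)

Try to convert `x::Vector{Union{Missing,T}}` to type `Vector{T}`.
Returns `nothing` if the element type is not a `Union`, or if `x`
actually contains `missing` values.
"""
function tryToRemoveMissingFromType(x::AbstractVector)
    elt = eltype(x)
    @show elt
    if !(elt isa Union)    # e.g. eltype Int64: nothing to strip
        return nothing
    end
    # a two-member Union stores its members in the fields `a` and `b`
    a = elt.a
    b = elt.b
    targetT = ifelse(a == Missing, b, a)    # keep the non-Missing member
    try
        return convert(Vector{targetT}, x)
    catch
        # conversion fails if x contains a `missing`
        @warn("x contains missing values; cannot convert to Vector{$targetT}")
    end
    return nothing
end

z = tryToRemoveMissingFromType(y)
@show eltype(z), eltype(x), eltype(y)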
```


I just noticed that the option `allowmissing=:none` of `CSV.read` (http://juliadata.github.io/CSV.jl/stable/) might solve my issue, although I am still interested in an answer to my question above.

The `disallowmissing` function in the `Missings` package is the usual way of performing this transformation. There are methods in the `DataFrames` package to apply this to all the columns of a `DataFrame`. See also `disallowmissing!` in the `DataFrames` package.
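For reference, a minimal sketch of both forms in current Julia (1.x), assuming a recent DataFrames.jl that re-exports `disallowmissing` from Missings.jl. Note that `disallowmissing` throws an error if the data actually contains a `missing`.

```julia
using DataFrames  # assumed to re-export `disallowmissing` from Missings.jl

# Element type allows missing, but no value is actually missing
y = Union{Missing,Int}[2, 3]
z = disallowmissing(y)      # new vector with eltype Int
@show eltype(y) eltype(z)

df = DataFrame(a = Union{Missing,Int}[1, 2], b = Union{Missing,String}["x", "y"])
df2 = disallowmissing(df)   # returns a new DataFrame with narrowed column types
disallowmissing!(df)        # narrows the column types of df in place
@show eltype.(eachcol(df))
```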

I’m not sure what you mean by your question about the number of elements that the union has. Perhaps you could rephrase it.


It might be worth asking whether it's really necessary to convert the element types of your `Vector`s. One of the ideas behind `Missing` was that this usually shouldn't be necessary, granted that is more true in 0.7 than it is in 0.6.
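A small illustration of that point, assuming Julia 1.x (where `Missing` lives in Base): ordinary functions accept a `Vector{Union{Missing,Int64}}` directly, and `skipmissing` covers the case where values really are absent.

```julia
# Element type allows missing, but no value is actually missing
y = Union{Missing,Int64}[2, 3]

# Generic functions work on such a vector without any conversion
s = sum(y)
m = maximum(y)

# When values really are missing, they propagate unless skipped explicitly
w = [1, missing, 3]          # eltype Union{Missing, Int64}
u = sum(w)                   # result is `missing`
t = sum(skipmissing(w))      # sums only the present values

@show s m u t
```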

Here’s some simple code I used to clean these things up in 0.6:

```
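# Julia 0.6 / DataFrames of that era (columns indexed as df[i])
# Narrow a Vector{Union{T,Missing}} to a Vector{T}; errors if a missing is present
sanitize(::Type{Missing}, v::AbstractVector{Union{T,Missing}}) where T = convert(AbstractVector{T}, v)

# Apply sanitize to every column that allows missing but contains none
function sanitize!(::Type{Missing}, df::AbstractDataFrame)
    for i ∈ 1:size(df, 2)
        if typeof(df[i]) <: (AbstractVector{Union{T,Missing}} where {T})
            if !any(ismissing, df[i])
                df[i] = sanitize(Missing, df[i])
            end
        end
    end
    df
end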
```

So you can just do `sanitize!(Missing, df)`.
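On Julia 1.x the `df[i]` column indexing used above no longer works. A rough modern equivalent of `sanitize!` might look like the following; `sanitize_missing!` is a hypothetical name, not a DataFrames function, and `nonmissingtype` requires Julia 1.3 or later.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): narrow the element type of
# every column that is declared to allow `missing` but actually contains none.
function sanitize_missing!(df::DataFrame)
    for name in names(df)
        col = df[!, name]
        T = eltype(col)
        S = nonmissingtype(T)    # strips Missing from a Union; returns T otherwise
        if S !== T && !any(ismissing, col)
            df[!, name] = convert(AbstractVector{S}, col)
        end
    end
    return df
end

df = DataFrame(a = Union{Missing,Int}[1, 2], b = Union{Missing,String}["x", missing])
sanitize_missing!(df)
@show eltype(df.a) eltype(df.b)    # column b keeps Missing because it has one
```

Recent DataFrames versions also document an `error=false` keyword for `disallowmissing!` that skips columns containing `missing` instead of throwing, which may make a helper like this unnecessary; check the documentation for your version.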

Thank you both. Indeed, perhaps I should keep the type as it is. But I am using a custom algorithm of mine (and I am really not sure what would happen with missing values, as I have not read up on it). Therefore it seems safer to get rid of `Missing` in the type (and thus know that I have a value for each observation).

Regarding the “number of elements the Union has”: could I have a type `Union{String,Int64,Missing}`? If yes, that seems to have 3 “elements”, whereas `Union{Missing, Int64}` only has 2.

It seems the following might have been my answer, but it returns 2. Why?

```
length(fieldnames(typeof(eltype(convert(Vector{Union{Missing,String,Int64}},x)))))
```

Apologies for the nasty one-liner.

Break that up into pieces and you will see that the type is fine:

```
julia> using Missings
julia> x = [2,3]
2-element Array{Int64,1}:
2
3
julia> y = convert(Vector{Union{Missing,String,Int64}},x)
2-element Array{Union{Int64, Missings.Missing, String},1}:
2
3
julia> T = eltype(y)
Union{Int64, Missings.Missing, String}
```

but you need to use the appropriate accessor:

```
julia> Base.uniontypes(T)
3-element Array{Any,1}:
Missings.Missing
String
Int64
```

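because of implementation details of `Union`: a union of three or more types is stored as nested two-member unions (roughly `Union{A, Union{B, C}}`), so `fieldnames` only ever sees the two fields `a` and `b`.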
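To make the counting question concrete, here is a short sketch for Julia 1.x. `Base.uniontypes` is an internal, unexported function, so its behavior may change between versions.

```julia
T = Union{Missing, String, Int64}

# Base.uniontypes (internal) flattens the nested union into a list of members
n = length(Base.uniontypes(T))

# The Union type itself only has the two fields :a and :b,
# which is why the fieldnames one-liner reports 2
f = fieldnames(typeof(T))

# nonmissingtype (Base, Julia >= 1.3) drops Missing from the union
N = nonmissingtype(T)

@show n f N
```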