Difference between `Union{Missing, Float64}` and `Float64?`

I have a DataFrame that was created by importing a CSV, and I then applied several transformations. Now when I run describe on it, some columns report Union{Missing, Float64} as their type and others report Float64?.

What is the difference? Both columns have Float64 type and contain missing values.

You can’t possibly have missing values in a column whose type is Float64. But you certainly can have a column of type Union{Missing, Float64} that no longer contains any missings, in which case you can narrow it to Float64 with:
disallowmissing!
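
For illustration, a minimal sketch of that situation, assuming a reasonably recent DataFrames.jl. The column name x and the values are made up for the example; they are not from the original poster's data.

```julia
using DataFrames

# A column whose element type still allows missing, although no value is missing.
df = DataFrame(x = Union{Missing, Float64}[1.0, 2.0, 3.0])
before = eltype(df.x)        # Union{Missing, Float64}

# Narrow the element type in place. This throws an error if the column
# actually contains a missing value.
disallowmissing!(df, :x)
after = eltype(df.x)         # Float64
```

Calling disallowmissing!(df) without a column argument tries to do the same for every column.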


The column type is Float64? with the question mark at the end.

If you see a T? with a question mark, that means Union{Missing, T}, as described in the Getting Started section of the docs: Getting Started · DataFrames.jl
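
Since T? is only a display shorthand (it is not something you can type as Julia code), one way to check what a column really holds is to inspect its element type directly. A small sketch using only Base Julia; the vector col stands in for a DataFrame column and is invented for the example:

```julia
# A vector that mixes floats and missing gets a Union element type.
col = [1.5, missing, 3.0]

T = eltype(col)                      # Union{Missing, Float64}
allows_missing = Missing <: T        # true: the type permits missing values
inner = nonmissingtype(T)            # Float64: the type with Missing stripped
has_missing = any(ismissing, col)    # true: a missing value is actually present
```

Note that allows_missing and has_missing answer different questions: the first is about the type, the second about the data.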


Ok, so they are the same.
It’s confusing that two numeric columns that both contain missing values are labeled differently: one as Union{Missing, Float64} and the other as Float64?

That does sound strange, where is this happening? In the REPL I see:

julia> df = DataFrame(rand([missing; 1.0:10.0], 10, 10), :auto)
10×10 DataFrame
 Row │ x1         x2        x3        x4        x5        x6         x7         x8         x9         x10      
     │ Float64?   Float64?  Float64?  Float64?  Float64?  Float64?   Float64?   Float64?   Float64?   Float64? 
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │       8.0       7.0       1.0       8.0       3.0        3.0        1.0  missing          3.0       9.0

Good question.
I had some time today to investigate and came to the following conclusion: it’s just a matter of display real estate.
If the column name is short, there is not enough space to fit the Union{...} header, so the short form with the question mark is used:

julia> DataFrame(a = [1,2,3,missing])
4×1 DataFrame
│ Row │ a       │
│     │ Int64?  │
├─────┼─────────┤
│ 1   │ 1       │
│ 2   │ 2       │
│ 3   │ 3       │
│ 4   │ missing │

However, if the column name is long enough, then the Union{...} type name representation would fit and so would be used:

julia> DataFrame(a12345678912345678901 = [1,2,3,missing])
4×1 DataFrame
│ Row │ a12345678912345678901 │
│     │ Union{Missing, Int64} │
├─────┼───────────────────────┤
│ 1   │ 1                     │
│ 2   │ 2                     │
│ 3   │ 3                     │
│ 4   │ missing               │
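
The behaviour observed above can be mimicked with a few lines of Base Julia. This is only a sketch of the idea, not the actual DataFrames.jl code: the function name short_type_label and its maxwidth parameter are hypothetical, chosen for illustration.

```julia
# Hypothetical helper: pick a header label for a column type, given the
# width available in the column. Not the DataFrames.jl implementation.
function short_type_label(T::Type, maxwidth::Integer)
    full = string(T)
    # Use the full type name when it fits into the available width.
    textwidth(full) <= maxwidth && return full
    # Otherwise abbreviate Union{Missing, T} as "T?".
    if T !== Missing && Missing <: T
        return string(nonmissingtype(T), "?")
    end
    # Nothing shorter to offer; fall back to the full name.
    return full
end

narrow = short_type_label(Union{Missing, Int64}, 8)    # "Int64?"
wide   = short_type_label(Union{Missing, Int64}, 21)   # "Union{Missing, Int64}"
```

With a width of 21 (the length of the long column name in the second example) the full Union{Missing, Int64} fits exactly, which matches the two outputs shown above.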