Initialize a DataFrame column with missing values and type Union{Missing, Float64}

What is the best way to add a new column to an existing DataFrame when the new column should initially contain only missing values? Later I want to assign float values to that column.

using DataFrames

# Create a simple DataFrame with one column (:a) and two rows
df = DataFrame(a=[1, 2])

# Add a new column (:weight) filled with missing (its element type is Missing, but I want Union{Missing, Float64})
df[:, :weight] .= missing

# I can change a single value for variable :a
df[1, :a] = 3

# I cannot replace a missing value with a float value, i.e. assigning to a single element of :weight throws an error
# df[1, :weight] = 1.0

# Verbose solution - declare new dataframe with a single column (:weight2) of the correct type
single_col_df = DataFrame(weight2 = Union{Missing, Float64}[missing for i in eachrow(df)])

# Concatenate the two dataframes horizontally
df = hcat(df, single_col_df)

# Now I can change a missing value to a float value
df[1, :weight2] = 1.0
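
For what it's worth, the failing assignment can be reproduced without DataFrames, since a column is stored as an ordinary vector. A minimal sketch in plain Julia (the names `narrow`, `wide` and `err` are just for illustration):

```julia
# A vector built only from `missing` gets the element type Missing
narrow = fill(missing, 2)
println(eltype(narrow))

# Storing a Float64 in it throws, because 1.0 cannot be converted to Missing
err = try
    narrow[1] = 1.0
    nothing
catch e
    e
end
println(err === nothing ? "assignment worked" : "assignment failed")

# Converting to a wider element type copies the data and allows floats
wide = convert(Vector{Union{Missing, Float64}}, narrow)
wide[1] = 1.0
println(wide)
```

In the DataFrame case, assigning the converted vector back to the column (for example `df.weight = convert(Vector{Union{Missing, Float64}}, df.weight)`) should widen an existing column in the same way.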

There isn’t a super elegant way to do this. I would do the following:

df[:, :weight] = Union{Missing, Float64}[missing for i in 1:nrow(df)]

Thanks a lot for the quick reply.

Yes, it would be nice if one could do it with broadcasting.

Yeah, that would be cool. It probably wouldn’t be too hard to add (somewhere, not in DataFrames).
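
A small helper along these lines might look as follows. This is only a sketch: `missingcol` is a made-up name, not a function from Base or DataFrames.

```julia
# Hypothetical helper: a length-n vector that can hold T or missing,
# with every element initialized to missing
missingcol(::Type{T}, n::Integer) where {T} = Vector{Union{Missing, T}}(missing, n)

w = missingcol(Float64, 2)
w[1] = 1.0   # assigning a float works because the element type allows it
println(w)
```

With a DataFrame one would then write something like `df.weight = missingcol(Float64, nrow(df))`.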

df.weight = similar(df.a, Union{Missing, Float64})

This relies on an implementation detail, though; see the note in the manual. I think the recommended way is:

df.weight = Vector{Union{Missing,Float64}}(missing, nrow(df))

Thanks, but it’s a little confusing

Using undef or similar … is not the correct way

but it currently gives the right result.

Yes, it’s a bit confusing… This was discussed in this PR and the follow-up one. Relying on this behavior is a bit like using reduce with a non-associative operation, or relying on the iteration order of Dict keys: it might work in one case but fail in another:

julia> module A struct B end end;

julia> similar([1,2,3], Union{Missing, A.B})
3-element Vector{Union{Main.A.B, Missing}}:
 Main.A.B()
 Main.A.B()
 Main.A.B()
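
If you still want to start from `similar` (for instance to reuse the array type of an existing column), one way to avoid depending on the uninitialized contents is to fill the result explicitly. A minimal sketch, with a plain vector `a` standing in for `df.a`:

```julia
a = [1, 2, 3]  # stand-in for an existing column such as df.a

# similar only allocates; fill! then sets every element to missing explicitly
w = fill!(similar(a, Union{Missing, Float64}), missing)
println(all(ismissing, w))

w[1] = 1.0  # floats can be stored afterwards
```

This is more verbose than the `Vector{Union{Missing,Float64}}(missing, n)` constructor, so it is probably only worth it when the array type of the source column matters.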

This reminds me of an anecdote about the Go language: its maps (the equivalent of Julia’s Dict) had an unspecified iteration order, but people would sometimes come to rely on it anyway. So the Go developers ended up randomizing the iteration order.
