Initiate DataFrame variable with missing values and type Union{Missing, Float64}

klwlevy · May 15, 2021, 7:49pm

How do I best add a new variable to an existing DataFrame if I want the new variable to contain only missing values at first? Later I want to assign float values to the new variable.

using DataFrames

# Create a simple DataFrame with one variable (:a) and two rows
df = DataFrame(a=[1, 2])

# Declare a new variable (:weight) and initialize it to missing (Type is Missing, but want Union{Missing, Float64})
df[:, :weight] .= missing

# I can change a single value for variable :a
df[1, :a] = 3

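# I cannot change a missing value to a float value, i.e. I cannot change a single value of :weight.
# The line below throws an error because the column's element type is Missing, not Union{Missing, Float64}.
#df[1, :weight] = 1.0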

# Verbose solution - declare new dataframe with a single column (:weight2) of the correct type
single_col_df = DataFrame(weight2 = Union{Missing, Float64}[missing for i in eachrow(df)])

# Concatenate the two dataframes horizontally
df = hcat(df, single_col_df)

# Now I can change a missing value to a float value
df[1, :weight2] = 1.0
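
The underlying behaviour can be reproduced without DataFrames. The sketch below uses only Base Julia and assumes that the broadcast assignment above stores a plain `Vector{Missing}`, as the reported column type suggests. The names `v`, `w` and `err` are illustrative only.

```julia
# A vector built only from `missing` has element type Missing.
v = fill(missing, 2)
println(eltype(v))

# Trying to store a Float64 in it fails, because the value cannot be
# converted to Missing. The exception is caught here so the script continues.
err = try
    v[1] = 1.0
    nothing
catch e
    e
end
println(err isa Exception)

# A vector whose element type already allows Float64 accepts the value.
w = Vector{Union{Missing, Float64}}(missing, 2)
w[1] = 1.0
println(w)
```

So the question is really about choosing the column's element type up front, not about the assignment itself.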

pdeffebach · May 15, 2021, 8:44pm

There isn’t a way to do this super elegantly. I would do the following:

df[:, :weight] = Union{Missing, Float64}[missing for i in 1:nrow(df)]

klwlevy · May 15, 2021, 9:31pm

Thanks a lot for the quick reply.

Yes, it would be nice if one could do it with broadcasting.

pdeffebach · May 15, 2021, 9:35pm

Yeah, that would be cool. It probably wouldn’t be too hard to add (somewhere, not in DataFrames).
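
For illustration, a small wrapper along these lines might look like the sketch below. It is not broadcasting and it is not part of DataFrames: `add_missing_column!` is a hypothetical helper name invented here, and it simply wraps the explicit constructor for a `Union{Missing, T}` vector.

```julia
using DataFrames

# Hypothetical helper (not a DataFrames function): add a column `name`
# filled with `missing` whose element type is Union{Missing, T}.
function add_missing_column!(df::DataFrame, name::Symbol, ::Type{T}) where {T}
    df[!, name] = Vector{Union{Missing, T}}(missing, nrow(df))
    return df
end

df = DataFrame(a = [1, 2])
add_missing_column!(df, :weight, Float64)

# Assigning a float now works because the column allows Float64.
df[1, :weight] = 1.0
println(eltype(df.weight))
```

Passing the type as an argument keeps the helper usable for columns that will later hold strings, integers, and so on.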

qsong · May 16, 2021, 4:43am

df.weight = similar(df.a, Union{Missing, Float64})

sijo · May 16, 2021, 5:53am

This relies on an implementation detail though, see the note here in the manual. I think the recommended way is:

df.weight = Vector{Union{Missing,Float64}}(missing, nrow(df))

qsong · May 16, 2021, 2:53pm

Thanks, but it’s a little confusing

Using undef or similar … is not the correct way

but it currently gives the right result.

sijo · May 16, 2021, 3:18pm

Yes it’s a bit confusing… This was discussed in this PR and the follow-up one. Relying on this behavior is a bit like using reduce with a non-associative operation, or relying on the order of iteration for Dict keys: it might work in one case, but can fail in another:

julia> module A struct B end end;

julia> similar([1,2,3], Union{Missing, A.B})
3-element Vector{Union{Main.A.B, Missing}}:
 Main.A.B()
 Main.A.B()
 Main.A.B()

This reminds me of an anecdote concerning the Go language: the maps (equivalent of Julia Dict) had an unspecified iteration order. But people would sometimes come to rely on it… So they ended up adding randomization to the iteration.
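
For contrast with the `similar` call, here is a minimal sketch of the explicit constructor applied to the same kind of element type. It assumes a fresh session in which module `A` is not yet defined. The variable names `v` and `w` are illustrative.

```julia
# Same toy type as in the example: a module containing an empty struct.
module A
struct B end
end

# Passing `missing` to the constructor fills every slot with `missing`,
# whatever the other member of the Union is.
v = Vector{Union{Missing, A.B}}(missing, 3)
println(all(ismissing, v))

# The same call with Float64, as needed for the :weight column.
w = Vector{Union{Missing, Float64}}(missing, 2)
println(all(ismissing, w))
```

Because the fill value is stated explicitly, the result does not depend on how uninitialized memory happens to be interpreted.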
