In-place mutation of a DataFrame column of type Missing

I am updating a CSV file by loading it into memory as a DataFrame and replacing values where needed. A minimal version of the dataset looks like this:

using CSV, DataFrames
df = CSV.File(IOBuffer("""fruit,count,verifier\napple,2,\norange,1,\npear,3,""")) |> DataFrame
3×3 DataFrame
 Row │ fruit    count  verifier
     │ String7  Int64  Missing
─────┼──────────────────────────
   1 │ apple        2   missing
   2 │ orange       1   missing
   3 │ pear         3   missing

In this example, elements of the :count column are easy to modify:

df[findfirst(==("apple"), df.fruit), :count] += 1

But if I want to modify a value in the :verifier column:

df[findfirst(==("apple"), df.fruit), :verifier] = "joe"

This fails, because a String value cannot be assigned to a column whose element type is Missing.

I tried to change the type of the column to Vector{Union{Missing,String}}:

df.verifier .= convert(Vector{Union{Missing,String}}, df.verifier)

But the element type of the column remains Missing.
Got any ideas? I guess I could set the column types manually when I first load the DataFrame, but I have dozens of columns and do not want to set all of them by hand. I think what I am looking for is something like

function setproperty!(d::DataFrameRow,:c, val; promote=true)

But I cannot find such a function.

df.verifier = missings(String, nrow(df))
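A minimal sketch of why this works where the `.=` attempt did not: `.=` broadcasts into the existing vector, so the column keeps its element type, whereas plain `=` replaces the column with a new vector. The table below is a hand-built stand-in for the CSV-loaded one (an assumption made to keep the example self-contained), and it relies on `missings` being available after `using DataFrames`.

```julia
using DataFrames

# Stand-in for the CSV-loaded table; `verifier` is an all-missing column.
df = DataFrame(fruit=["apple", "orange", "pear"],
               count=[2, 1, 3],
               verifier=fill(missing, 3))

# Replace the column with a new vector that can hold String or missing.
# Plain `=` swaps the column; `.=` would only write into the old Vector{Missing}.
df.verifier = missings(String, nrow(df))

# The assignment from the question now succeeds.
df[findfirst(==("apple"), df.fruit), :verifier] = "joe"
```

Using `df.verifier = convert(Vector{Union{Missing,String}}, df.verifier)` with plain `=` should have the same effect, since it also replaces the column rather than mutating it.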


Awesome, didn’t know of missings. Thank you @bkamins!


You could also add and then remove a dummy row, along these lines:

push!(df, ("", -1, ""); promote=true)
pop!(df)
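
For the case mentioned in the question, where dozens of columns may load as all-missing, the column replacement can be applied in a loop. The following is only a sketch: `widen_missing_columns!` is a hypothetical helper name, not a DataFrames.jl function, and it assumes every all-missing column should accept the same value type (String by default).

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): replace every column whose
# element type is exactly Missing with one that can also hold values of type T.
function widen_missing_columns!(df::DataFrame, ::Type{T}=String) where {T}
    for name in names(df)
        if eltype(df[!, name]) === Missing
            df[!, name] = missings(T, nrow(df))
        end
    end
    return df
end

# Small stand-in table with two all-missing columns.
df = DataFrame(fruit=["apple", "orange"],
               verifier=fill(missing, 2),
               note=fill(missing, 2))

widen_missing_columns!(df)
df[1, :verifier] = "joe"
```

Columns that already have a concrete element type, such as `fruit`, are left untouched.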