Replace NULLs in DataTable

question

#1

Are there alternatives to

df[df[:A].isnull, :A] = newValue

for replacing NULLs in column :A of the DataTable df with newValue? From an earlier question, I gather that accessing the .values and .isnull fields directly is discouraged.
Maybe an “impute()” function would be useful?

Thanks
Andre


#2

You can write df[isnull.(df[:A]), :A] = newValue, which builds the mask by broadcasting isnull over the column instead of accessing private fields.
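A minimal sketch of the same mask-then-assign pattern in current Julia. It assumes the column is a plain vector that uses Base's `missing` (Julia ≥ 1.0) instead of the DataTables `Nullable` types discussed in this thread, so `ismissing` stands in for `isnull`. Note that assigning one scalar to many indices needs the broadcasting `.=`.

```julia
# Assumption: a nullable column modeled with Base's `missing`
# rather than DataTables' Nullable types.
col = Union{Missing,Int}[1, missing, 3, missing]
newValue = 0

# Broadcast the predicate to build a Bool mask, then assign the
# scalar to every masked position in place with `.=`.
col[ismissing.(col)] .= newValue

println(col)
```

The element type of `col` still admits `missing` after the assignment; only the values change.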

But since there won't be any missing values left in the column, you can also convert it to a standard Array. The conversion method accepts an additional argument giving the value to replace nulls with: df[:A] = convert(Array, df[:A], newValue).
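To illustrate the idea of "replace and convert in one call", here is a sketch of a hypothetical helper `to_dense` (not a DataTables or Base function). It again assumes Base's `missing` rather than `Nullable`, and uses `nonmissingtype`, which requires Julia ≥ 1.3.

```julia
# Hypothetical helper: returns a new vector whose element type excludes
# Missing, with every missing entry replaced by `value`.
function to_dense(col::AbstractVector, value)
    T = nonmissingtype(eltype(col))        # e.g. Union{Missing,Int} -> Int
    out = Vector{T}(undef, length(col))
    for i in eachindex(col, out)
        x = col[i]
        out[i] = ismissing(x) ? value : x  # `value` is converted to T on assignment
    end
    return out
end

col = [1, missing, 3]    # literal infers Vector{Union{Missing,Int}}
dense = to_dense(col, 0)
```

Because the result type is computed up front, the returned vector is a plain `Vector{Int}` here, which is the benefit the conversion approach offers over in-place assignment.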

We could provide a shorter, more discoverable function for that. dplyr uses coalesce (inspired by SQL), which accepts either arrays or scalars. In the present case, passing a vector and a scalar would replace the nulls with the scalar.
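For comparison, Julia 1.x ships `Base.coalesce`, which returns its first non-`missing` argument; broadcasting it gives the vector/scalar and vector/vector behaviour described above. This sketch assumes `missing`-based columns, not the `Nullable` types of DataTables.

```julia
# Base.coalesce (Julia >= 1.0) returns its first non-missing argument.
col   = [1, missing, 3, missing]
other = [10, 20, missing, 40]

# Vector and scalar: every missing entry becomes 0.
filled_scalar = coalesce.(col, 0)

# Vector and vector: missing entries are filled elementwise from `other`.
filled_vector = coalesce.(col, other)
```

Since no `missing` survives in either result, broadcasting narrows the element type to `Int` automatically.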