Replace NULLs in DataTable

abieler · March 27, 2017, 5:18pm

Are there alternatives to doing

df[df[:A].isnull, :A] = newValue

for replacing NULLs in column :A of the DataTable df with “newValue”? From an earlier question, I gather that accessing .values and .isnull directly is discouraged.
Maybe an “impute()” function would be useful?

Thanks
Andre

nalimilan · March 28, 2017, 11:52am

You can write df[isnull.(df[:A]), :A] = newValue to avoid accessing private fields.

But since there won’t be any missing values left in the column, you can also convert it to a standard Array. The conversion method accepts a second argument giving the value to replace nulls with: df[:A] = convert(Array, df[:A], newValue).

We could provide a shorter, more discoverable function for that. dplyr uses coalesce (inspired by SQL), which accepts either arrays or scalars. In the present case, passing a vector and a scalar would replace the nulls with the scalar.
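
To illustrate the coalesce semantics described here, below is a minimal sketch using only Base Julia. It assumes a current Julia version (1.0 or later), where Base exports coalesce and missing; it does not use the nullable-based DataTables API this thread is about, and the variable names are purely illustrative.

```julia
# Base Julia only; `missing` stands in for the nulls discussed in this thread.
a = [1, missing, 3, missing]

# Vector and scalar: every missing entry is replaced by the scalar.
filled = coalesce.(a, 0)

# Vector and vector: each missing entry takes the value at the same
# position in the fallback vector.
fallback = [10, 20, 30, 40]
merged = coalesce.(a, fallback)

# The mask-based form, analogous to indexing with isnull.(df[:A]),
# written here with ismissing on a plain vector.
b = copy(a)
b[ismissing.(b)] .= 0
```

With broadcasting, the vector-and-scalar call returns a new vector and leaves the input unchanged, whereas the mask-based assignment modifies the vector in place and keeps its element type.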
