How can I replace the missing values in a dataframe with, say, 0?
There is a Missings.replace function but it does not actually make the replacement. It builds a lazy data structure Missings.EachReplaceMissing{DataFrames.DataFrame,Int64}... that most functions working with dataframes (e.g. plots) can’t handle at the moment. Collecting such a structure does not produce a dataframe but gives a MethodError.
Missings.replace expects an iterable, but a DataFrame isn’t one.
I would do something like you did above. Note that DataFrames may not have a homogeneous column type, so replacing missing values in all columns with the same value may not be a common operation.
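To illustrate why a single replacement value may not suit every column, here is a minimal sketch that uses only Base Julia (1.3 or later, for `nonmissingtype`). The NamedTuple `cols` is a hypothetical stand-in for a data frame, and `fill_numeric` is a helper invented for this example, not a DataFrames function.

```julia
# Hypothetical stand-in for a data frame: a NamedTuple of column vectors
# with different element types.
cols = (a = [1, missing, 3],
        b = ["x", missing, "z"],
        c = [1.5, 2.5, missing])

# Replace missing values only in numeric columns, using a zero of the
# column's own type so that the element type stays concrete.
function fill_numeric(col)
    T = nonmissingtype(eltype(col))
    return T <: Number ? coalesce.(col, zero(T)) : col
end

# map over a NamedTuple applies the function to each column and keeps the names.
filled = map(fill_numeric, cols)
```

The string column `b` is left untouched here, since 0 is not a sensible replacement for it.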
@Tamas_Papp’s solution has the advantage that the returned DataFrame contains columns that do not allow missing values, which will be faster to work with. By contrast, only replacing missing values with 0 won’t change the column types, though you could use disallowmissing to do that manually.
An alternative approach is to use coalesce (which is included in Base in Julia 0.7, but only in Missings in Julia 0.6):
for col in names(df)
    df[col] = Missings.coalesce.(df[col], 0)
end
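The difference between the two approaches can be seen on a single column. The sketch below assumes Julia 1.0 or later, where `coalesce` and `replace!` are in Base, and uses a plain vector rather than a DataFrame column. The final `convert` call is only a Base stand-in for what disallowmissing does, not that function itself.

```julia
col = [1, missing, 3]                     # Vector{Union{Missing, Int}}

# In-place replacement fills the values but keeps the element type.
inplace = replace!(copy(col), missing => 0)
eltype(inplace)                           # still allows missing

# Broadcasting coalesce builds a new vector with a narrowed element type.
narrowed = coalesce.(col, 0)
eltype(narrowed)                          # Int, missing no longer allowed

# Once no missing values remain, the element type can be narrowed manually.
converted = convert(Vector{Int}, inplace)
```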
Do you know of simpler ways of doing this in other software?
But I hate it when people use it. If I’m reading a script I would rather see
replace variable = 0 if missing(variable)
because it mirrors other replacements.
Maybe there are performance benefits to using mvencode, but I’ve never noticed any. With that in mind, @harven’s answer above is nice because it keeps the same syntax as other replacements.
OK. The difference we have compared to R is that we support arrays which do not accept missing values, so there are more possible solutions than in R depending on the use case. Also, I don’t think we want to allow df[ismissing.(df)] = 0, because data frames are not matrices. Anyway, it’s a terribly inefficient approach since it forces you to allocate a matrix the size of the data frame.
mvencode appears to work variable by variable IIUC, so that’s more or less similar to what we have.
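For comparison, the mask-based approach does work on an ordinary matrix in Base Julia, which makes the allocation cost visible. This is only a sketch on a small hypothetical matrix `M`, not something that applies to data frames.

```julia
# A small matrix that allows missing values.
M = [1 missing; missing 4]

# ismissing.(M) allocates a Boolean mask with the same dimensions as M.
mask = ismissing.(M)

# Assign 0 to every position where the mask is true.
M[mask] .= 0
```

The mask has as many entries as the matrix itself, which is the extra allocation mentioned above. Working column by column avoids building it.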