Replace missing with 0.0 in dataframe

I am trying to replace missing values in Float columns with 0.0.

My starting point:

for col in names(df)
    df[ismissing.(df[col]), col] = 0.0
end

While this works, it has two weak points:
a. it doesn't check the type of the column; missing should only be replaced with 0.0 in Float columns
b. it throws a warning

Any idea how to fix these two issues?

The warnings can easily be avoided:

julia> for col in names(df)
           df[ismissing.(df[:,col]), col] .= 0.0
       end

Now I am thinking about the type checking.
How do I check for a column type like this one?

julia> typeof(df.y)
Array{Union{Missing, Float64},1}

By the way, for MWE:

julia> df=DataFrame(x=[1,2,3],y=[1.0,missing,2.0])

OK, I think this works, but it lacks elegance:

julia> df=DataFrame(x=[1,missing,3],y=[1.0,missing,2.0])
3×2 DataFrame
│ Row │ x       │ y        │
│     │ Int64   │ Float64  │
├─────┼─────────┼──────────┤
│ 1   │ 1       │ 1.0      │
│ 2   │ missing │ missing  │
│ 3   │ 3       │ 2.0      │

julia> for col in names(df)
           if typeof(df[:,col])==Array{Union{Missing, Float64},1}
               df[:,col]= [ ismissing(x) ? 0.0 : x for x in df[:,col] ]
           end
       end

julia> df
3×2 DataFrame
│ Row │ x       │ y        │
│     │ Int64   │ Float64  │
├─────┼─────────┼──────────┤
│ 1   │ 1       │ 1.0      │
│ 2   │ missing │ 0.0      │
│ 3   │ 3       │ 2.0      │

Check out the DataFrames documentation for this.

It looks like the answer you want is coalesce.(df, 0.0)
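
To illustrate what `coalesce` does, here is a minimal sketch on a plain vector (the variable names are only for illustration). `coalesce` returns its first non-missing argument, so broadcasting it with a fallback of 0.0 replaces every missing entry. Note that when it is broadcast over a whole data frame, it fills missing values in every column regardless of type, so it does not address the type check from the question.

```julia
# A vector with element type Union{Missing, Float64}, like column y in the MWE
v = [1.0, missing, 2.0]

# coalesce returns the first argument that is not missing
single = coalesce(missing, 0.0)

# Broadcasting applies it element by element; missing entries become 0.0
filled = coalesce.(v, 0.0)
```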

Thanks for your suggestions!
But I still get a warning:

┌ Warning: `setindex!(df::DataFrame, v::AbstractVector, ::Colon, col_ind::ColumnIndex)` is deprecated, use `begin
│     df[!, col_ind] = v
│     df
│ end` instead.

Just use df[!, col] = [ismissing(x) ? 0.0 .... Alternatively, if you upgrade to the most recent version of DataFrames (released today), that will work without a warning message.
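
For reference, a minimal sketch of the whole loop written this way, assuming a DataFrames version that supports the `df[!, col]` syntax. It is one possible variant, not the only one: it uses `coalesce` in place of the comprehension and checks the element type of the column instead of comparing against the exact array type. The data is the MWE from the question.

```julia
using DataFrames

df = DataFrame(x = [1, missing, 3], y = [1.0, missing, 2.0])

for col in names(df)
    # Only touch columns whose element type is Float64 or Union{Missing, Float64}
    # (a column that contains nothing but missing would match as well)
    if eltype(df[!, col]) <: Union{Missing, Float64}
        # Replace the whole column; missing entries become 0.0
        df[!, col] = coalesce.(df[!, col], 0.0)
    end
end
```

The Int column x keeps its missing entry, because its element type is not a subtype of Union{Missing, Float64}.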

The comments above are better as exact solutions to OP’s question, but I thought some people may find it interesting to know how to do this with MLJ:

using MLJ, DataFrames
df = DataFrame((x=[randn(5)..., missing], y=[randn(5)...,missing]))

filler = machine(FillImputer(continuous_fill=m->0), df)
fit!(filler)
# transform is qualified because recent DataFrames versions export a function of the same name
dff = MLJ.transform(filler, df)

The fill imputer can use other filling rules like the mean of the column or its median. The keyword continuous_fill takes a function to apply to “continuous” features as opposed to count_fill / finite_fill which take functions for count-data / categorical data.

How about this one:

mapcols(c->eltype(c)<:Union{Float64, Missing} ? coalesce.(c, 0.0) : c, df)
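
A short usage sketch of this one-liner on the MWE from the question. `mapcols` returns a new data frame and leaves the original untouched; `mapcols!` is the in-place variant. The names `fillfloat`, `df2` and `df3` are only for illustration.

```julia
using DataFrames

df = DataFrame(x = [1, missing, 3], y = [1.0, missing, 2.0])

# The column function from the post above, given a name for reuse
fillfloat(c) = eltype(c) <: Union{Float64, Missing} ? coalesce.(c, 0.0) : c

# mapcols builds a new data frame; df still contains its missing values
df2 = mapcols(fillfloat, df)
still_missing = ismissing(df.y[2])

# mapcols! modifies the data frame it is given (here a copy of df)
df3 = copy(df)
mapcols!(fillfloat, df3)
```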

Is there a way to get all types which are part of a Union?
E.g.:

allTypes( Union{Float64, Missing} )
Float64
Missing

Or something similar, like e.g.

isInUnion(Float64,Union{Float64, Missing})
true

A bit outside the scope of the question but Float64 <: Union{Float64, Missing} does the latter. See also this post on retrieving the non-missing type and corresponding pointer to Missings.jl.
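
A small sketch of both ideas. The subtype operator works in any Julia version; `nonmissingtype` is assumed here to be available from Base, which is the case from Julia 1.3 on.

```julia
# Subtype check: is the type one of the members of the Union?
a = Float64 <: Union{Float64, Missing}
b = Int <: Union{Float64, Missing}

# nonmissingtype strips Missing from a type and returns what is left
T = nonmissingtype(Union{Float64, Missing})
```

Here `a` is true, `b` is false, and `T` is Float64.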

For the first part, Base.uniontypes should do exactly what you want.
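
A minimal sketch of how that can be used. `Base.uniontypes` is not exported, so it needs the `Base.` prefix and may change between Julia versions. The helper `isinunion` is hypothetical, written only to mirror the `isInUnion` function asked about above.

```julia
# Collect the member types of a Union into a vector
members = Base.uniontypes(Union{Float64, Missing})

# Hypothetical helper: true if T is one of the members of the Union U
isinunion(T, U) = T in Base.uniontypes(U)

yes = isinunion(Float64, Union{Float64, Missing})
no = isinunion(Int, Union{Float64, Missing})
```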
