Julia: how to fill NA values in a DataFrame, as in Python (pandas)


#1

How can I do the equivalent of the following Python (pandas) code on a Julia DataFrame?
(1) Fill NA with the most frequent value
(2) Fill NA with the mean

# MSZoning NA in pred. filling with most popular values

features['MSZoning'] = features['MSZoning'].fillna(features['MSZoning'].mode()[0])

# LotFrontage  NA in all. I suppose NA means 0
features['LotFrontage'] = features['LotFrontage'].fillna(features['LotFrontage'].mean())

#2

You can try this (the example uses mean replacement):

julia> df = DataFrame(x = [1, 2, 9, missing])
4×1 DataFrames.DataFrame
│ Row │ x       │
├─────┼─────────┤
│ 1   │ 1       │
│ 2   │ 2       │
│ 3   │ 9       │
│ 4   │ missing │

julia> recode!(df[:x], missing => mean(skipmissing(df[:x])));

julia> df
4×1 DataFrames.DataFrame
│ Row │ x │
├─────┼───┤
│ 1   │ 1 │
│ 2   │ 2 │
│ 3   │ 9 │
│ 4   │ 4 │

If you do not want to do it in place, use recode instead of recode!.

The mode function is available in the StatsBase.jl package.
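
For reference, here is a minimal sketch of both fills from the question, assuming a recent DataFrames.jl release where columns are accessed as df.x. The sample data is made up for illustration, and coalesce (from Base) is used here in place of recode, so treat it as one possible approach rather than the only one:

```julia
using DataFrames, Statistics, StatsBase

# Hypothetical stand-in for the question's `features` table.
features = DataFrame(MSZoning = ["RL", "RM", missing, "RL"],
                     LotFrontage = [60.0, missing, 80.0, 70.0])

# (1) Most frequent value: mode comes from StatsBase.
zoning_mode = mode(collect(skipmissing(features.MSZoning)))
features.MSZoning = coalesce.(features.MSZoning, zoning_mode)

# (2) Mean: mean comes from the Statistics standard library.
frontage_mean = mean(skipmissing(features.LotFrontage))
features.LotFrontage = coalesce.(features.LotFrontage, frontage_mean)
```

coalesce.(col, value) returns the first non-missing argument element by element, so existing values are left alone and only the missing entries take the fill value.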


#3

Since you are replacing the missing values, df[:x] = recode(df[:x], missing => mean(skipmissing(df[:x]))) is probably better than recode!. It creates a new column whose element type does not allow missing values, which makes further operations more efficient.
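
A small sketch of the element-type point, using plain vectors and only Base plus the Statistics standard library. Here replace! and coalesce stand in for recode! and recode (an assumption for illustration; a DataFrame column behaves like the vector below):

```julia
using Statistics

x = [1.0, 2.0, 9.0, missing]     # eltype is Union{Missing, Float64}
m = mean(skipmissing(x))         # mean of the non-missing values

# In-place replacement reuses the existing container, so its element type
# still allows missing even though no missing values remain.
inplace = copy(x)
replace!(inplace, missing => m)

# Building a new vector lets the element type narrow to Float64.
filled = coalesce.(x, m)

println(eltype(inplace))
println(eltype(filled))
```

Both vectors hold the same values afterwards; only the declared element type differs, and that is what later operations dispatch on.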


#4

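With DataFramesMeta you can use ifelse inside @transform, for example to replace values that fail a condition:

using DataFrames, DataFramesMeta
df = DataFrame(rand(10, 10), :auto)  # columns are named x1 through x10
# keep x1 where it exceeds 0.5, otherwise use 100
@transform(df, :y = ifelse.(:x1 .> 0.5, :x1, 100))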
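
The same pattern can be pointed at missing values by testing with ismissing. This is a sketch assuming a recent DataFramesMeta release (where the new column is written as :y) and a made-up single-column table:

```julia
using DataFrames, DataFramesMeta, Statistics

df = DataFrame(x = [1.0, 2.0, 9.0, missing])

# Where x is missing take the column mean, otherwise keep x.
filled = @transform(df, :y = ifelse.(ismissing.(:x), mean(skipmissing(:x)), :x))
```

@transform returns a new data frame, so df itself keeps its missing entry.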