Hello there!

How do I proceed from here (`describe(df, :nmissing)`) to get the percentage of missing values per column in my `df` DataFrame?

Hi.

You could try something like this:

```
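import DataFrames as Dfs
import Statistics as Stats

# Example data: each entry is either missing or an integer from 1 to 6
df = Dfs.DataFrame(
    :col1 => rand([missing, 1:6...], 10),
    :col2 => rand([missing, 1:6...], 10)
)

# Percentage of missing values in col1
map(ismissing, df[!, "col1"]) |> xs -> Stats.mean(xs) * 100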
```

Short explanation: `map` applies the `ismissing` function to each element of `col1` and returns a vector of Bools, which is then passed to an anonymous function that names the vector `xs` and calculates its mean (`Stats.mean(xs)`; `true` is treated as 1, `false` as 0). The mean is multiplied by 100 to express the result as a percentage.
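
To make the Bool-to-number step concrete, here is a minimal sketch on a hand-made column (the vector `col` is hypothetical toy data, not from your DataFrame). It also shows `Stats.mean(ismissing, col)`, which should give the same number without building the intermediate vector:

```julia
import Statistics as Stats

# Hypothetical toy column: 1 of the 4 entries is missing.
col = [1, missing, 3, 4]

flags = map(ismissing, col)      # Bool vector: false, true, false, false
pct = Stats.mean(flags) * 100    # true counts as 1, false as 0

# mean(f, itr) applies f to every element before averaging,
# so no intermediate Bool vector is needed.
pct_direct = Stats.mean(ismissing, col) * 100

println(pct)         # 25.0
println(pct_direct)  # 25.0
```

Which form to use is mostly a matter of taste; the second one just skips one allocation.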

For all columns you could go with:

```
for c in Dfs.names(df)
    println("%missing in $c = ",
        map(ismissing, df[!, c]) |> xs -> Stats.mean(xs) * 100)
end
```
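
The loop above only prints the values. If you would rather keep them for later use, one possible variation is to collect them into a small table. This is just a sketch: the example data and the names `pcts`, `result`, `:column` and `:perc_missing` are my own choices, picked so the numbers are easy to check by hand.

```julia
import DataFrames as Dfs
import Statistics as Stats

# Small deterministic table (hypothetical data): col1 has 1 missing
# value out of 4, col2 has 2 out of 4.
df = Dfs.DataFrame(
    :col1 => [1, missing, 3, 4],
    :col2 => [missing, missing, 3, 4],
)

# One percentage per column; eachcol iterates over the columns as vectors.
pcts = [Stats.mean(ismissing, col) * 100 for col in Dfs.eachcol(df)]

# Pair each column name with its percentage of missing values.
result = Dfs.DataFrame(:column => Dfs.names(df), :perc_missing => pcts)
println(result)
```

Here `pcts` should come out as `[25.0, 50.0]`, one entry per column in the order returned by `Dfs.names(df)`.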

`describe(df)` produces another DataFrame, so you can just add a column to that:

```
julia> using DataFrames
julia> df = DataFrame(rand([missing; 1:6], 10_000, 3), :auto);
julia> x = describe(df, :nmissing);
julia> x.perc_missing = 100 .* x.nmissing ./ nrow(df); x
3×3 DataFrame
 Row │ variable  nmissing  perc_missing
     │ Symbol    Int64     Float64
─────┼──────────────────────────────────
   1 │ x1            1408         14.08
   2 │ x2            1432         14.32
   3 │ x3            1455         14.55
```

Thank you!