DataFrames: removing columns with many missing values

How can I remove columns with more than 10% missing data from a DataFrame? In R it would be x[, colMeans(is.na(x)) < 0.1]. Thanks!

You can find the names of the columns that exceed this threshold with, for example:

filter(c -> count(ismissing, df[:,c])/size(df,1) > 0.1, names(df))
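
A minimal sketch of using those names to actually drop the columns, assuming DataFrames.jl 1.x. The toy data is made up for illustration. Because the filter uses `> 0.1`, a column that is exactly 10% missing is kept, which matches "more than 10%" in the question.

```julia
using DataFrames

# Made-up toy data with 10 rows
df = DataFrame(a = 1:10,
               b = [missing; 2:10],           # 1 of 10 missing (10%): kept
               c = [missing; missing; 3:10])  # 2 of 10 missing (20%): dropped

# Names of columns whose share of missing values is above 10%
todrop = filter(c -> count(ismissing, df[:, c]) / size(df, 1) > 0.1, names(df))

# Keep every column except the ones found above
cleaned = select(df, Not(todrop))
```

Here `todrop` is `["c"]` and `cleaned` has columns `a` and `b`.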

The direct equivalent of the R syntax would be x[:, mean.(ismissing, eachcol(x)) .< 0.1], where mean comes from the Statistics standard library (using Statistics).
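
A sketch wrapping that one-liner in a reusable function, assuming DataFrames.jl 1.x. `keep_mostly_complete` and its `threshold` keyword are hypothetical names for illustration. Note that `.< 0.1`, like R's `< 0.1`, also drops a column that is exactly 10% missing, so it is slightly stricter than the `> 0.1` filter.

```julia
using DataFrames, Statistics  # `mean` lives in the Statistics standard library

# Hypothetical helper: keep columns whose fraction of missing values is below `threshold`
function keep_mostly_complete(x::AbstractDataFrame; threshold = 0.1)
    # mean(ismissing, col) is the share of missing entries in one column;
    # broadcasting over eachcol(x) gives one share per column
    return x[:, mean.(ismissing, eachcol(x)) .< threshold]
end

# Made-up toy data: b is exactly 10% missing, c is 20% missing
x = DataFrame(a = 1:10,
              b = [missing; 2:10],
              c = [missing; missing; 3:10])

only_a = keep_mostly_complete(x)                  # b is dropped because 0.1 < 0.1 is false
a_and_b = keep_mostly_complete(x; threshold = 0.15)
```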