DataFrames: removing columns with many missing values

johnbb · October 14, 2019, 3:28pm

How can I remove columns with more than 10% missing data from a DataFrame? In R it would be x[, colMeans(is.na(x)) < 0.1]. Thanks!

ExpandingMan · October 14, 2019, 3:33pm

You can find the names of the columns with more than 10% missing values (i.e. the ones to drop) with, e.g.

filter(c -> count(ismissing, df[:,c])/size(df,1) > 0.1, names(df))
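
As a rough sketch of how those names could then be used to actually drop the columns: the toy `df` and the variable names `too_sparse` and `cleaned` below are made up for illustration, and `select` with `Not` assumes a reasonably recent DataFrames.jl.

```julia
using DataFrames

# Hypothetical toy data: column b is 50% missing, column c is 25% missing.
df = DataFrame(a = [1, 2, 3, 4],
               b = [missing, 2.0, missing, 4.0],
               c = ["w", missing, "y", "z"])

# Names of the columns where more than 10% of the values are missing.
too_sparse = filter(c -> count(ismissing, df[:, c]) / size(df, 1) > 0.1, names(df))

# Keep every column except those; df itself is left unchanged.
cleaned = select(df, Not(too_sparse))
```

Here only column `a` should survive in `cleaned`, since both `b` and `c` exceed the 10% threshold.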

nalimilan · October 14, 2019, 7:33pm

The direct equivalent of the R syntax would be x[:, mean.(ismissing, eachcol(x)) .< 0.1].
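
For anyone who wants to try this out, here is a minimal sketch on a made-up table. The data and the names `frac_missing` and `kept` are assumptions for illustration only; note that `mean` comes from the Statistics standard library, so it has to be loaded.

```julia
using DataFrames
using Statistics  # provides mean

# Hypothetical toy data: only the score column has missing values (1 of 5, i.e. 20%).
x = DataFrame(id = [1, 2, 3, 4, 5],
              score = [1.0, missing, 3.0, 4.0, 5.0],
              label = ["a", "b", "c", "d", "e"])

# Fraction of missing values in each column, in column order.
frac_missing = mean.(ismissing, eachcol(x))

# Keep only the columns with less than 10% missing values.
kept = x[:, frac_missing .< 0.1]
```

In this example `score` should be dropped and `id` and `label` kept. One small difference between the two approaches in this thread: the `filter` version flags columns with strictly more than 10% missing, while this one keeps columns with strictly less than 10%, so a column at exactly 10% is treated differently.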
