Ignoring NaNs when calculating means of columns of a dataframe

PeX · February 16, 2024, 11:43pm

Hi all,
I have a dataframe with rows as dates and columns as data points. I need to calculate the mean of each column, but the problem is that some of the columns have “NaN” values.
Is there an efficient way to achieve this by ignoring the NaN values for each calculation? (meaning if one column has 4 values but one is NaN, then the mean will only include 3 values in the calculation).
It sounds like a simple problem, but I haven’t found a solution yet.

Thank you!

Dan · February 17, 2024, 12:07am

There is a package, SkipNan, which might help. Look at the following example:

julia> using SkipNan, Statistics

julia> skipnan([1.0, NaN, 2.0])
skipnan([1.0, NaN, 2.0])

julia> mean(skipnan([1.0, NaN, 2.0]))
1.5
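
To apply the same idea to every column of a dataframe, something like the sketch below might work. It assumes DataFrames.jl and all-numeric floating-point columns; the data and the helper name `nanmean` are made up for illustration, and it uses plain `filter(!isnan, col)` instead of SkipNan so that it only needs DataFrames and Statistics.

```julia
using DataFrames, Statistics

# Hypothetical data: rows stand in for dates, columns for data points
df = DataFrame(a = [1.0, NaN, 2.0, 3.0], b = [4.0, 5.0, NaN, NaN])

# Hypothetical helper: mean of the non-NaN entries of one column
nanmean(col) = mean(filter(!isnan, col))

# Apply the helper to every column; renamecols = false keeps the original names
means = combine(df, names(df) .=> nanmean; renamecols = false)
```

Here `means` should be a one-row dataframe with `a = 2.0` (mean of 1, 2, 3) and `b = 4.5` (mean of 4, 5). Note that `filter` allocates a copy of each column, so for very large data a lazy iterator such as `skipnan` may be preferable.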
