Row-wise median (or sum or mean) with missings

Hi there,

I’ve been beating my head against this for a bit, and I’m coming up empty. Based on another answer I found here (and after realizing there’s no direct Julia equivalent of R’s apply functions or dplyr’s rowwise functions), I tried something like this:

f(a, b, c, d, e, f) = Statistics.median(a, b, c, d, e, f)

And then tried to execute it like:
f.(reddit_df[:subdomain_all_freq_comment_count], reddit_df[:subdomain_all_freq_comment_query_count], reddit_df[:subdomain_all_freq_submission_count], reddit_df[:subdomain_all_freq_submission_query_count], reddit_df[:subdomain_all_freq_active_user_count], reddit_df[:subdomain_all_freq_subscriber_count])

Which yields “ERROR: MethodError: no method matching median(::Float64, ::Float64, ::Float64, ::Float64, ::Missing, ::Missing)”

I assume it’s due to the presence of the missings. But when I try to add skipmissing in there, pretty much anywhere, it doesn’t work either. Any tips on how to do row-wise operations over a DataFrame?

Thank you!

Ah, for anyone who wanders across this: I figured it out (finally!).

f(a, b, c, d, e, f) = Statistics.median(skipmissing([a, b, c, d, e, f]))

f.(reddit_df[:subdomain_all_freq_comment_count], reddit_df[:subdomain_all_freq_comment_query_count], reddit_df[:subdomain_all_freq_submission_count], reddit_df[:subdomain_all_freq_submission_query_count], reddit_df[:subdomain_all_freq_active_user_count], reddit_df[:subdomain_all_freq_subscriber_count])
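
For anyone adapting this: the fix works because median expects a single collection rather than several separate arguments, and skipmissing wraps that collection so the missings are dropped before the median is taken. A minimal sketch of the same idea for any number of columns is below. The name rowmedian and the toy vectors a, b, c are made up for illustration; they are not from the original post.

```julia
using Statistics

# Hypothetical helper: takes any number of values from one row,
# drops the missings, and returns the median of what is left.
rowmedian(xs...) = median(skipmissing(xs))

# Toy columns standing in for the real DataFrame columns (assumed data).
a = [1.0, 2.0, missing]
b = [3.0, missing, 5.0]
c = [5.0, 6.0, 7.0]

# Broadcasting calls rowmedian once per row: (a[1], b[1], c[1]), and so on.
result = rowmedian.(a, b, c)
```

One caveat: if every value in a row is missing, nothing is left after skipmissing, and median throws an error on the empty collection, so such rows would need separate handling.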

Please quote your code (you can still edit your posts).

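using DataFrames, Statistics

# Build a small example and allow its columns to hold missing values
data = DataFrame(a = 1:5,
                 b = 2:6) |>
       allowmissing!
data.a[3] = missing
data.b[5] = missing

# Broadcast over the rows, skipping the missings within each row
sum.(skipmissing.(eachrow(data)))
mean.(skipmissing.(eachrow(data)))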
median.(skipmissing.(eachrow(data)))
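
If only some columns should take part, or the result should be stored back in the DataFrame, the same pattern can be applied to a column subset. This is a rough sketch under assumptions: the column names a, b, label and the new column row_median are placeholders, not the ones from the question.

```julia
using DataFrames, Statistics

# Toy data; `label` is a non-numeric column that should be left out.
data = DataFrame(a = [1, 2, missing],
                 b = [2, missing, 4],
                 label = ["x", "y", "z"])

# Restrict the row-wise operation to the numeric columns of interest.
cols = [:a, :b]

# Median of each row over `cols`, skipping missings, stored as a new column.
data.row_median = median.(skipmissing.(eachrow(data[!, cols])))
```

As with any skipmissing-based reduction, a row where all selected columns are missing leaves an empty collection, and median will throw on it.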