I guess this can go one of two ways. Do you know that :x2 and :x3 are the columns you need to analyze, or are you looking for a solution where you don’t know the columns that are important ex-ante and need a general solution?
If you know it’s :x2 and :x3, then maybe
ismissing.(df.x2) .&& ismissing.(df.x3)
(the broadcasted .&& requires Julia 1.7 or later)
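For anyone who wants to try this, here is a minimal sketch on a made-up table. The data below is hypothetical and is not the table from the original question; only the column names :x2 and :x3 are taken from the thread.

```julia
using DataFrames

# Hypothetical data, invented for illustration only.
df = DataFrame(x1 = 1:4,
               x2 = [1, missing, missing, 4],
               x3 = [missing, missing, 3, 4])

# Broadcast ismissing over each column, then combine the two Boolean
# vectors elementwise. The broadcasted `.&&` needs Julia 1.7 or later.
both_missing = ismissing.(df.x2) .&& ismissing.(df.x3)
```

Only the second row has both values missing, so that is the only `true` entry in `both_missing`.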
If you don’t know the columns ex-ante, then the current solution is not bad. I would maybe use map though.
julia> e = [:x2, :x3];
julia> map(eachrow(df)) do r
           all(ismissing, r[e])
       end
6-element Vector{Bool}:
0
0
1
0
1
0
From the perspective of DataFramesMeta, for the first option you can write
It’s not a problem in this exact instance, but it is less safe to use in general. If you were to forget to write ismissing.(x) and just wrote x, the code would still run, but the result would be nonsense.
Thanks for your replies, they have helped my understanding, leading me to try this code:
map(r -> all(ismissing, r[e]), eachrow(df))
as suggested by the do-block documentation and with the help of @pdeffebach’s code. It takes around 22 μs in Pluto on my old system, as does @pdeffebach’s do-block code. @oheil’s version takes about 92 μs, which, although slower, is helpful for seeing the effect of different approaches. My original code takes 60 μs.
If you really need good performance, use Tables.namedtupleiterator(df[!, e]) instead of eachrow(df), which will be faster, assuming e isn’t too big (hopefully no more than about 20 columns).
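A rough sketch of what that swap could look like, again on hypothetical data. It assumes Tables.jl is available in the environment alongside DataFrames. The column-count caveat is presumably because each row becomes a concretely typed NamedTuple, which gets expensive to compile for wide selections.

```julia
using DataFrames, Tables

# Hypothetical data, invented for illustration only.
df = DataFrame(x1 = 1:4,
               x2 = [1, missing, missing, 4],
               x3 = [missing, missing, 3, 4])
e = [:x2, :x3]

# Row-by-row check via eachrow, as in the earlier posts.
via_eachrow = map(r -> all(ismissing, r[e]), eachrow(df))

# Same check, but iterating NamedTuples over only the selected columns.
# df[!, e] selects the columns without copying them.
via_namedtuples = [all(ismissing, nt) for nt in Tables.namedtupleiterator(df[!, e])]
```

Both vectors should agree; any speed difference is worth measuring on your own data rather than taking on faith.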
Of course, there are even faster ways to go about this that we can dig into if you need.
PS
The syntax of the subset function in this case is much less intuitive and direct: the input variables have to be slurped, which is not the first thing that comes to mind.
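To make the slurping point concrete, here is a sketch of what such a subset call might look like. The data is hypothetical, and the choice to keep (rather than drop) the all-missing rows is an assumption made for illustration. The AsTable form is shown only as one possible way to avoid the slurp.

```julia
using DataFrames

# Hypothetical data, invented for illustration only.
df = DataFrame(x1 = 1:4,
               x2 = [1, missing, missing, 4],
               x3 = [missing, missing, 3, 4])
e = [:x2, :x3]

# With several source columns, the function receives one positional
# argument per column, hence the slurp `x...`.
slurped = subset(df, e => ByRow((x...) -> all(ismissing, x)))

# AsTable passes each row as a single NamedTuple instead, so no slurp is needed.
as_table = subset(df, AsTable(e) => ByRow(r -> all(ismissing, r)))
```

Both calls keep only the rows where every column in e is missing; negate the predicate to drop those rows instead.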