Dear Julia Data community,
I would like to coalesce
entries across columns of a DataFrame and record where the value is coming from. I came up with the following function, which seems to work but feels a bit cumbersome and indirect.
using DataFrames, DataFrameMacros
function coalesce_dfcols!(df::DataFrame, vars::Vector{<:AbstractString}; outname=join(vars, "_") * "_coal")
@transform! df outname = coalesce({{vars}}...)
#transform!(df, AsTable([:x,:y,:z]) => ByRow(x->coalesce(x...)) => x->join(x,"_")*"_coal" ) #Dataframe style
@transform! df "source_of_" * outname = ismissing({outname}) ? missing : vars[findfirst(isequal.({{vars}}, {outname}))]
#transform(ttdf, (AsTable([:x,:y,:z]), :x_y_z_coal) => ((x,y) -> ismissing(y) ? missing : 1) => (x,y)->"source_of_"*y) ##does not work
end
Is there a simpler way (e.g. can one access the names of the in the transform function of the minilanguage?) [And I did not manage the second transform in original Dataframe minilanguage]
Below a MWE:
julia> ttdf = DataFrame(id=1:5, x=[1,2,missing, missing, missing], y=[missing, 4,missing,6, missing], z=[5,8,3,2, missing])
5×4 DataFrame
Row │ id x y z
│ Int64 Int64? Int64? Int64?
─────┼──────────────────────────────────
1 │ 1 1 missing 5
2 │ 2 2 4 8
3 │ 3 missing missing 3
4 │ 4 missing 6 2
5 │ 5 missing missing missing
julia> coalesce_dfcols!(ttdf, ["x", "y", "z"])
5×6 DataFrame
Row │ id x y z x_y_z_coal source_of_x_y_z_coal
│ Int64 Int64? Int64? Int64? Int64? String?
─────┼────────────────────────────────────────────────────────────────────
1 │ 1 1 missing 5 1 x
2 │ 2 2 4 8 2 x
3 │ 3 missing missing 3 3 z
4 │ 4 missing 6 2 6 y
5 │ 5 missing missing missing missing missing