DataFrames: `@combine` only if a condition is met

How can I compute a variable with @combine only if some condition is met?

MWE:

using DataFrames, DataFramesMeta
dfTmp = DataFrame(y = 1:3, x = 10:12)
@combine dfTmp begin
    # Always compute `a`
    :a = :y * 2
    # Only compute `b` if `x` is present (test the column name as a string;
    # `:x` inside the macro refers to the column itself)
    :b = ("x" in names(dfTmp)) ? (:x .+ 1) : zeros(nrow(dfTmp))
end
# Now remove `b` if `x` was not present

This is slightly clumsy, and it fails if x is not in names(dfTmp), because the expression still refers to :x.

The alternative is to write a second @combine statement just for b and join the resulting DataFrames. But that seems equally awkward and inefficient.

Are there better options?

Is a plain if-else too simple?

using TidierData
dftmp = DataFrame(y = 1:3, x = 10:12)
if "x" in names(dftmp)
    @transmute(dftmp, a = y * 2, b = x + 1)
else
    @transmute(dftmp, a = y * 2)
end

I am hoping to avoid duplicating all the code that computes variables in all cases (the a = y * 2 in this case).
This is especially important if there are multiple variables that may be missing from the DataFrame.
But thanks for the suggestion.


Here’s a more concise way:

using TidierData
df = DataFrame(y = 1:3, x = 10:12)
@chain df begin
  @mutate(a = y * 2)
  if "x" in names(_) @mutate(_, b = x + 1) else _ end
  # more conditions here...
end
3×4 DataFrame
 Row │ y      x      a      b
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     10      2     11
   2 │     2     11      4     12
   3 │     3     12      6     13

Just in case it’s not clear, the _ is a placeholder within @chain. If the condition is not met, having the else _ ensures that you return the data frame without modifications so you can continue the chain. @mutate uses DataFrames.transform() under the hood.


That works, even if x is not present (the DataFramesMeta version fails in that case).
I will mark it as the solution, although I would prefer not to write that one section in Tidier syntax when I use DataFramesMeta everywhere else.
Thank you for the suggestion.

You can also use AsTable:

julia> using DataFrames, DataFramesMeta

julia> dfTmp = DataFrame(y = 1:3, x = 10:12)
3×2 DataFrame
 Row │ y      x
     │ Int64  Int64
─────┼──────────────
   1 │     1     10
   2 │     2     11
   3 │     3     12

julia> @combine dfTmp $AsTable = begin
           a = :y * 2
           if "x" in names(dfTmp)
               b = :x .+ 1
               (; a, b)
           else
               (; a)
           end
       end
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     2     11
   2 │     4     12
   3 │     6     13

Edit: no, this fails if x doesn’t exist, because the function the macro creates is handed to DataFrames with both :y and :x as arguments, so the missing column is requested before the if is ever evaluated.
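
To illustrate the failure mode, here is a minimal sketch of what I assume the macro call boils down to in plain DataFrames: a function whose source columns are listed up front. The names df_no_x and g are hypothetical, and the actual macro expansion may differ in detail.

```julia
using DataFrames

# Hypothetical data frame that lacks the `x` column
df_no_x = DataFrame(y = 1:3)

# Assumed hand-written equivalent of the macro-generated function:
# it takes both columns as arguments, whatever the body does with them
g(y, x) = (; a = y * 2, b = x .+ 1)

# Requesting the missing column :x makes `combine` throw
failed = try
    combine(df_no_x, [:y, :x] => g => AsTable)
    false
catch err
    true
end
```

Since both columns are named in the source selector, any check on names(dfTmp) inside the function body comes too late.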

Happy to provide a DataFramesMeta solution. Should be similar. Will take a look.

Does this work? (not at a computer)

using DataFramesMeta, Chain
df = DataFrame(y = 1:3, x = 10:12)
@chain df begin
  @rtransform(:a = :y * 2)
  if "x" in names(_) @rtransform(_, :b = :x + 1) else _ end
  # more conditions here...
end

That works, even if the conditioning variable does not exist - thanks.


Or exploit one of Julia’s peculiarities, dispatch on the NamedTuple type of each row:

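using DataFrames
df = DataFrame(y = 1:3, x = 10:12)
# One method per set of available columns; dispatch picks by the row's NamedTuple type
f(dfxy::@NamedTuple{y::Int64, x::Int64}) = (; a = dfxy.y * 2, b = dfxy.x + 1)
f(dfy::@NamedTuple{y::Int64}) = (; a = dfy.y * 2)

# With `x` present: adds `a` and `b`
transform(df, AsTable(:) => ByRow(f) => AsTable)

# Without `x`: adds only `a`
dfnx = df[:, Not(:x)]
transform(dfnx, AsTable(:) => ByRow(f) => AsTable)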

That’s an interesting option. Instead of writing different combinations of f, one could just put if else logic into f (otherwise, writing out several versions of f could be tedious). I would have to check performance, though. Thanks.
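
A minimal sketch of that idea, assuming a single row function (here called f_row, a hypothetical name) that branches on whether the row has an x field. I have not measured performance.

```julia
using DataFrames

# Hypothetical single row function: `a` is always computed,
# `b` only when the row actually carries an `x` field
function f_row(row::NamedTuple)
    a = row.y * 2
    return haskey(row, :x) ? (; a, b = row.x + 1) : (; a)
end

df = DataFrame(y = 1:3, x = 10:12)

# `x` present: columns `a` and `b` are added
with_x = transform(df, AsTable(:) => ByRow(f_row) => AsTable)

# `x` absent: only `a` is added
without_x = transform(select(df, Not(:x)), AsTable(:) => ByRow(f_row) => AsTable)
```

The code for a is written once, and each further optional variable would add one more haskey check inside f_row.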
