How to subset a DataFrame to the rows where the first column is smaller than the second column?

I have a DataFrame, and I want to keep only the rows where the first column is smaller than the second column.

julia> a=DataFrame(p=[1,2,3,4,5,6],q=[1,2,4,5,2,9])
6×2 DataFrame
 Row │ p      q
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      2
   3 │     3      4
   4 │     4      5
   5 │     5      2
   6 │     6      9

I tried the following:

julia> index=a[!,:p].<a[!,:q]
julia> b=hcat(a,index)
6×3 DataFrame
 Row │ p      q      x1
     │ Int64  Int64  Bool
─────┼─────────────────────
   1 │     1      1  false
   2 │     2      2  false
   3 │     3      4   true
   4 │     4      5   true
   5 │     5      2  false
   6 │     6      9   true
julia> subset(b,:x1)
3×3 DataFrame
 Row │ p      q      x1
     │ Int64  Int64  Bool
─────┼────────────────────
   1 │     3      4  true
   2 │     4      5  true
   3 │     6      9  true
julia> select(subset(b,:x1),:p,:q)
3×2 DataFrame
 Row │ p      q
     │ Int64  Int64
─────┼──────────────
   1 │     3      4
   2 │     4      5
   3 │     6      9

Are there any faster or easier ways?

# DataFrames.jl only
using DataFrames
subset(a, [:p, :q] => ByRow(<))

# With DataFrameMacros.jl convenience package
using DataFrameMacros
@subset(a, :p < :q)

In this case you can use ByRow(<) directly, so the macro solution is not much shorter. For more complex conditions, though, the normal DataFrames syntax requires an anonymous function, and that is where the macro version becomes quite a bit shorter.
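To make the difference concrete, here is a minimal sketch using the DataFrame from the question. The compound condition (p < q and q even) is a made-up example that is not part of the original question; it is only there to show when an anonymous function becomes necessary. The last line shows plain Boolean-mask indexing as another option for the simple case.

```julia
using DataFrames

a = DataFrame(p=[1, 2, 3, 4, 5, 6], q=[1, 2, 4, 5, 2, 9])

# Simple comparison: the operator itself can be passed to ByRow.
simple = subset(a, [:p, :q] => ByRow(<))

# Hypothetical compound condition: an anonymous function is needed.
compound = subset(a, [:p, :q] => ByRow((p, q) -> p < q && iseven(q)))

# Alternative for the simple case: index rows with a Boolean mask.
masked = a[a.p .< a.q, :]
```

With DataFrameMacros.jl, the compound case should be expressible as something like @subset(a, :p < :q && iseven(:q)), which is where the macro form saves the most typing.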
