I just compared subset (using a :col => ByRow(==(0)) operation) with filter (using a :col => ==(0) operation) on a DataFrame. In both cases I set view=true, and I see significantly fewer allocations with filter than with subset. Would you agree that in these cases filter is more performant than subset?
A more typical benchmark using @btime seems to support this result too (with logical indexing leading to slightly better performance than filter despite having more allocations):
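A minimal sketch of such a comparison, assuming BenchmarkTools.jl is installed and using a made-up integer column `col` (the data and its size are illustrative, not the ones from the post above), might look like this:

```julia
using DataFrames, BenchmarkTools, Random

Random.seed!(1234)
# Hypothetical data: one integer column with values in 0:3.
df = DataFrame(col = rand(0:3, 10^6))

# subset: the condition is lifted with ByRow and evaluated over the whole column.
@btime subset($df, :col => ByRow(==(0)); view=true);

# filter: the predicate is applied to each row's value of :col.
@btime filter(:col => ==(0), $df; view=true);

# Logical indexing: build a Boolean mask, then take a view of the matching rows.
@btime view($df, $df.col .== 0, :);
```

Timings and allocation counts depend on the machine, the Julia and DataFrames.jl versions, and the data, so it is worth running this on your own DataFrame rather than relying on a single set of numbers.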
In short: filter accepts only one condition and works row-wise, so its internal logic is much simpler. subset allows passing multiple conditions and works on whole columns. (If you want more details, it is best to check the source code to see how the implementations differ.)
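To make the difference in calling conventions concrete, here is a small sketch. The columns `a` and `b` are made up for illustration; it shows only how the two functions are called, not how they are implemented internally:

```julia
using DataFrames

# Hypothetical two-column DataFrame.
df = DataFrame(a = [0, 1, 0, 2], b = [1, 1, 0, 0])

# filter: a single predicate that receives one row's values at a time.
f = filter([:a, :b] => (a, b) -> a == 0 && b == 1, df)

# subset: several conditions; a row is kept only if all of them are true.
# ByRow lifts a scalar function so that it is applied elementwise.
s = subset(df, :a => ByRow(==(0)), :b => ByRow(==(1)))

# subset without ByRow: each function receives the entire column vector
# and must return a Boolean vector.
s2 = subset(df, :a => (a -> a .== 0), :b => (b -> b .== 1))
```

All three calls keep the same rows here; only the way the condition is expressed differs.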
Thanks for pointing this out @bkamins. In general, could we say that if we are doing row-wise filtering, even on multiple conditions (joined with && operators), we should expect filter to outperform subset?