How do I convert the following code fragment to use `subset` rather than `filter`, and is there a benefit in doing so for large dataframes?

```
using DataFrames

df = DataFrame(a = [1,2,3,4,6,6], b=[1,missing,3,missing,5,missing], c=[missing,2,missing,4,missing,6])
# keep rows where b is missing or a > b
filter([:a, :b] => (x, y) -> ismissing(y) || x > y, df)
```

oheil
May 15, 2021, 4:17pm
#2
I don't know if you'll accept this solution:

```
julia> function isgreater(x,y)
           if ismissing(x) || ismissing(y)
               return false
           end
           return x>y
       end
julia> subset(df, [:a,:b] => (x,y) -> ismissing.(y) .| isgreater.(x,y) )
4×3 DataFrame
 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     2  missing        2
   2 │     4  missing        4
   3 │     6        5  missing
   4 │     6  missing        6
```

I don't think `subset` is a benefit; a benchmark shows it's slower.

oheil
May 15, 2021, 4:23pm
#3
```
julia> using BenchmarkTools
julia> @btime filter(:b => x -> ismissing(x), $df)
  1.330 μs (15 allocations: 1.42 KiB)
julia> @btime subset($df, :b => x -> ismissing.(x) )
  20.700 μs (146 allocations: 8.77 KiB)
```

In this case `filter` is efficient and `subset` does more pre- and post-processing work, so it can be expected to be slower. The slowdown should be roughly constant, though: as your data grows, the performance of the two should become comparable.
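
A rough sketch of how one might check this on a larger table, assuming BenchmarkTools is installed. The table size, the seed, and the names `bigdf` and `keep` are made up for illustration and are not from the thread. No timings are shown because they depend on the machine and package versions.

```julia
using DataFrames, BenchmarkTools, Random

Random.seed!(1)       # arbitrary seed, only for reproducibility
n = 1_000_000         # hypothetical size, not from the thread

# `bigdf` is an illustrative table in which roughly half of :b is missing
bigdf = DataFrame(a = rand(1:10, n),
                  b = [rand() < 0.5 ? missing : rand(1:10) for _ in 1:n])

# same predicate as in the question: keep rows where b is missing or a > b
keep(x, y) = ismissing(y) || x > y

r_filter = filter([:a, :b] => keep, bigdf)
r_subset = subset(bigdf, [:a, :b] => ByRow(keep))

# isequal (not ==) so that missing values compare as equal
@assert isequal(r_filter, r_subset)

@btime filter([:a, :b] => keep, $bigdf);
@btime subset($bigdf, [:a, :b] => ByRow(keep));
```

If the fixed overhead explanation holds, the two timings should be much closer to each other here than in the 6-row benchmark above.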

qsong
May 16, 2021, 4:53am
#5
Why not just use `ByRow`?

```
subset(df, [:a, :b] => ByRow((x, y) -> ismissing(y) || x > y))
```
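
For what it's worth, here is a quick sketch checking that the `ByRow` version selects the same rows as the original `filter` call, using the `df` from the question. It also tries the `skipmissing=true` keyword of `subset`, which treats a `missing` condition as `false`. That is probably not what is wanted here, since the question keeps rows where `b` is missing. The names `via_filter`, `via_subset` and `only_greater` are just for illustration.

```julia
using DataFrames

df = DataFrame(a = [1,2,3,4,6,6],
               b = [1,missing,3,missing,5,missing],
               c = [missing,2,missing,4,missing,6])

via_filter = filter([:a, :b] => (x, y) -> ismissing(y) || x > y, df)
via_subset = subset(df, [:a, :b] => ByRow((x, y) -> ismissing(y) || x > y))

# isequal treats missing as equal to missing, unlike ==
@assert isequal(via_filter, via_subset)

# With skipmissing=true a missing condition counts as false,
# so rows where b is missing are dropped instead of kept.
only_greater = subset(df, [:a, :b] => ByRow((x, y) -> x > y), skipmissing=true)
@assert only_greater.a == [6]
```

Without `skipmissing=true`, the plain `x > y` condition would make `subset` throw, because it returns `missing` for some rows and `subset` expects `Bool` values by default.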
