In the following example, I want to filter on a few specific values from both columns at the same time. I can get the end result shown below by filtering separately and then combining the pieces into the final DataFrame, but that doesn't feel right. I'm pretty sure there must be a way to get the same result in a single filtering step, where I filter on several conditions together.
```julia
julia> using DataFrames

julia> x = DataFrame(a = repeat(["s", "p", "t", "q"], inner = 4, outer = 1), b = rand(1:10, 16))
16×2 DataFrame
 Row │ a       b
     │ String  Int64
─────┼───────────────
   1 │ s           4
   2 │ s          10
   3 │ s           4
   4 │ s           3
   5 │ p           4
   6 │ p           7
   7 │ p           1
   8 │ p           6
   9 │ t           9
  10 │ t           9
  11 │ t           3
  12 │ t           4
  13 │ q           6
  14 │ q           9
  15 │ q           5
  16 │ q           3
```
The end result I want after filtering:
```
5×2 DataFrame
 Row │ a       b
     │ String  Int64
─────┼───────────────
   1 │ s           3
   2 │ p           7
   3 │ t           9
   4 │ t           9
   5 │ q           6
```
`@rsubset` from DataFramesMeta may be helpful here. I would still write the filters on multiple lines so they are easier to read, even if you technically do it in one step with a `begin` block.
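A minimal sketch of what that could look like, assuming DataFramesMeta is installed. To make the result reproducible, `x` is rebuilt here with the `b` values printed above instead of `rand`; `res1` and `res2` are just illustrative names. Both versions express the four `(a, b)` combinations as one condition joined with `||`, so a row is kept if it matches any of them.

```julia
using DataFrames, DataFramesMeta

# Deterministic copy of the example data (the original used rand, so values differ per run)
x = DataFrame(a = repeat(["s", "p", "t", "q"], inner = 4),
              b = [4, 10, 4, 3, 4, 7, 1, 6, 9, 9, 3, 4, 6, 9, 5, 3])

# Plain DataFrames: a single predicate, one (a, b) combination per line, joined with ||
res1 = filter(row -> (row.a == "s" && row.b == 3) ||
                     (row.a == "p" && row.b == 7) ||
                     (row.a == "t" && row.b == 9) ||
                     (row.a == "q" && row.b == 6), x)

# DataFramesMeta: @rsubset evaluates the condition row by row, :a and :b refer to columns.
# The trailing || keeps this a single condition; separate conditions on separate
# lines of a begin block would instead be combined with AND.
res2 = @rsubset x begin
    (:a == "s" && :b == 3) ||
    (:a == "p" && :b == 7) ||
    (:a == "t" && :b == 9) ||
    (:a == "q" && :b == 6)
end
```

Both should return the same five rows as the desired result above (`s 3`, `p 7`, `t 9`, `t 9`, `q 6`).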
Can these strategies also work in a loop? For instance, let's say I need to select rows matching 200 different conditions; it would be really crazy to write down each filter criterion individually.
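Something like this would be helpful:
```julia
# y and z hold the values of columns a and b, respectively, to filter on
new = DataFrame()
for i in eachindex(y, z)
    # append the rows matching the i-th (a, b) pair instead of overwriting `new`
    append!(new, filter(row -> row.a == y[i] && row.b == z[i], x))
end
```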
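```julia
using DataFrames

va = ["s", "p", "t", "q"]
vb = [4, 6, 9, 6]

# filter once per (va[i], vb[i]) pair and stack the pieces with vcat
function myfilter(va, vb, x)
    mapreduce(vcat, eachindex(va, vb)) do i
        filter(row -> row.a == va[i] && row.b == vb[i], x)
    end
end

myfilter(va, vb, x)
```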
I've put the `mapreduce` into a function; otherwise you end up with a lot of compilation every time that code block executes, because of the anonymous functions inside `filter`.
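When the list of combinations is long (the 200-pair case), a different technique, not the one used above, is to put the wanted pairs in their own DataFrame and use DataFrames' `semijoin`, which keeps the rows of `x` that have a match on the given key columns. This is a sketch assuming the pairs live in vectors like `va`/`vb`; `wanted` and `res` are illustrative names, and `x` is again rebuilt with the printed values so the result is reproducible.

```julia
using DataFrames

x = DataFrame(a = repeat(["s", "p", "t", "q"], inner = 4),
              b = [4, 10, 4, 3, 4, 7, 1, 6, 9, 9, 3, 4, 6, 9, 5, 3])

# The wanted (a, b) combinations, one row per combination (could just as well be 200 rows)
va = ["s", "p", "t", "q"]
vb = [4, 6, 9, 6]
wanted = DataFrame(a = va, b = vb)

# Keep only the rows of x whose (a, b) pair appears in `wanted`;
# the result has only x's columns, and rows of x are not duplicated
res = semijoin(x, wanted, on = [:a, :b])
```

Unlike the `mapreduce` version, this needs no loop and no per-pair `filter` call.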
You can also programmatically create a predicate function, so that only one call to `filter` is needed.
```julia
# this creates an anonymous predicate to be passed into filter:
# a row is kept if its (a, b) values match any of the given pairs
make_filter(as, bs) = row -> any(s -> row.a == s[1] && row.b == s[2], zip(as, bs))
flt = make_filter(va, vb)
filter(flt, x)
```
```
5×2 DataFrame
 Row │ a       b
     │ String  Int64
─────┼───────────────
   1 │ s           4
   2 │ p           6
   3 │ p           6
   4 │ t           9
   5 │ q           6
```
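A possible variant of the predicate approach, sketched here as an assumption rather than a tested recommendation: build a `Set` of `(a, b)` tuples once, so each row does a hash lookup instead of scanning every pair with `any`. It also uses the `cols => f` form of `filter`, which passes the column values directly to the function instead of a `DataFrameRow`. `wanted` and `res` are illustrative names, and `x` uses the printed values so the output is reproducible.

```julia
using DataFrames

x = DataFrame(a = repeat(["s", "p", "t", "q"], inner = 4),
              b = [4, 10, 4, 3, 4, 7, 1, 6, 9, 9, 3, 4, 6, 9, 5, 3])
va = ["s", "p", "t", "q"]
vb = [4, 6, 9, 6]

# Build the lookup set once: a Set{Tuple{String, Int}} of the wanted combinations
wanted = Set(zip(va, vb))

# One pass over x; the function receives the a and b values of each row
res = filter([:a, :b] => (a, b) -> (a, b) in wanted, x)
```

Like `filter(flt, x)`, this keeps the original row order of `x`, whereas the `mapreduce` version groups the rows by pair.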