It often comes up that one wants to filter/select rows of a Matrix based on the values of some column. I believe the best way of doing this is with Boolean masks? But these can get a bit weird… For example:
a = ["a" "x"; "b" "y"]
a[in.(a[:, 1], [["a", "c"]]), :]
This is a bit hard to read, and it is probably confusing at first that you need to wrap the collection in another array so that broadcasting treats it as a single value.
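For what it's worth, here is a small sketch of two other spellings of the same mask that I believe are equivalent: wrapping the collection in Ref instead of a nested array, and broadcasting the one-argument form in(collection). The names wanted, mask, rows_ref and rows_curried are just made up for illustration.

```julia
a = ["a" "x"; "b" "y"]
wanted = ["a", "c"]   # illustrative name, not from the original example

# Ref makes broadcasting treat `wanted` as a single value,
# which replaces the nested-array trick [["a", "c"]]
mask = in.(a[:, 1], Ref(wanted))
rows_ref = a[mask, :]

# in(wanted) builds a one-argument predicate x -> x in wanted,
# which can be broadcast over the column directly
rows_curried = a[in(wanted).(a[:, 1]), :]
```

Both forms should select the same rows as the nested-array version; which one reads better is probably a matter of taste.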
In this case I would prefer something like filter!(). For example with DataFrames I can do it like this (although it is currently not at all efficient; but it could be):
using DataFrames
b = DataFrame(["a" "x"; "b" "y"], :auto)  # :auto generates the column names x1, x2
filter!(row -> row.x1 in ["a", "c"], b)   # keep rows whose first column is "a" or "c"
Which got me thinking that perhaps it makes sense to add another version of filter!() that also has a dimension argument (1 for iterating rows as in DataFrame)? Or am I missing something and it is already easy to achieve somehow?
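To make the idea concrete, here is a rough sketch of what a row-wise filter for matrices could look like. filterrows is a hypothetical helper, not something in Base, and it returns a new matrix rather than mutating, since as far as I know a Matrix cannot be resized in place. It assumes eachrow is available (Julia 1.1 or later).

```julia
# Hypothetical helper (not part of Base): keep the rows of A
# for which pred(row) returns true, and return them as a new matrix.
filterrows(pred, A::AbstractMatrix) = A[[pred(row) for row in eachrow(A)], :]

a = ["a" "x"; "b" "y"; "c" "z"]

# Keep the rows whose first entry is "a" or "c"
kept = filterrows(row -> row[1] in ("a", "c"), a)
```

The predicate receives a view of each row, so row[1] is the first column of that row, much like the row argument in the DataFrames version.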
And also, why is filter!() so much slower than using a Boolean mask?
fun1(arr) = filter!(x -> x in ["a", "c"], arr)
fun2(arr) = arr[in.(arr, [["a","c"]]), :]
c = rand(["a","b","c","d"], 10000000);
@time fun1(c);
@time fun2(c);
@time fun1(c);
@time fun2(c);