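To recover the performance of `filter`, use the `cols => fun` version and don't broadcast your conditions, since `filter` already operates row-wise. The `r -> ...` form receives each row as a `DataFrameRow`, whose column types the compiler cannot infer, and that is where the time and allocations go: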
```julia
julia> using DataFrames, Chairmarks

julia> df = DataFrame(a = rand(1:10, 1_000_000), b = rand(Bool, 1_000_000));

julia> @b df[df.a .== 2 .&& df.b, :]
507.200 μs (30 allocs: 956.469 KiB)

julia> @b filter(r -> r.a == 2 && r.b, $df)
58.145 ms (2199647 allocs: 34.498 MiB)

julia> @b subset($df, :a => ByRow(==(2)), :b)
599.200 μs (192 allocs: 1.894 MiB)

julia> @b filter([:a, :b] => ((a, b) -> a == 2 && b), $df)
507.400 μs (27 allocs: 956.453 KiB)

julia> function foo1(data)
           cond(x, y) = x .== 2 .&& y
           data[cond(data.a, data.b), :]
       end
foo1 (generic function with 1 method)

julia> @b foo1($df)
504.900 μs (28 allocs: 956.406 KiB)
```
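The fast variants differ only in how the condition is spelled. Below is a minimal sketch, assuming DataFrames.jl 1.x, in which a single scalar predicate (`isselected` is a hypothetical name, not part of any API) is written once and reused by all three. `filter` applies it row by row, the mask version broadcasts it explicitly, and `subset` gets it wrapped in `ByRow`. The assertion then checks that they select the same rows:

```julia
using DataFrames

# Hypothetical scalar predicate: takes one value from each column, returns a Bool.
isselected(a, b) = a == 2 && b

df = DataFrame(a = rand(1:10, 1_000), b = rand(Bool, 1_000))

# filter calls the predicate once per row with the values of :a and :b as scalars.
rows_filter = filter([:a, :b] => isselected, df)

# Boolean-mask indexing: the same predicate, broadcast over the whole columns.
rows_mask = df[isselected.(df.a, df.b), :]

# subset expects a vector of Bools, so the scalar predicate is wrapped in ByRow.
rows_subset = subset(df, [:a, :b] => ByRow(isselected))

# All three keep the original row order, so the results compare equal.
@assert rows_filter == rows_mask == rows_subset
```

Keeping the predicate scalar means you broadcast it only where a whole vector is needed (the mask), and never inside `filter`, which already iterates over rows.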