Unfortunately that solution doesn’t fly when there are many columns with heterogeneous types:
julia> df = DataFrame(rand(10000, 100));
julia> df.a = 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> @time _filter(x -> x.x1 > 0.5, df);
0.809720 seconds (3.87 M allocations: 132.545 MiB, 12.98% gc time)
julia> @time filter(x -> x.x1 > 0.5, df);
0.105625 seconds (2.52 M allocations: 67.293 MiB, 10.89% gc time)
My current thinking is that the ideal interface would be something like filter(x1 -> x1 > 0.5, df), and we would extract the names of the arguments to identify which variables (here x1) are actually used. That would avoid the problems with very large numbers of columns and would offer a compact syntax.
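To make the idea concrete, here is a rough sketch of how the argument names could be recovered and used to select columns. The helpers argument_names and filter_by_argnames are hypothetical names made up for illustration, not part of DataFrames. The sketch also relies on Base.method_argnames, which is an internal, unexported function and may change between Julia versions, so treat this as an assumption rather than a proposal for the actual implementation.

```julia
using DataFrames

# Hypothetical helper: returns the argument names of the first method of `f`.
# Base.method_argnames is internal to Julia; its first entry refers to the
# function itself, so it is dropped here.
function argument_names(f)
    m = first(methods(f))
    return Base.method_argnames(m)[2:end]
end

# Hypothetical helper: applies `f` row-wise, passing only the columns whose
# names match the arguments of `f`, and keeps the rows where it returns true.
function filter_by_argnames(f, df::AbstractDataFrame)
    cols = [df[!, n] for n in argument_names(f)]  # errors if a name is not a column
    mask = map(f, cols...)                        # one Bool per row
    return df[mask, :]
end

# Usage: only column x1 is touched, whatever the other columns contain.
df = DataFrame(x1 = [0.1, 0.9, 0.7], x2 = [1, 2, 3])
filter_by_argnames(x1 -> x1 > 0.5, df)
```

Since only the named columns are passed to the function, the cost should not depend on how many other columns the data frame has or what their types are.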