From the MWE (first line is df[class,:], second is filter):
0.019227 seconds (313 allocations: 3.065 MiB)
0.037725 seconds (599.53 k allocations: 15.439 MiB, 8.73% gc time)
Faster, with far fewer allocations. This result is better than what I see in the actual program, where the hand-rolled filter (which doesn’t even pre-allocate for the result) is 4x faster. However, my dataset there is smaller, so there’s probably some one-time overhead in the filter function that dominates. I think the MWE is the more realistic result.
What’s really interesting is what happens if you replace my loop with something like
class = df[!,:b] .> 5
The allocations for df[class,:] go way up and its run time is much worse:
0.273326 seconds (277.86 k allocations: 17.071 MiB)
0.044197 seconds (599.55 k allocations: 15.438 MiB, 9.76% gc time)
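One way to narrow this down is to compare the two kinds of mask directly. The loop builds a Vector{Bool}, while the broadcast returns a BitVector, so df[class,:] goes through a different method in each case. The sketch below times each indexing call twice, on the assumption (not confirmed here) that part of the slow first measurement is compilation for the new argument type. The names mask_vec and mask_bit are mine, and the data is a stand-in for the MWE's.

```julia
using DataFrames

# Assumed stand-in for the MWE's data: three integer columns of n rows.
n = 200_000
df = DataFrame(a = rand(1:10, n), b = rand(1:10, n), c = rand(1:10, n))

# Loop-built mask: a Vector{Bool}, as in the hand-rolled version.
mask_vec = zeros(Bool, nrow(df))
for i in 1:nrow(df)
    mask_vec[i] = df[i, :b] > 5
end

# Broadcast mask: `.>` returns a BitVector, which is a different type.
mask_bit = df[!, :b] .> 5

# Time each indexing call twice; the first call for each mask type
# may include compilation, the second should be closer to steady state.
for mask in (mask_vec, mask_bit)
    println(typeof(mask))
    @time df[mask, :]
    @time df[mask, :]
end
```

If the second timing for the BitVector drops to roughly the Vector{Bool} number, the extra time and allocations were a one-time cost rather than something inherent to the broadcast mask.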
Type instability?
using DataFrames

function test()
    n = 200000
    df = DataFrame([[rand(1:10) for _ in 1:n], [rand(1:10) for _ in 1:n], [rand(1:10) for _ in 1:n]], [:a, :b, :c])
    class = zeros(Bool, size(df,1))
    m = 0
    for i=1:size(df,1)
        class[i] = df[i,:b] > 5
        if class[i]
            m += 1
        end
    end
    println(m)
    df1_ = @time df[class,:]
    df1 = @time filter(r->r[:b] > 5, df)
end

test()
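If type instability is what slows down filter here, the likely spot is the row-wise predicate: each r is a DataFrameRow, so the compiler cannot infer the type of r[:b]. DataFrames also accepts a column-selector form, filter(:b => f, df), which passes the column's values straight to the predicate. A minimal sketch for comparing the two, with my own names (row_pred, by_row, by_col) and a stand-in for the MWE's data:

```julia
using DataFrames

# Assumed stand-in for the MWE's data: three integer columns of n rows.
n = 200_000
df = DataFrame(a = rand(1:10, n), b = rand(1:10, n), c = rand(1:10, n))

# Row-wise predicate: receives a DataFrameRow, as in the MWE's filter call.
row_pred(r) = r[:b] > 5

# Run both forms once so the timings below exclude compilation.
by_row = filter(row_pred, df)
by_col = filter(:b => >(5), df)   # predicate sees the values of column :b only

# Both forms select the same rows.
println(by_row == by_col)

@time filter(row_pred, df)
@time filter(:b => >(5), df)
```

I haven't listed expected timings since they depend on the machine and package versions, but the allocation counts of the two calls should show whether the per-row access is what costs so much.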