I have a DataFrame `results` that is sorted by its `date` column. I'm experimenting to see the difference between filtering with Query.jl (A):
```julia
using Dates, DataFrames, Query

results |> @filter(_.date >= Dates.Date(2018,1,1) && _.date <= Dates.Date(2019,1,1))
```
vs direct indexing (B):
```julia
let idx_first = findfirst(dt -> dt == Dates.Date(2018,1,1), results.date)
    idx_last = findlast(dt -> dt == Dates.Date(2019,1,1), results.date)
    @view results[idx_first:idx_last, :]
end
```
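Since `results` is sorted by date, the linear `findfirst`/`findlast` scans in (B) could in principle be replaced by a binary search. The sketch below works on a plain `Vector{Date}` so it runs with only the `Dates` standard library; the example data and the helper name `date_range` are hypothetical, not part of any package. Unlike `findfirst(dt -> dt == ...)`, which returns `nothing` when the exact boundary date is absent, `searchsortedfirst`/`searchsortedlast` return the first date `>=` the lower bound and the last date `<=` the upper bound, which matches the semantics of the `@filter` condition in (A).

```julia
using Dates

# Hypothetical sorted date column standing in for `results.date`.
dates = collect(Date(2017, 6, 1):Day(1):Date(2019, 6, 1))

# Binary search on a sorted vector: O(log n) instead of a linear scan.
# searchsortedfirst gives the first index with date >= lo and
# searchsortedlast the last index with date <= hi, so the boundary dates
# do not have to appear in the data.
function date_range(dates::AbstractVector{Date}, lo::Date, hi::Date)
    return searchsortedfirst(dates, lo):searchsortedlast(dates, hi)
end

r = date_range(dates, Date(2018, 1, 1), Date(2019, 1, 1))

# The same data with the lower boundary date missing:
# findfirst(==(Date(2018, 1, 1)), gap) would return `nothing` here.
gap = filter(!=(Date(2018, 1, 1)), dates)
r_gap = date_range(gap, Date(2018, 1, 1), Date(2019, 1, 1))

# With the DataFrame itself, the rows would then be selected with
# @view results[date_range(results.date, lo, hi), :]
```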
When running `@time` I get the following (I ran `@time` several times in a row, so compilation time should not be included):
(A): 0.013382 seconds (5.74 k allocations: 336.941 KiB)
(B): 172.499 μs (12 allocations: 416 bytes)
Clearly (B) wins. But when I prefix the two code samples with `@btime` (from BenchmarkTools.jl) I get:
(A): 60.500 μs (121 allocations: 8.31 KiB)
(B): 0.040810 seconds (104.54 k allocations: 6.316 MiB)
Now (B) allocates heavily, so (A) wins. Can someone explain this to me?

When I collect both into a `DataFrame` at the end,
```julia
results |> @filter(_.date >= Dates.Date(2018,1,1) && _.date <= Dates.Date(2019,1,1)) |> DataFrame
```
vs
```julia
let idx_first = findfirst(dt -> dt == Dates.Date(2018,1,1), results.date)
    idx_last = findlast(dt -> dt == Dates.Date(2019,1,1), results.date)
    results[idx_first:idx_last, :] |> DataFrame
end
```
then (B) is the clear winner with both `@time` and `@btime`.
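One hypothesis worth checking: Query.jl queries are evaluated lazily, so `results |> @filter(...)` without a sink such as `DataFrame` may only construct the query, meaning `@btime` on (A) alone would time the construction rather than the filtering. That would be consistent with the gap closing once both versions are materialized. The sketch below illustrates lazy evaluation with Base's `Iterators.filter` as a stand-in for `@filter` (an analogy, not Query.jl itself); the example data, the counter `calls` and the predicate `in_range` are hypothetical.

```julia
using Dates

# Hypothetical sorted date column standing in for `results.date`.
dates = collect(Date(2017, 6, 1):Day(1):Date(2019, 6, 1))

# Counts how many times the predicate actually runs.
const calls = Ref(0)
function in_range(d::Date)
    calls[] += 1
    return Date(2018, 1, 1) <= d <= Date(2019, 1, 1)
end

# Building the lazy iterator does not evaluate the predicate at all.
lazy = Iterators.filter(in_range, dates)
calls_after_construction = calls[]

# Only materializing it (here with `collect`) walks over every element.
materialized = collect(lazy)
calls_after_collect = calls[]

println("predicate calls before collect: ", calls_after_construction)
println("predicate calls after collect:  ", calls_after_collect)
println("elements kept: ", length(materialized))
```

Separately, when benchmarking code that refers to a non-constant global such as `results`, the BenchmarkTools documentation recommends interpolating it with `$` so that the cost of accessing an untyped global is not part of the measurement.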