Trying to benchmark two code samples and getting different results from @time vs @btime

I have a DataFrame “results” that is sorted by its date column. I’m experimenting to see the performance difference between using a Query.jl filter (A):
results |> @filter(_.date >= Dates.Date(2018,1,1) && _.date <= Dates.Date(2019,1,1))
vs direct indexing (B):

let idx_first = findfirst(dt -> dt == Dates.Date(2018,1,1), results.date)
    idx_last = findlast(dt -> dt == Dates.Date(2019,1,1), results.date)
    @view results[idx_first:idx_last, :]
end
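
(For reproducibility, here is a minimal stand-in for results that both snippets can run against. The real table isn’t shown above, so the value column is a made-up placeholder:)

using Dates, DataFrames, Query

# Toy stand-in for the real table: a date column sorted ascending,
# plus one made-up payload column.
results = DataFrame(date = Dates.Date(2017,1,1):Dates.Day(1):Dates.Date(2020,1,1))
results.value = rand(nrow(results))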

When running @time I get (I’ve run @time several times in a row, so no JIT compilation is involved):
(A): 0.013382 seconds (5.74 k allocations: 336.941 KiB)
(B): 172.499 μs (12 allocations: 416 bytes)
Clearly (B) wins. But when I put @btime in front of the two code samples I get:
(A): 60.500 μs (121 allocations: 8.31 KiB)
(B): 0.040810 seconds (104.54 k allocations: 6.316 MiB)
and (B) makes a lot of allocations, so (A) wins. Can someone explain this to me?

When I collect both into DataFrames at the end:
results |> @filter(_.date >= Dates.Date(2018,1,1) && _.date <= Dates.Date(2019,1,1)) |> DataFrame
vs

let idx_first = findfirst(dt -> dt == Dates.Date(2018,1,1), results.date)
    idx_last = findlast(dt -> dt == Dates.Date(2019,1,1), results.date)
    results[idx_first:idx_last, :] |> DataFrame
end

then (B) is the clear winner in both cases (@time and @btime).

The result from @time is expected to be very noisy for short-duration tasks; solving this problem is the whole purpose of @btime. I would therefore not use @time at all, and only use @btime, for benchmarking tasks that take less than a second or so.
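
For what it’s worth, @btime comes from BenchmarkTools.jl, and its manual recommends interpolating global variables with $ so the timing isn’t inflated by untyped-global overhead. A minimal sketch, assuming the setup above (the helper name extract_range is mine):

using BenchmarkTools, Dates

# Put the work behind a function and interpolate the global with $,
# per the BenchmarkTools manual, so dispatch on the untyped global
# `results` is not part of what gets measured.
extract_range(df) =
    let idx_first = findfirst(dt -> dt == Dates.Date(2018,1,1), df.date),
        idx_last = findlast(dt -> dt == Dates.Date(2019,1,1), df.date)
        @view df[idx_first:idx_last, :]
    end

@btime extract_range($results);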


Yes, and @time may also measure compilation time. But there is a real discrepancy here: @btime reports the minimum over many samples, so its number should always come in at or below a single @time measurement.
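
In other words, on the same workload the two should line up roughly like this (a toy illustration, not the poster’s data):

using BenchmarkTools

v = rand(10_000)

@time sum(v)    # one sample: includes measurement noise and, on a first call, compilation
@btime sum($v)  # minimum over many samples; should not exceed the single @time figure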


The @benchmark macro would give you more information, such as the minimum, maximum, mean, and median of the trials, should that be desired.
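
For example, applied to the snippet above (reusing the hypothetical extract_range helper):

using BenchmarkTools

# Runs many samples and prints a full summary: min/median/mean/max
# times plus allocation statistics.
@benchmark extract_range($results)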
