tic() + toc()/toq() vs @elapsed

A nice alternative is also TimerOutputs.jl (KristofferC/TimerOutputs.jl on GitHub), which times sections of a program and prints nicely formatted output about timings and more.

It seems the resolution I get from this is lower than that of the tic() and toq() Julia used to have.
Many operations I could measure with tic() and toq() now show 0 run time.

Any idea?

Update

The documentation of time() says its resolution is typically microseconds. Hence one should use a higher-resolution timer such as time_ns().

You can use time_ns instead.
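
As a rough sketch of what that could look like as a replacement for the old tic()/toq() pair (the `work` function is a hypothetical workload, not something from this thread):

```julia
# Hypothetical workload, used only for illustration.
work() = sum(sqrt(i) for i in 1:1_000)

work()                          # warm-up call so compilation is not timed

t0 = time_ns()                  # nanosecond counter, returned as a UInt64
result = work()
t1 = time_ns()

elapsed_ns = Int(t1 - t0)       # convert from UInt64 for friendlier arithmetic
elapsed_s  = elapsed_ns / 1e9   # same value in seconds

println("work() took ", elapsed_ns, " ns (", elapsed_s, " s)")
```

A single sample like this is still noisy for very short operations, so it is probably best treated as a lower-level building block rather than a benchmark on its own.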

I think that as soon as one tries to measure something on the computer that only takes a few microseconds, the result will be extremely noisy. Going down to nanoseconds will just magnify the problems.

Regarding time_ns, a potentially easier-to-understand variant is rdtsc() = ccall("llvm.x86.rdtsc", llvmcall, Int, ()). Once you are below the microsecond range, you probably want to count CPU cycles instead. Also, that way you at least know where to ask/read for details (e.g. your CPU manual instead of having to figure out what time_ns actually does).

Direct measurement of small times is nontrivial. E.g. unqualified sentences like “this function takes 20ns to run” are meaningless: throughput? latency? In what context? Superscalar CPUs don’t work by advancing from one instruction to the next. They can, at great cost, manufacture an illusion (“architected state”) of having sequentially gone through the steps described in your assembly code.

Thank you for your answer.

I wanted to measure run time of a function.
I ended up using @elapsed, running the same function a few times and taking the median.

Do you find it reasonable?

Sure, that is reasonable if the @elapsed time is large enough that millisecond overheads don't matter and it includes whatever amount of garbage collection you need.
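
For reference, the median-of-@elapsed approach described above might look something like this. It is only a sketch: `median_elapsed` and `work` are hypothetical names, and the default of 11 runs is an arbitrary choice.

```julia
using Statistics  # `median` lives in the Statistics standard library

# Hypothetical helper: call `f` `n` times and return the median wall time in seconds.
function median_elapsed(f; n = 11)
    f()                                   # warm-up call so compilation is excluded
    times = [@elapsed(f()) for _ in 1:n]  # one sample per run, in seconds
    return median(times)
end

# Hypothetical workload, used only for illustration.
work() = sum(abs2, rand(10^6))

println("median run time: ", median_elapsed(work), " s")
```

The warm-up call matters because the first call to a Julia function includes compilation time, which would otherwise skew the samples.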

For faster functions, the @belapsed / @benchmark / @btime macros are handy. But these have all the other problems with running in loops.
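
These macros come from the BenchmarkTools.jl package. A minimal sketch of how they are typically used, assuming the package is installed and using a hypothetical input vector `x`:

```julia
using BenchmarkTools  # third-party package, assumed to be installed

x = rand(1000)  # hypothetical input, used only for illustration

# `$x` interpolates the global into the benchmark so that looking it up
# is not part of what gets measured.
t = @belapsed sum($x)    # minimum time in seconds over many samples
@btime sum($x);          # prints the minimum time and allocation count
b = @benchmark sum($x)   # full set of samples with summary statistics

println("minimum time: ", t, " s")
```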

The advantage of rdtsc over e.g. time_ns is only that it permits smaller loops / faster functions, because it has lower overhead and jitter, and cycles are imo easier to reason about than times. Benchmarking is still hard.

Yet if it counts cycles and the CPU changes its clock, what’s the point? It will be accurate only for very small time frames where the CPU clock doesn’t change, or in the steady state if we can guarantee the cooling solution is good enough.