Why is all profiling wrapped around an evaluation?

As far as I can tell, all the basic performance measurement tools operate by giving a macro an expression to evaluate. This seems odd to me, and I think I’m missing an understanding of why this is so. This question is a request for help filling that part in.

I want to measure the performance of some parts of my code and not others, for example profiling the numerical computation but not profiling writing the results to disk. So it seemed natural to have code that started timing and then stopped it. But that would require getting an initial state and then a terminal one.

By “profiling” I do not mean line-by-line performance info, just the total such as reported by @time.

As far as I can tell, the documented interface doesn’t explain how to get the components (wall time, cpu time–I assume these are available somewhere, though maybe not with great precision–allocation counts and bytes, possibly gc time and compile time), and all the profiling stuff is oriented around things like @time which must wrap the complete expression and then reports the results.

tic() and toc() once did something like what I’m thinking of, though maybe only for time, not memory. They were deprecated for reasons that are not obvious to me in the issue history. Some of the reasons given there and elsewhere seem to be that they could be misused, and some of the reasons (that they used global state) weren’t even true.

At least 2 packages seem to provide similar start/stop functionality, Timers and TickTock, and TimerOutputs may also provide tools to time chunks. I don’t know if any of them track anything other than time, however.

So there may be alternative that do what I want, and even if not, I can probably structure my code so that the things of interest are contained in a function call, so that @time and friends will work. So my question isn’t “How do I do what I want?”. It’s more why is it so hard? Why is all the performance gathering information wrapped up in macros that require a single expression, and why isn’t a way to get the individual components directly (not use @timed and pull out the results) discussed?

2 Likes
  • TimerOutputs.jl reports time and memory information.
  • Note that you can also apply many macros to begin ... end blocks in your code. For example, @elapsed begin ... end or @allocated begin ... end. (@timeit ... also works.) Doesn’t have to be a single function call.

What does require a single function call is the @code_* family of macros. But this makes a lot of sense I guess because functions/methods are the fundamental blocks for compilation.

5 Likes

I’m also sightly confused about the precise point you’re trying to make. Is it solely the wrapping - not necessary in a function (call) but any block - the central issue for you?

I fail to see how a start stop paradigm is easier than @elapsed begin ... end. Also, it also only reports time, no?

3 Likes

First, thanks for the tip about begin ... end, which as a practical matters takes care of non-functions, at least if using @timed or something that returns the info rather than just printing it out.

The need to evaluate a single expression was one concern, but not the only one. I am also wondering about how the information is exposed. @time reports lots of different information: time, memory use, allocation counts, compile time … But there mostly doesn’t seem a way to get at that information without going through the @time macros. I wonder if there’s a reason for that, perhaps that the lower level interfaces are subject to change or operating system dependent. Or perhaps I haven’t found the right part of the manual.

Start-stop and begin ... end are pretty similar, although the former could in principle be used across different syntactic scopes (also one issue noted in some of the discussion of tic-toc was that it could be applied to outer level constructs, like using). I’m not sure what the antecedent of “it” is in “it also only reports time”, if it’s @elapsed, sure, but you introduced that. I think many of the tic-toc like things may be time only, and that is a significant limitation. But if one wanted them to capture more, one would have to know how to get that additional information, which brings me back to my original question.

If you do macroexpand(Main, :(@time foo())), you’ll see that it is calling undocumented internal functions like Base.gc_num().

(The actual timing is by calling time_ns() at the start and end and taking the difference; you can easily call this yourself if you want. See also the BenchmarkTools.jl package for more rigorous benchmarking macros that call the code multiple times and collect statistics.)

2 Likes