# Tutorial: How to benchmark and profile your code?

Interested in benchmarking and profiling your code?
My new blog post walks you through it, from high-level benchmarking to digging deeper with profiling tools.
It’s quite high level, so I avoided explaining the different lower-level macros there, but I will cover them in another post if there is interest.

Let me know your thoughts, and enjoy the read!

18 Likes

Very nice! A few remarks:

• You might want to rename this topic: it sounds like you’re asking for help profiling your code… maybe “A new tutorial on benchmarking and profiling” or similar?

• Regarding the first flamegraph screenshots and this part:

My screenshot does not show the full width, but when you run it you can see that the `filter!` function takes more than 98% of the time, which means it is the function we want to optimize.

It’s a bit confusing to have a screenshot that doesn’t illustrate the point (on the screenshot it looks like `filter!` spends almost all its time in `!=` and `<=`). Maybe use a screenshot showing the full width of `filter!`, and if the text is unreadable then you could add the current screenshot as a “magnifier”?

• The last runtime plot, which “looks quite funny” as you say, can make the reader skeptical that the third solution is really doing what it should do… Maybe a good opportunity to show that a logarithmic scale can be useful?

EDIT: just to clarify, for the first remark I meant the title here on Discourse.

4 Likes

Thanks for your thoughts. Will add those!

In this code

```julia
# convert from nanoseconds to seconds
push!(ys, mean(t).time / 10^9)
```

Why are you using `mean`? It’s inconsistent with `@btime`’s behavior, which uses the minimum. And it behaves worse than `median`, which is my second choice for such estimates.

I find `min` a bit strange, but I guess it depends on what you want to measure. Is `mean` wrong?
I chose `mean` to get the average running time of the function. One could add error bars around it in the plot.

Well, the consensus is that `min` is the most adequate metric for measuring actual code performance, because the execution time is always “time of the code itself + some random nonnegative noise from the operating system”. Since the second term is always nonnegative, taking `min` gives you the closest estimate to the real execution time. `median` is slightly worse, and `mean` is the worst of them all, since it is easily skewed. Imagine that in 10 runs you get 9 measurements of 1 ms and 1 of 10 s. The `mean` would be about 1 s, which is definitely not representative of the actual execution time.
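The skew argument above is easy to reproduce. Here is a small sketch (not from the tutorial) with hypothetical timings mimicking the example in the text: nine runs at 1 ms and one 10 s outlier, all in seconds.

```julia
using Statistics

# Hypothetical timings (seconds): nine clean runs plus one OS-noise outlier.
times = [fill(0.001, 9); 10.0]

println("minimum: ", minimum(times))  # 0.001 — closest to the true runtime
println("median:  ", median(times))   # 0.001 — robust to the single outlier
println("mean:    ", mean(times))     # ≈ 1.0 — dragged up by the outlier
```

The minimum and median both stay at the true 1 ms, while the mean is pulled three orders of magnitude away by a single noisy run.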

But anyway, whether you agree with that or not, it’s inconsistent to compare `@btime` (which reports the minimum) against `mean` when profiling the same code. It should be one or the other.
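For consistency, the tutorial’s snippet could take the minimum of the trial instead of the mean, matching what `@btime` reports. A hedged sketch, assuming `t` is the `BenchmarkTools.Trial` produced by `@benchmark` (the benchmarked expression here is just a placeholder):

```julia
using BenchmarkTools

ys = Float64[]

# Placeholder workload standing in for the tutorial's function.
t = @benchmark sum(rand(1000))

# convert from nanoseconds to seconds, using the minimum like @btime does
push!(ys, minimum(t).time / 10^9)
```

`minimum(t)` returns a `TrialEstimate` whose `time` field is in nanoseconds, just like `mean(t)` and `median(t)`, so it is a drop-in replacement in the plotting loop.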

7 Likes

Thanks for the clarification @Skoffer . Will make the changes accordingly.

1 Like

Once I benchmarked some code after putting my laptop in the freezer. It was clearly faster. I will test that again and compare the minimum, median, and average times relative to room temperature, hoping to show that thermodynamic noise, not only operating-system noise, enters into the equation.

5 Likes

My statement is certainly a simplification. Another source of “negative operating system time” can be frequency governor management (I hope this term is correct): the operating system can change the CPU frequency on demand, so it is possible that during a benchmark the frequency goes up and the overall execution time decreases. So yes, the formula above is a simplification.

1 Like

Or, possibly, garbage collection, which occurs in bursts. If some GC is inevitable, it is reasonable to use the median too, because it gives you a more realistic timing for practical purposes.

These are just informative statistics, there isn’t a single best one. That said, if you have to pick one, then in general minimum is a reasonable choice.

5 Likes