Tutorial: How to benchmark and profile your code?

Interested in benchmarking and profiling your code?
My new blog post walks you through it, from high-level benchmarking to digging deeper with profiling tools.
It’s quite high level, so I avoided explaining the different lower-level macros there, but I will cover them in another post if there’s interest.

Let me know your thoughts and enjoy reading :slight_smile:


Very nice! A few remarks:

  • You might want to rename this topic: it sounds like you’re asking for help profiling your code… maybe “A new tutorial on benchmarking and profiling” or similar?

  • Regarding the first flamegraph screenshots and this part:

    My screenshot does not show the full width, but when you run it you can see that the filter! function takes more than 98% of the time, which means it is the function we want to optimize.

    It’s a bit confusing to have a screenshot that doesn’t illustrate the point :slight_smile: (on the screenshot it looks like filter! spends almost all its time in != and <=). Maybe use a screenshot showing the full width of filter!, and if the text is unreadable then, you could add the current screenshot as a “magnifier”?

  • The last runtime plot, which “looks quite funny” as you say, can make the reader skeptical that the third solution is really doing what it should do… Maybe a good opportunity to show that a logarithmic scale can be useful?
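As a sketch of what I mean (assuming the plot is made with Plots.jl; the sizes and timings below are made-up placeholders, not data from the tutorial):

```julia
using Plots

ns = 10 .^ (1:6)                           # hypothetical input sizes
runtimes = [n * 1e-8 + 1e-4 for n in ns]   # hypothetical timings in seconds

# A log-log scale spreads out values spanning several orders of magnitude,
# so a much faster solution no longer hugs the x-axis.
plot(ns, runtimes; xscale = :log10, yscale = :log10,
     xlabel = "n", ylabel = "runtime (s)", label = "solution 3")
```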

EDIT: just to clarify, for the first remark I meant the title here on Discourse.


Thanks for your thoughts. Will add those!

In this code

    # convert from nanoseconds to seconds
    push!(ys, mean(t).time / 10^9)

Why are you using mean? It’s inconsistent with the behaviour of @btime, which uses the minimum. And it behaves worse than median, which is my second choice for such estimations.

I find min a bit strange, but it depends on what you want to measure, I guess. Is mean wrong?
I chose mean to get the average running time of the function. One could add error bars around it in the plot.

Well, the consensus is that min is the most adequate metric for measuring actual code performance, because the measured time is always “time of the code itself + some random non-negative noise from the operating system”. Since the second term is always non-negative, taking the min gives you the closest estimate to the real execution time of the code. The median is slightly worse, and the mean is the worst of them all, since it is very sensitive to outliers. Imagine that in 10 runs you get 9 measurements of 1 ms and one of 10 s. The mean would be about 1 s, which is definitely not representative of the actual execution time.
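The skew is easy to check with a toy example (using the Statistics stdlib; the timings below are made up, nine fast runs plus one outlier):

```julia
using Statistics

# nine runs at 1 ms and one 10 s outlier (e.g. OS noise or GC), in seconds
times = [fill(0.001, 9); 10.0]

minimum(times)  # closest to the true cost of the code
median(times)   # robust to the single outlier
mean(times)     # dominated by the outlier, about 1 s
```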

But anyway, whether you agree with it or not, it’s inconsistent to use and compare @btime and mean to profile the same code. It should be either one or the other.
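For example, the plotting loop could take the minimum of the trial instead, matching @btime. A sketch, assuming `t` is a `BenchmarkTools.Trial` and `ys` the vector being plotted (the benchmarked expression here is just a placeholder):

```julia
using BenchmarkTools

ys = Float64[]
t = @benchmark sum(rand(1000))

# convert from nanoseconds to seconds, using the minimum like @btime does
push!(ys, minimum(t).time / 10^9)
```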


Thanks for the clarification, @Skoffer. Will make the changes accordingly.


Once I benchmarked some code after putting my laptop in the freezer. It was clearly faster. I will test that again and compare the minimum, median, and average times obtained relative to room temperature, hoping to show that thermodynamic noise, not only operating system noise, enters into the equation.


Sure, my statement is a simplification. Another reason for “negative operating system time” can be governor management (I hope this term is correct). The operating system can change the CPU frequency on demand, so it is possible that during a benchmark the frequency goes up and the overall execution time decreases. So yes, this formula is a simplification.

See: Minimum times tend to mislead when benchmarking


See this paper’s conclusion: “results suggest that using the minimum estimator for the true run time of a benchmark, rather than the mean or median, is robust to non-ideal statistics and also provides the smallest error.”


Or, possibly, garbage collection, which occurs in bursts. If some GC is inevitable, it is reasonable to use the median too because it gives you more realistic timing for practical purposes.

These are just informative statistics; there isn’t a single best one. That said, if you have to pick one, then in general the minimum is a reasonable choice.