I am using the @time and @btime macros to determine how my function performs with different numbers of threads. I am mostly interested in relative values rather than absolute times. Threads aside, I noticed that the more times I run the tests, the faster the function gets. Below, with @btime:
julia> @btime modeltest(mdl_init);
6.387 s (174732441 allocations: 5.72 GiB)
julia> @btime modeltest(mdl_init);
657.132 ms (174730885 allocations: 5.72 GiB)
julia> @btime modeltest(mdl_init);
632.269 ms (174731867 allocations: 5.72 GiB)
Differences in execution times between @time and @btime aside, where do the differences between consecutive runs come from? Is the Julia compiler learning how to run the function more efficiently? Or is it re-using garbage-collected memory?
The difference between the first @time and the others is that on the first run the function gets compiled. Thus, in the first run you are measuring the compilation time and its allocations.
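A minimal sketch of this effect, using a toy function `f` (an assumption, not the original `modeltest`):

```julia
# Sketch: the first call to a freshly defined function pays the JIT
# compilation cost; subsequent calls measure only the runtime.
f(x) = sum(abs2, x)
x = rand(10^6)
@time f(x)   # first run: includes compilation time and its allocations
@time f(x)   # later runs: runtime only
```

Running this in a fresh session, the first `@time` typically reports far more time and allocations than the second.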
The differences between subsequent @time executions (after the first one) are probably random noise.
The benchmarks with @btime are probably wrong, because you need to interpolate the variables there, with $:
@btime modeltest($mdl_init)
Also be careful if the function modeltest modifies the contents of mdl_init: @btime executes the function multiple times, so times may vary because the input changes between evaluations. This may also explain the disparity between the first and subsequent calls of @btime.
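A sketch of the pitfall, with a hypothetical mutating function `grow!` (not from the original post): each call does more work than the last, which is exactly what @btime's repeated evaluations would hit.

```julia
# Sketch: a benchmarked function that mutates its input does different
# work on every evaluation. `grow!` doubles its vector each call.
grow!(v) = append!(v, copy(v))
v = [1, 2, 3]
grow!(v)
grow!(v)
length(v)   # now 12: each call operates on a larger input than the last
# With BenchmarkTools, rebuild the input for every evaluation instead:
#   @btime grow!(w) setup=(w = copy($v)) evals=1
```

The `setup` and `evals=1` keywords of BenchmarkTools make each evaluation start from a fresh copy, so mutation no longer skews the measurement.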
By the way: That amount of allocations and that amount of garbage collection probably indicate that there is something wrong (type instabilities) in your code.
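A minimal sketch of what such a type instability looks like (hypothetical functions, not the poster's code):

```julia
# Sketch of a classic type instability: `acc` starts as an Int but becomes
# a Float64 inside the loop, forcing the compiler to handle a Union type
# and allocate boxed intermediate values.
function unstable_sum(xs)
    acc = 0                    # Int -- changes type on the first Float64 add
    for x in xs
        acc += x
    end
    return acc
end

function stable_sum(xs)
    acc = zero(eltype(xs))     # concrete type fixed up front
    for x in xs
        acc += x
    end
    return acc
end
# `@code_warntype unstable_sum(rand(10))` highlights acc::Union{Int64, Float64}
```

`@code_warntype` (or JET.jl) will flag the unstable version in red, which is usually the fastest way to find where the spurious allocations come from.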
In my not-so-long experience, 4% of GC time is not necessarily an indication of a problem, but it may be. I had a similar situation, and in my case I finally found where those allocations were occurring and fixed them, making the threaded version much better. Ideally the code should not allocate anything in the performance-critical parts.
I would try to track those allocations and be sure that they are strictly necessary.
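One way to track them is `@allocated` (per expression) or starting Julia with `--track-allocation=user` (per line). A sketch with an assumed allocation-free helper:

```julia
# Sketch: @allocated reports bytes allocated by an expression; a hot,
# type-stable function should report (close to) zero after a warm-up call.
sumsq(xs) = sum(abs2, xs)
xs = rand(1000)
sumsq(xs)                      # warm up first: compilation itself allocates
bytes = @allocated sumsq(xs)   # expect 0 for an allocation-free function
# `julia --track-allocation=user` can then pinpoint allocating lines per file
```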
I think the question here is: when you have 1 thread, is it doing the same number of calculations as when you have 64 threads? Or is the 64-thread version doing 64 times the work of the single-thread version?
To compare apples to apples you would need to ensure that in the 64-thread version each thread does 1/64th of the calculations that the single thread does. Otherwise you are comparing the GC time of two totally different computations.
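A sketch of what "same total work" means in practice (a hypothetical `threaded_sum`, not the poster's code): the total work is fixed by the input size, and each thread handles one slice of it.

```julia
using Base.Threads

# Sketch: the TOTAL work is fixed by length(xs); each thread handles one
# contiguous slice, so runs with 1 thread and with 64 threads perform the
# same number of additions overall.
function threaded_sum(xs)
    n = nthreads()
    partials = zeros(eltype(xs), n)
    @threads :static for t in 1:n
        lo = div((t - 1) * length(xs), n) + 1
        hi = div(t * length(xs), n)
        s = zero(eltype(xs))
        @inbounds for i in lo:hi
            s += xs[i]
        end
        partials[t] = s
    end
    return sum(partials)
end
```

Benchmarking this with `JULIA_NUM_THREADS=1` versus `JULIA_NUM_THREADS=64` is a fair comparison, because the per-run workload does not change with the thread count.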
I saw that kind of GC increase when I had a type instability in the container holding the results of the calculation, which was copied for each thread to avoid race conditions. It smells like something similar here.
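A sketch of the fix for that pattern: give each thread its own buffer with a concrete element type (the names here are illustrative, not from the original code).

```julia
using Base.Threads

# Sketch: per-thread result buffers with a CONCRETE element type. An untyped
# `[]` (i.e. Vector{Any}) would box every stored value, inflating allocations
# and GC pressure in the threaded version.
results = [Float64[] for _ in 1:nthreads()]   # one typed buffer per thread
@threads :static for i in 1:100               # :static keeps threadid() stable
    push!(results[threadid()], sqrt(i))
end
total = sum(length.(results))
```

The `:static` scheduler pins each iteration to one thread so that indexing by `threadid()` is safe; with the default dynamic scheduler, tasks may migrate between threads mid-loop.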