What is the correct procedure to compare the execution time between sequential code and parallel code?

I am trying to compare two pieces of code: one with parallel computing (6 threads) and the other with a single thread (normal).
I wrapped each one in “@time … end”, i.e.:

@time begin
...
sequential code
...
end #@time begin

I get 2.464155 seconds

julia> Threads.nthreads()
6
@time begin
...
parallel code (with @threads)
...
end #@time begin

I get 5.464155 seconds.
As I understood it, the computation time in the latter is much larger because @time includes the compilation time for each thread. So I tried to use @btime instead, but I get the error below:

@btime begin
...
sequential code (or parallel code (with @threads))
...
end #@btime begin
ERROR: LoadError: BoundsError: attempt to access 96003×2 Matrix{Float64} at index [96004:96006, 1:2]
Stacktrace:
  [1] throw_boundserror(A::Matrix{Float64}, I::Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}})
    @ Base .\abstractarray.jl:651
  [2] checkbounds
    @ .\abstractarray.jl:616 [inlined]
  [3] _setindex!
    @ .\multidimensional.jl:886 [inlined]
  [4] setindex!(::Matrix{Float64}, ::Matrix{Float64}, ::UnitRange{Int64}, ::Function)
    @ Base .\abstractarray.jl:1267
  [5] macro expansion

Any suggestion to handle this issue?

First of all, you are correct that @time includes compilation time. However, if you are using threads, all of them share memory, so everything is compiled only once; this is therefore unlikely to be the reason for your overhead. Instead of @btime, you can also simply run the code twice and take the second timing, which excludes compilation.
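For example (with a hypothetical workload standing in for your code), that run-twice pattern looks like:

```julia
# Hypothetical workload; replace with your own function
work() = sum(abs2, rand(1000, 1000))

work()        # first call: triggers compilation, discard this timing
@time work()  # second call: measures execution only
```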

The error message tells you that you are indexing a matrix at indices larger than its size. Are you sure that the error goes away if you simply replace @btime with @time?
For debugging and performance (including benchmarks), it is advisable to wrap your code into functions instead of writing everything in the global scope, which I assume you might be doing here. Maybe doing this already helps you find the problem; otherwise you will probably have to paste more of your code here, preferably a minimal (not) working example :slight_smile:
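A minimal sketch of the difference, with a made-up summation as the workload:

```julia
# Instead of timing a loop in global scope, put the work in a function:
function total(data)
    s = 0.0
    for x in data
        s += x
    end
    return s
end

data = rand(10^6)
total(data)        # warm-up call (compiles the function)
@time total(data)  # now a fair timing of the function itself
```

Functions give the compiler concrete types to work with, whereas loops over global variables are type-unstable and slow.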
Hope this helps

1 Like

Yes, the error is gone with @time. I already tried to wrap the code into a function, but I encountered a world-age problem, so I was forced to use @eval functions and global variables to get rid of it. However, this made the execution time far larger.

Running the code twice gives me the warnings below (the time shown is with @time, not @btime):

WARNING: replacing module ModuleClasses.
WARNING: using ModuleClasses.Simulation in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.curMatRLC in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.adjMatVsince in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.RLCs in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.Vsines in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.volMatVsine in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.admitanceMatRLC in module Main conflicts with an existing identifier.
  0.888512 seconds (9.37 M allocations: 861.883 MiB, 8.47% gc time, 0.27% compilation time)

So, I have to quit the current REPL session and start it again.

1 Like

Ah maybe that gets us closer to the issue.
Do you bring modules into scope with

using .Simulations

(note the dot) or similar? There is likely no reason to redefine the module in the code you are benchmarking, and doing so can cause errors because the compiler doesn’t know that the modules with the same name are completely identical. (That is probably also the cause of what goes wrong when using @btime instead.) The above statement should occur only once, at the start of the program.
All the other performance-critical parts can be wrapped into functions.
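As a minimal sketch of that structure (using a stand-in module, since I don’t know what ModuleClasses contains):

```julia
# Stand-in for your ModuleClasses; define and bring it into scope ONCE:
module Demo
greet() = "hello"
end

using .Demo

# ...and time only the function calls, never the module definition:
@time Demo.greet()
```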

2 Likes

No, I am not bringing the modules in with
using .Simulations
I have several struct objects, which I put in a user-defined module called “ModuleClasses”; I bring it in at the beginning of the code with
using .ModuleClasses

So the using statement occurs only at the top level and not within the instructions inside your @time block? That would be curious.
Somehow, the definitions for your modules / structs get overwritten which prompts the warnings and likely causes errors afterwards.
If you are certain you aren’t redefining the module or bringing it into scope twice, then you should probably post your code here, as the former is the only possible reason I can currently think of.

I have three structs, with functions, named “Simulation”, “RLC”, and “ACSource”, which I wrap in a module called “ModuleClasses”, all in a folder named “Classes”, as seen below:

module ModuleClasses #This module includes all element classes
using Parameters, Base; 
include("./../Classes/Simulation.jl");
include("./../Classes/RLC.jl");
include("./../Classes/ACSource.jl"); 
end #module ModuleClasses


In my main code (which is located in a folder called “3Phase”) I import the defined module and call the functions of the structs within the `@time begin … end` block:

include("./../3Phase/Classes/ModuleClasses.jl");
using .ModuleClasses;
@time begin
...
parallel code (which call the functions inside the structures)
...
end #@time begin

OK, the module definition itself looks like it should work. I think the only way to identify the source of the problem is to reproduce it as simply as possible.
You have figured out that there are errors when you copy the @time block to make it execute a second time, right?
I would build on that:
Maybe we now copy only the parallel code inside the @time block. If no black magic happens, I’d expect the warnings and the errors to remain.

@time begin
...
parallel code
parallel code
...
end #@time begin

If the error remains, there is likely one function which brings your module into scope again. It can probably be found by inserting the statements that make up the (second) copy of the parallel code one by one: if none of them are there, no error should appear, so it must show up at some point as you add more and more lines. That way you should be able to find the responsible function.

Thank you very much for your support. Actually, when I define iter as a global variable, the error below disappears when using @btime. Any idea why that is?

@btime begin 
  global iter = 0; 
  for t in tmin:dt:tmax
    global iter += 1; 
Matrix[iter]...
ERROR: LoadError: BoundsError: attempt to access 96003×2 Matrix{Float64} at index [96004:96006, 1:2]
Stacktrace:

By the way, I wrapped all my code into a function called ss (with no @threads); the same version with @threads is called ssP. To compare them, I followed the two methods below, with the resulting times shown:

First method:

ss()           
@time ss()      #Results in 0.224237 s

ssP() 
@time ssP()      #Results in 0.251157 s

Second method:

@btime ss()    #Results in 214.202 ms

@btime ssP()    #Results in 183 ms (parallel)

Why is there a difference between the two methods, and which one is correct?

@btime runs the function several times and prints the minimum time; @time runs it just once.
Both numbers are “correct”, but they measure different things.
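A side-by-side sketch of the two macros, assuming BenchmarkTools.jl is installed (it provides @btime) and using a made-up function:

```julia
using BenchmarkTools  # assumed to be installed; provides @btime

f() = sum(rand(10^6))

f()            # warm-up call so compilation is out of the way
@time f()      # one timed run: wall-clock time, allocations, GC share
@btime f()     # many runs; prints the minimum time observed
```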

1- So, in the first method, when calling @time after a first call to the function, it shows only the execution time, since the compilation time is removed by the first call, right?
2- @btime shows the minimum of the execution times, right?
3- Which method do you recommend for comparing serial and parallel code?
4- Since @time measures compilation time + execution time for each @threads run: is there any way to measure the total time of “compilation time (once) + execution time” for the parallel code? In other words, I want to show the beauty of parallelization in Julia by comparing the time to run a program (pressing Run in VS Code) with and without @threads.

@btime is much more reliable than @time for virtually any benchmarking.

Usually, measuring compilation time is not very interesting or meaningful (unless you are working on optimizing the compiler itself). As you increase the problem size, the compilation time quickly becomes negligible (since it remains fixed), and for interactive usage most people leave Julia running so the compilation only happens once.
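As a rough illustration (the numbers are machine-dependent, and the function is made up): the first call pays a one-time compilation cost, which stays fixed no matter how large the problem gets.

```julia
h(n) = sum(sqrt, 1:n)

t_small = @elapsed h(10)     # first call: dominated by one-time compilation
t_large = @elapsed h(10^8)   # already compiled: dominated by the actual work
```

For the large problem the fixed compilation cost (already paid) is negligible relative to the runtime.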

I do think that in many cases the minimum time isn’t representative of expected performance when dealing with multithreaded code. An extreme example:

julia> @benchmark fetch(Threads.@spawn 1+1)
BenchmarkTools.Trial: 7468 samples with 10 evaluations.
 Range (min … max):  10.931 μs … 360.687 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     66.520 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   66.620 μs ±   5.501 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                        ▅█▆▁
  ▂▁▂▂▁▁▂▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▂▂▁▂▁▂▁▁▂▂▁▂▂▁▁▂▁▁▁▁▂▂▂▄▇████▆▃ ▃
  10.9 μs         Histogram: frequency by time         70.8 μs <

 Memory estimate: 480 bytes, allocs estimate: 5.

The distribution is left-skewed, instead of the typical right-skewed shape.

When dealing with code that allocates, and especially with code using multiple threads, I think the median or even the mean are better.
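For instance, the different summary statistics can be pulled out of a @benchmark result directly (assuming BenchmarkTools.jl is installed; Statistics is in the standard library):

```julia
using BenchmarkTools, Statistics  # BenchmarkTools.jl assumed installed

b = @benchmark fetch(Threads.@spawn 1 + 1)

minimum(b)   # best case; can be misleading for threaded code
median(b)    # more robust summary of a skewed distribution
mean(b)      # average; sensitive to outliers
```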

1 Like

I am trying to compare the solution of an electrical circuit by my Julia code (which uses @threads) with a commercial program, which also reports an elapsed clock time (including compilation time), such as 2.5 seconds. So I am trying to measure something similar in Julia.

I’m not sure what you mean by a “commercial program” that “includes the compilation time” — most commercial software comes already compiled?

My main point is that people normally benchmark on relatively small test problems (which run in a few seconds), but you mostly care about performance for large problems (or when the small problems are run many times in a loop), which take minutes or hours or even days. In order to extrapolate from small problems to large problems, you should not include compilation time, because compilation time is a one-time cost that does not increase with the problem size.

2 Likes

Yes, I meant “commercial software”, in which I can build a specific electrical circuit that is then compiled and solved, and which reports an “Elapsed clock time” that I believe refers to compilation plus run time, as in the example below:


Why does the compilation time not increase with the problem size? A bigger problem (such as one with more variables) should need more time to check and compile, shouldn’t it?