I am trying to compare two pieces of code: one with parallel computing (6 threads) and the other single-threaded (normal).
I put each piece of code between “@time … end”, i.e.:
@time begin
...
sequential code
...
end #@time begin
I get 2.464155 seconds.
julia> Threads.nthreads()
6
@time begin
...
parallel code (with @threads)
...
end #@time begin
I get 5.464155 seconds.
As I understand it, the computation time in the latter is much larger because @time includes the compilation time for each thread. So I tried to use @btime instead, but I get the error below:
@btime begin
...
sequential code (or parallel code (with @threads))
...
end #@btime begin
ERROR: LoadError: BoundsError: attempt to access 96003×2 Matrix{Float64} at index [96004:96006, 1:2]
Stacktrace:
[1] throw_boundserror(A::Matrix{Float64}, I::Tuple{UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}})
@ Base .\abstractarray.jl:651
[2] checkbounds
@ .\abstractarray.jl:616 [inlined]
[3] _setindex!
@ .\multidimensional.jl:886 [inlined]
[4] setindex!(::Matrix{Float64}, ::Matrix{Float64}, ::UnitRange{Int64}, ::Function)
@ Base .\abstractarray.jl:1267
[5] macro expansion
First of all, you are correct that @time measures compile time. However, if you are using threads, all of them share memory, so everything is compiled only once; this is therefore unlikely to be the reason for your overhead. Instead of @btime, you can also just run the code twice to keep the compilation time out of your result.
The error message tells you that you are indexing a matrix at indices larger than its size. Are you sure the error is gone if you simply replace @btime with @time?
For debugging and performance (including benchmarks), it is advisable to wrap your code into functions instead of writing everything in global scope, which I assume you might be doing here. Maybe doing this already helps you find the problem; otherwise you will probably have to paste more of your code here, preferably a minimal (non-)working example.
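For example, a minimal sketch (the function name and the loop body are just placeholders for your actual computation):
using BenchmarkTools

function simulate!(result)          # placeholder for your real computation
    for i in 1:size(result, 1)
        result[i, 1] = sqrt(i)
        result[i, 2] = 2 * sqrt(i)
    end
    return result
end

result = zeros(96003, 2)
simulate!(result)                   # first call compiles the function
@time simulate!(result)             # now measures execution only
@btime simulate!($result)           # runs many samples, reports the minimum
Note the $ interpolation in @btime, which keeps the cost of looking up the global variable out of the benchmark.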
Hope this helps
Yes, the error is gone with @time. I already tried to wrap the code into a function, but I encountered a world-age problem, so I was forced to use @eval and global variables to get rid of it. However, this made the execution time far larger.
Running the code twice gives me the warnings below (the time shown is with @time, not @btime):
WARNING: replacing module ModuleClasses.
WARNING: using ModuleClasses.Simulation in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.curMatRLC in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.adjMatVsince in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.RLCs in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.Vsines in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.volMatVsine in module Main conflicts with an existing identifier.
WARNING: using ModuleClasses.admitanceMatRLC in module Main conflicts with an existing identifier.
0.888512 seconds (9.37 M allocations: 861.883 MiB, 8.47% gc time, 0.27% compilation time)
So I have to quit the current REPL session and run it again.
Ah, maybe that gets us closer to the issue.
Do you bring modules into scope with
using .Simulations
(note the dot) or similar? There is likely no reason to redefine the module in the code you are benchmarking, and doing so can cause errors because the compiler does not know that modules sharing a name are completely identical. (That is probably also the cause of what goes wrong when using @btime instead.) The above statement should only occur once, at the start of the program.
All the other performance-critical parts can be wrapped into functions.
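Schematically, the structure could look like this (the module, function, and variable names are just placeholders):
# define the module exactly once (e.g. in its own file that you include a single time)
module Simulations
export run_model!
function run_model!(A::Matrix{Float64})   # placeholder for your real code
    A .= 1.0
    return A
end
end

using .Simulations   # bring it into scope once, at the top (note the dot)

A = zeros(1000, 2)
run_model!(A)        # first call compiles
@time run_model!(A)  # afterwards, time only the function call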
No, I am not bringing the modules into scope with using .Simulations.
I have several struct definitions that I put in a user-defined module called “ModuleClasses”, which I bring into scope at the beginning of the code with using .ModuleClasses.
So the using statement appears only at the top level, and not among the instructions inside your @time block? That would be curious.
Somehow, the definitions of your modules / structs get overwritten, which prompts the warnings and likely causes errors afterwards.
If you are certain you aren’t redefining the module or bringing it into scope twice, then you should probably post your code here, as the former is the only possible reason I can currently think of.
OK, the module definition itself looks like it should work. I think the only way to identify the source of the problem is to reproduce it as simply as possible.
You have figured out that there are errors when you copy the @time block to make it execute a second time, right?
I would build on that:
Maybe we now copy only the parallel code inside the @time block. If no black magic happens, I’d expect the warnings and the errors to remain:
@time begin
...
parallel code
parallel code
...
end #@time begin
If the error remains, there is likely one function which brings your module into scope again. This can probably be found by inserting the statements that make up the (second) parallel code one by one: when none of them are there, no error should appear, so the error must show up at some point as you add more and more lines. That way you should be able to find the responsible function.
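Just to illustrate, a hypothetical culprit could look like this (reload_model and the file name are made up; in your code the re-inclusion may be buried deeper):
# a made-up function that silently redefines the module on every call:
function reload_model()
    @eval Main include("ModuleClasses.jl")   # re-runs the module definition
    @eval Main using .ModuleClasses          # brings it into scope again
end
Calling such a function inside the timed block would reproduce exactly the “replacing module” warnings you saw.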
Thank you very much for your support. Actually, when I define iter as a global variable, the error below is gone when using @btime. Any idea why that is?
@btime begin
global iter = 0;
for t in tmin:dt:tmax
global iter += 1;
Matrix[iter]...
ERROR: LoadError: BoundsError: attempt to access 96003×2 Matrix{Float64} at index [96004:96006, 1:2]
Stacktrace:
By the way, I wrapped all my code into a function called ss (without @threads); the same version with @threads is called ssP. To compare them, I followed the two methods below, with the resulting times shown:
First method:
ss()
@time ss() #Results in 0.224237 s
ssP()
@time ssP() #Results in 0.251157 s
Second method:
@btime ss() #Results in 214.202 ms
@btime ssP() #Results in 183 ms (parallel)
Why is there a difference between the two methods, and which one is correct?
1- So, in the first method, when calling @time on the function after a first plain call, it only shows the execution time, since the compilation time is removed by the first call, right?
2- @btime shows the minimum of the execution times, right?
3- Which method do you recommend for comparing serial and parallel code?
4- Since @time measures the compilation time plus the execution time for each @threads run: is there any way to measure the total of “compilation time (once) + execution time” for the parallel code? In other words, I want to show the beauty of parallelization in Julia by comparing the time to run (pressing Run in VS Code) a code with and without @threads.
@btime is much more reliable than @time for virtually any benchmarking.
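For example (sum(rand(1000)) is just a stand-in workload):
using BenchmarkTools
@btime sum(rand(1000));        # prints the minimum time and the allocations
@benchmark sum(rand(1000))     # prints full statistics (samples, min/median/mean/max)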
Usually, measuring compilation time is not very interesting or meaningful (unless you are working on optimizing the compiler itself). As you increase the problem size, the compilation time quickly becomes negligible (since it remains fixed), and for interactive usage most people leave Julia running so the compilation only happens once.
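As a minimal illustration (work is a toy stand-in for a real workload):
work(n) = sum(sqrt(i) for i in 1:n)   # toy workload

@time work(10^3)    # first call: dominated by compilation
@time work(10^3)    # second call: execution time only
@time work(10^9)    # large problem: the one-time compilation cost is negligible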
I am trying to compare the solution of an electrical circuit by my Julia code (which has @threads) with that of a commercial program, which also reports the elapsed clock time (including the compilation time), e.g. 2.5 seconds. So I am trying to measure something similar in Julia.
I’m not sure what you mean by a “commercial program” that “includes the compilation time” — most commercial software comes already compiled?
My main point is that people normally benchmark on relatively small test problems (which run in a few seconds), but you mostly care about performance for large problems (or when the small problems are run many times in a loop), which take minutes or hours or even days. In order to extrapolate from small problems to large problems, you should not include compilation time, because compilation time is a one-time cost that does not increase with the problem size.