Hi Fredrik @baggepinnen, Fikri @fksayaci,
I really appreciate your excellent Julia packages,
so, hopefully to save you some cycles, I will
hazard a low-cost SWAG (Scientific/Speculative Wild A*s Guess) at how to gather debug info
and narrow down the root cause of the issue.
To get more help, you probably first need a few more debug steps to narrow down the root cause, so maybe try out the suggestion from @pdeffebac above?
Namely " … try and profile the time with just the CSV writing,
and the time with both the CSV writing and the Plotting."
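
A minimal sketch of that suggestion, using only Base Julia. The names `write_csv_stage`, `plot_stage`, and `time_stage` are hypothetical stand-ins I made up for illustration (the placeholder "plot" stage does no real plotting); swap in the actual CSV-writing and Plots calls from reporting_results:

```julia
# Hypothetical stand-in for the "A) write CSV" stage.
function write_csv_stage(path, rows)
    open(path, "w") do io
        for row in rows
            println(io, join(row, ","))
        end
    end
    return path
end

# Hypothetical stand-in for the "B) using Plots" stage (placeholder work only).
plot_stage(rows) = sum(sum, rows)

# Time a zero-argument function; returns (seconds, bytes allocated).
function time_stage(f)
    _, seconds, bytes = @timed f()
    return seconds, bytes
end

rows = [(i, i^2, sqrt(i)) for i in 1:10_000]
path = tempname() * ".csv"

# Warm-up call so that compilation time is not counted in the measurements.
write_csv_stage(path, rows); plot_stage(rows)

csv_s, csv_b = time_stage(() -> write_csv_stage(path, rows))
both_s, both_b = time_stage() do
    write_csv_stage(path, rows)
    plot_stage(rows)
end

println("CSV only:   ", csv_s, " s, ", csv_b, " bytes")
println("CSV + plot: ", both_s, " s, ", both_b, " bytes")
```

If the "CSV only" time stays flat across repeated calls while "CSV + plot" grows, that would point at the plotting stage.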
Here is what I can speculate from what’s given:
)) reporting_results has both an A) write CSV step and a B) using Plots step
)) Allocations stay the same at 22.6 GiB (or GB? Do you know whether that is CPU RAM only?)
)) Per reporting_results call,
Run #1 including “B) using Plots” took 3854 seconds, then
Run #2 including “B) using Plots” took 17882 seconds.
To narrow down the root cause, consider trying this advice on
detecting GPU memory leaks and tracking GPU GC (garbage collection):
https://juliagpu.gitlab.io/CUDA.jl/usage/memory/#Detecting-leaks
NOTE … To keep track … feature is only available when running Julia on debug level 2 or higher (i.e., with the -g2 argument).
When you do so, the memory_status() function from above will display additional information:
Also see "Cuda kernel error - #2 by maleadt":
To see the actual error, and stack trace, you need to follow the suggestion:
start Julia on debug level 2, e.g., julia -g2 my_script.jl or just julia -g2 to get a REPL. The reason is that these messages and traces are embedded in the generated code (we don’t have stack unwinding), and thus have a fairly large cost in performance.
For example:
… julia-1.4.2/bin/julia -g2
julia> using CUDA
[ Info: Precompiling CUDA [052768ef-…-66c8b64840ba]
julia> CUDA.memory_status()
Downloading artifact: CUDA90
Downloading artifact: CUDNN_CUDA90
Effective GPU memory usage: 98.50% (3.882 GiB/3.941 GiB) <<
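
A rough sketch of how the same check could be repeated between runs, assuming CUDA.jl is installed, a working GPU is present, and Julia was started with -g2. `run_once` is a hypothetical placeholder for one reporting_results call, not your actual code:

```julia
using CUDA   # assumption: CUDA.jl is installed and a GPU is available

# Hypothetical stand-in for one reporting_results call that touches the GPU.
run_once() = sum(CUDA.rand(Float32, 1024, 1024))

for i in 1:3
    run_once()
    GC.gc()           # let Julia collect GPU arrays that are no longer reachable
    CUDA.reclaim()    # hand cached pool memory back to the driver
    println("after run ", i, ":")
    CUDA.memory_status()
end
```

If the reported usage keeps climbing from run to run even after the GC and reclaim calls, something is probably still holding references to GPU arrays.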
Naturally, this is just a speculative SWAG to gather debug info
on whether multiple runs of
“B) using Plots” are the issue,
so of course
“A) write CSV” could still be the issue.
Let us know your test results so far.
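
To check whether repeated runs of one stage get slower, something like the sketch below could log the time, allocations, and GC time per run. `log_runs` and `stage_b` are hypothetical names for illustration, and `stage_b` does placeholder work rather than real plotting:

```julia
# Run `step` n times; record (seconds, bytes allocated, GC seconds) per run.
function log_runs(step, n)
    log = Tuple{Float64,Int64,Float64}[]
    for _ in 1:n
        _, seconds, bytes, gctime = @timed step()
        push!(log, (seconds, bytes, gctime))
        GC.gc()   # full collection between runs so leftovers are not carried over
    end
    return log
end

# Hypothetical stand-in for the stage under test (placeholder work only).
stage_b() = sum(abs2, rand(100_000))

stage_b()                        # warm-up so compilation is not measured
runs = log_runs(stage_b, 5)
for (i, (s, b, gc)) in enumerate(runs)
    println("run ", i, ": ", s, " s, ", b, " bytes, GC ", gc, " s")
end
```

For reference, the reported figures give 17882 / 3854 ≈ 4.6x between Run #1 and Run #2, with allocations unchanged, so a per-run log like this for each stage separately should show which one accounts for the growth.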
HTH,
Marc