I am looking for best practices for loading the CUDA.jl package and for the first call to methods such as CUDA.randn from this package.
I am running CUDA.jl on a cluster node, and I am not sure I am using the package correctly. When I run the following code, I see a long recompilation time, even though the package was precompiled when it was installed.
```julia
@time using CUDA
# 3.247396 seconds (9.32 M allocations: 628.931 MiB, 3.77% gc time, 14.07% compilation time: 59% of which was recompilation)

for i in 1:5
    @time x = CUDA.randn(1000, 10);
end
# 14.033213 seconds (26.59 M allocations: 1.350 GiB, 3.78% gc time, 38.45% compilation time)
#  0.000127 seconds (57 allocations: 2.578 KiB)
#  0.000069 seconds (57 allocations: 2.578 KiB)
#  0.000027 seconds (57 allocations: 2.578 KiB)
#  0.000021 seconds (57 allocations: 2.578 KiB)
```