Hi,
I am looking for best practices for loading the CUDA.jl package and for the first use of methods such as CUDA.randn from it.
I am running CUDA.jl on a cluster node and am not sure whether I am using the package correctly. When I run the following code, I see a long recompilation time, even though the package was precompiled when it was installed.
@time using CUDA
# 3.247396 seconds (9.32 M allocations: 628.931 MiB, 3.77% gc time, 14.07% compilation time: 59% of which was recompilation)
for i in 1:5
    @time x = CUDA.randn(1000, 10);
end
# 14.033213 seconds (26.59 M allocations: 1.350 GiB, 3.78% gc time, 38.45% compilation time)
# 0.000127 seconds (57 allocations: 2.578 KiB)
# 0.000069 seconds (57 allocations: 2.578 KiB)
# 0.000027 seconds (57 allocations: 2.578 KiB)
# 0.000021 seconds (57 allocations: 2.578 KiB)
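For context, here is the kind of warm-up I have been considering so that the one-time compilation cost is paid up front rather than inside the timed loop. This is just a sketch: warmup_gpu is a name I made up, and I am assuming the same CUDA.randn call is what triggers the compilation.

using CUDA

function warmup_gpu()
    # Trigger compilation of the GPU code paths I use later with a tiny
    # dummy problem, then wait for the device to finish.
    x = CUDA.randn(2, 2)
    CUDA.synchronize()
    return nothing
end

warmup_gpu()                 # slow the first time (compilation happens here)
@time CUDA.randn(1000, 10);  # should now measure only the actual work

Is something along these lines the recommended approach, or is there a better way to handle the first-call latency on a cluster node?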