I am looking for best practices for loading the CUDA.jl package and for the first call to methods such as CUDA.randn from this package.
I am running CUDA.jl on a cluster node, and I am not sure I am using the package correctly. When I run the following code, I see a long recompilation time, even though the package was precompiled when it was installed.
```julia
@time using CUDA
# 3.247396 seconds (9.32 M allocations: 628.931 MiB, 3.77% gc time, 14.07% compilation time: 59% of which was recompilation)

for i in 1:5
    @time x = CUDA.randn(1000, 10);
end
# 14.033213 seconds (26.59 M allocations: 1.350 GiB, 3.78% gc time, 38.45% compilation time)
#  0.000127 seconds (57 allocations: 2.578 KiB)
#  0.000069 seconds (57 allocations: 2.578 KiB)
#  0.000027 seconds (57 allocations: 2.578 KiB)
#  0.000021 seconds (57 allocations: 2.578 KiB)
```