I've deployed juliaup on an Nvidia Orin. When I try different versions of Julia and test CUDA, even a simple matrix multiplication runs out of GPU memory. Is anyone using CUDA on Orin?
Make sure you use the local CUDA toolkit, i.e., not the one we ship, by calling CUDA.set_runtime_version!("local"). The latest version of CUDA.jl should warn about that.
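For reference, here is a rough sketch of what that could look like in the REPL. The `"local"` argument is taken from the reply above; newer CUDA.jl releases may expect a different form (I believe a `local_toolkit=true` keyword), so check the docs for your installed version. The matrix sizes are arbitrary, and the assumption that the local toolkit on Orin is the one installed by JetPack is mine.

```julia
# Session 1: switch from the toolkit shipped with CUDA.jl to the local one.
using CUDA
CUDA.set_runtime_version!("local")   # form given in the reply above; may differ by CUDA.jl version

# The setting is stored as a preference, so restart Julia before continuing.

# Session 2 (after the restart): check the setup and retry the computation.
using CUDA
CUDA.versioninfo()     # reports which CUDA runtime/toolkit is being used
CUDA.memory_status()   # prints current GPU memory usage

# Small matrix multiplication on the GPU (sizes chosen arbitrarily)
A = CUDA.rand(Float32, 1024, 1024)
B = CUDA.rand(Float32, 1024, 1024)
C = A * B
```

If the multiplication still runs out of memory after this, the `CUDA.versioninfo()` output is probably the most useful thing to post back.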