Simple CuArray conversion, reverse, and transpose taking too long?

I am ingesting a NetCDF file using the NCDatasets package; the resulting array of type Matrix{Float32} has size (32768,1000). Afterwards, I’m converting the array to a CuArray:

```
using CUDA
using Dates
using NCDatasets

CUDA.allowscalar(false)

# Read in data
t0 = Dates.now()
ds = Dataset(data_path)["power"][1,:,:]
t1 = Dates.now()

# Convert to CUDA array
cu_ds = CuArray(ds)
t2 = Dates.now()

# Transpose CUDA array
cu_ts = transpose(cu_ds)
t3 = Dates.now()

# Reverse CUDA array
cu_rs = reverse!(cu_ts, dims=1)
t4 = Dates.now()
```

The times for each of these steps are as follows:

```
[ Info: Ingest:     1235 milliseconds
[ Info: CUDA array: 2695 milliseconds
[ Info: Transpose:  2718 milliseconds
[ Info: Reverse!:   16385 milliseconds
```

I know I’m doing something wrong, but I cannot figure out what it is. I’ve checked the types to ensure all arrays are CuArrays. I’ve also looked through the CUDA.jl source on GitHub to make sure I’m calling the functions correctly.

In Julia, the first execution of everything takes longer. Here, the GPU compiler is compiling the kernel behind reverse!. Call it a second time and it will be nearly instantaneous.
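You can see this by timing the same call twice. A minimal sketch (using CUDA.@time, which also reports compilation time, on a random array of the same shape as yours):

```
using CUDA

A = CUDA.rand(Float32, 32768, 1000)

# First call triggers kernel compilation for this type combination.
CUDA.@time reverse!(A; dims=1)

# Second call reuses the already-compiled kernel and is much faster.
CUDA.@time reverse!(A; dims=1)
```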

Thanks! I’m aware that the compiler takes a bit longer the first iteration, but I only need to call this function once at the beginning of my script.

Is there any way I can initialize reverse! by calling it on a very small or empty array, pay this “time tax” early on, and have the reverse! function work instantaneously when I actually need to use it? Or will it have a large time penalty regardless?

That would work, as long as the types of objects that will be passed to the kernel are identical. It’s not yet possible to precompile those invocations though, due to a number of missing features and bugs in Julia, so for now it needs to happen at run time.
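A minimal sketch of such a warm-up, using a tiny throwaway CuArray; per the caveat above, the element type and array type must match what you pass later (a plain CuMatrix{Float32} here, so a wrapped array such as a transpose would compile a separate kernel):

```
using CUDA

# Warm-up: compile the reverse! kernel early on a tiny array
# so the first real call doesn't pay the compilation cost.
warmup = CUDA.zeros(Float32, 2, 2)
reverse!(warmup; dims=1)
```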