Hi,

I’m relatively new to Julia and want to implement a numerical method using the CUDA libraries for Julia. I worked through the introductory material on GitHub and have gained the basic knowledge needed to start writing my own code.

The thing is, I want my code to be at least reasonably efficient, so my aim is to avoid unnecessary data transfers between the CPU and the GPU, i.e. between their memories. And here is my, maybe stupid, question.

Imagine I have some data

A = rand(ComplexF64, (N, N))

B = rand(ComplexF64, (N, N))

where N is some fixed integer, and I upload it to the Nvidia GPU using

A_gpu = CuArrays.cu(A)

B_gpu = CuArrays.cu(B)

What I am asking myself is: when I perform a simple matrix multiplication

A_gpu*B_gpu

does this calculation actually take place on the GPU? In other words, is multiplying two CuArrays dispatched to a GPU implementation by default, or do I need to write my own kernel function for a “parallel matrix multiplication” and launch it with @cuda?
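To make the question concrete, here is the complete minimal sketch of what I have in mind (N = 256 is an arbitrary choice on my part):

```julia
using CuArrays

N = 256  # arbitrary problem size

# Generate complex test data on the CPU
A = rand(ComplexF64, (N, N))
B = rand(ComplexF64, (N, N))

# Upload both matrices to GPU memory
A_gpu = CuArrays.cu(A)
B_gpu = CuArrays.cu(B)

# The operation in question: does this run on the GPU,
# or do I have to write a kernel myself?
C_gpu = A_gpu * B_gpu

# Copy the result back to the CPU only at the very end
C = Array(C_gpu)
```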

It would be great if some Julia expert could answer this question for me.

Thanks