I’m relatively new to Julia and want to implement a numerical method using the CUDA libraries for Julia. I worked through the introductory material on GitHub and picked up enough of the basics to start writing my own code.
I would like my code to be at least reasonably efficient, so my aim is to avoid unnecessary transfers between the GPU and the CPU, or between their memories. And here is my, maybe stupid, question.
Imagine I have some data
A = rand(ComplexF64, (N,N)) ,
B = rand(ComplexF64, (N,N)) ,
where N is some fixed integer, and upload my data to the Nvidia GPU using
A_gpu = CuArrays.cu(A),
B_gpu = CuArrays.cu(B).
What I am asking myself is: when I perform a simple matrix multiplication
A_gpu * B_gpu ,
does this calculation take place on the GPU? That is, is this a built-in feature of Julia when multiplying CuArrays, or do I need to write a separate kernel function for a “parallel matrix multiplication” and call it with @cuda…?
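To make the question concrete, here is a minimal sketch of the workflow I have in mind. It assumes the CUDA.jl package (the successor to CuArrays, which provides the same CuArray type) and a working Nvidia GPU. The size N = 256 and the name C_gpu are placeholders chosen only for illustration. The sketch uploads with CuArray(A) rather than cu(A), because CuArray(A) keeps the ComplexF64 element type.

```julia
using CUDA   # assumption: CUDA.jl, which superseded the CuArrays package

N = 256      # placeholder size, chosen only for this sketch

A = rand(ComplexF64, (N, N))
B = rand(ComplexF64, (N, N))

# Upload each matrix once; CuArray(A) preserves the ComplexF64 element type.
A_gpu = CuArray(A)
B_gpu = CuArray(B)

# Plain `*` on two CuArrays, with no hand-written kernel and no @cuda call.
C_gpu = A_gpu * B_gpu

# Inspect the result type to see whether the product is still a device array.
println(typeof(C_gpu))

# Copy back to host memory explicitly, only to compare with the CPU product.
C = Array(C_gpu)
println(C ≈ A * B)
```

Checking the type of C_gpu and comparing against the CPU product is how I would try to verify where the work happens, but I would still like confirmation of what is going on under the hood.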
It would be great if a Julia expert could answer this question for me.