Julia Cuda Matrix multiplication

Noobie76 · March 11, 2019, 1:57pm

Hi,

I’m relatively new to Julia and want to implement a numerical method using the CUDA libraries for Julia. I worked through the introductory material on GitHub and have picked up enough of the basics to start writing my own code.

I’d like the code to be at least reasonably efficient, so my aim is to avoid unnecessary communication between the GPU and the CPU, or between their memories. That leads to my, maybe stupid, question.

Imagine I have some data

A = rand(ComplexF64, (N,N))
B = rand(ComplexF64, (N,N))

where N is some fixed integer, and upload my data to the Nvidia GPU using

A_gpu = CuArrays.cu(A)
B_gpu = CuArrays.cu(B)

What I’m asking myself is: when I perform a simple matrix multiplication

A_gpu*B_gpu

does this calculation take place on the GPU? I mean, is this a standard implemented feature of Julia when one multiplies CuArrays, or do I need to write an extra kernel function for a “parallel matrix multiplication” and call it with @cuda…?

It would be great if some expert on Julia could answer this question for me.

Thanks

kristoffer.carlsson · March 11, 2019, 2:05pm

It will indeed take place on the GPU. This is not really a “standard implemented feature of Julia”; it is just that * can be overloaded, and the guys writing CuArrays overloaded * between two CuArrays (CuMatrices specifically) to call the CUBLAS version of matrix multiply.
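To illustrate the overloading mechanism described above, here is a minimal CPU-only sketch. The `DeviceMatrix` type and the `CALLS` counter are hypothetical names made up for this example and are not part of CuArrays; a real GPU package would call a CUBLAS routine where this sketch falls back to the ordinary CPU product.

```julia
# Hypothetical wrapper type standing in for a GPU array (not a CuArrays type).
struct DeviceMatrix{T}
    data::Matrix{T}
end

# Counter used only to show that our own method is the one being called.
const CALLS = Ref(0)

# Add a method to Base.:* for two DeviceMatrix values.
# A GPU package would dispatch to a CUBLAS gemm here; this sketch
# just multiplies the wrapped CPU matrices.
function Base.:*(A::DeviceMatrix{T}, B::DeviceMatrix{T}) where {T}
    CALLS[] += 1
    return DeviceMatrix(A.data * B.data)
end

A = DeviceMatrix(rand(ComplexF64, 4, 4))
B = DeviceMatrix(rand(ComplexF64, 4, 4))

C = A * B   # Julia picks the method above based on the argument types
```

Because the method is selected from the argument types, the calling code stays a plain `A * B`; no explicit kernel launch is needed at the call site.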

maleadt · March 11, 2019, 3:19pm

Specifically, this is where the CUBLAS-implementations are dispatched to: https://github.com/JuliaGPU/CuArrays.jl/blob/cee6253edeca2029d8d0522a46e2cdbb638e0a50/src/blas/highlevel.jl#L90-L145

And this is the fallback generically-typed implementation (e.g. for use with Dual numbers or other types that are not supported by CUBLAS): https://github.com/JuliaGPU/CuArrays.jl/blob/cee6253edeca2029d8d0522a46e2cdbb638e0a50/src/matmul.jl#L4-L50
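As a rough usage sketch of the CUBLAS-backed path, assuming the newer CUDA.jl package (the successor to CuArrays.jl) and a working NVIDIA GPU: the variable names and the size `N = 256` are chosen for illustration only. As far as I know, `cu` converts floating-point data to single precision by default, so the sketch uses the `CuArray` constructor to keep `ComplexF64`.

```julia
using CUDA            # assumption: CUDA.jl rather than the older CuArrays.jl
using LinearAlgebra   # provides mul!

N = 256               # illustrative size
A = rand(ComplexF64, N, N)
B = rand(ComplexF64, N, N)

# Upload once; CuArray(A) keeps the ComplexF64 element type.
A_gpu = CuArray(A)
B_gpu = CuArray(B)

# Out-of-place product: computed on the GPU, result stays in GPU memory.
C_gpu = A_gpu * B_gpu

# In-place product into a preallocated GPU buffer.
C_pre = similar(A_gpu)
mul!(C_pre, A_gpu, B_gpu)

# Copy back to host memory only when the result is needed on the CPU.
C = Array(C_gpu)
```

Keeping intermediate results as GPU arrays and copying back only at the end is what avoids the CPU/GPU transfer overhead raised in the original question.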

JosePereiraUA · February 24, 2021, 6:52pm

This helps. Can I ask if this is the absolute most efficient way of multiplying two matrices, or is there any “trick” one might employ to speed up the calculation even more?
