CuArrays using @views+mul!+transpose+slicing

maleadt · April 30, 2021, 11:03am

With that PR, you’ll end up executing the generic GEMM method from GPUArrays, which works but is slow. The second operation, which you mentioned ‘works fine’, actually doesn’t: it triggers scalar iteration, which is extremely slow and should be avoided. If you want really fast GPU execution, you need to make sure your arrays are recognized as strided GPU arrays so that we can dispatch to the CUBLAS library. That involves making sure the memory is contiguous, and that you’re not stacking too many array wrappers (because of how Julia’s array hierarchy is currently designed, it’s hard to recognize a GPU array once it is wrapped several layers deep).
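To make that concrete, here is a minimal sketch, assuming a recent CUDA.jl and a working GPU. The array names and sizes are made up for illustration, and which method actually gets selected depends on your package versions, so treat the comments as expectations rather than guarantees.

```julia
using CUDA, LinearAlgebra

# Turn accidental scalar iteration into an error instead of a silent slowdown.
CUDA.allowscalar(false)

# Hypothetical sizes, chosen only for illustration.
A = CUDA.rand(Float32, 128, 64)
B = CUDA.rand(Float32, 128, 32)
C = CUDA.zeros(Float32, 64, 32)

# One wrapper (transpose) around a plain CuArray: a good candidate for CUBLAS.
mul!(C, transpose(A), B)

# A contiguous slice (a range of whole columns) keeps the memory dense,
# so wrapping it in a single transpose should still be fine.
Av = view(A, :, 1:16)                 # 128×16
Cv = CUDA.zeros(Float32, 16, 32)
mul!(Cv, transpose(Av), B)

# A non-contiguous slice (every other row) is not dense in memory.
# Indexing without @views materializes a dense copy first, which costs an
# allocation but leaves plain CuArrays for the multiplication.
Ar = A[1:2:end, :]                    # 64×64 dense copy
Br = B[1:2:end, :]                    # 64×32 dense copy
mul!(C, transpose(Ar), Br)
```

With `CUDA.allowscalar(false)` set, any combination of wrappers that would fall back to scalar iteration throws an error right away, which makes it much easier to spot the slow cases.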
